Speech Synthesis API
Text To Speech
Product Overview
- Converts text into natural, smooth audio in real time using LiveData AI technology and returns it as a stream. The interface is suited to low-latency scenarios such as playing audio while it is still being synthesized.
Apply for services
The Speech Synthesis API uses a self-service application model. Sign up on the LiveData official website (https://www.ilivedata.com/), then create an application in the console; an appId and a service key (the secretKey used for request signing) are assigned to the application.
Other services can also be activated on the Overview page of the Management Console.
Access Method
Service Endpoint
https://tts.ilivedata.com/api/v1/speech/synthesis/stream
HTTP Request headers
| Header | Value | Description |
|---|---|---|
| Content-Type | application/json;charset=UTF-8 | Request body type |
| Accept | application/octet-stream | Response body type. Explicitly requesting a binary audio stream is recommended |
| X-AppId | Example: 81900001 | Unique identifier of the project or application. If it is not passed in the headers, appId must be provided in the request body |
| X-TimeStamp | Example: 2024-07-01T07:59:59Z | UTC timestamp of the request in W3C dateTime format (http://www.w3.org/TR/xmlschema-2/#dateTime) |
| Authorization | Example: Njl86M/jY6zZaZoGhZdGO+GI/8+yGFECusGH1yQHUFE= | Request signature token |
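The X-TimeStamp header carries the request time in UTC using the W3C dateTime format, and the same string is reused inside the signature. A minimal Python sketch, assuming second precision with a trailing `Z` as in the example above; `make_timestamp` is a hypothetical helper name, not part of the API:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def make_timestamp(now: Optional[datetime] = None) -> str:
    """Return a UTC timestamp such as 2024-07-01T07:59:59Z (hypothetical helper).

    `now` should be timezone-aware; astimezone() would treat a naive value as local time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert to UTC and drop sub-second precision to match the documented example.
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 15:59:59 at UTC+8 is 07:59:59 UTC.
print(make_timestamp(datetime(2024, 7, 1, 15, 59, 59, tzinfo=timezone(timedelta(hours=8)))))
# prints 2024-07-01T07:59:59Z
```

Generate the timestamp once per request and use that same string both in the header and in StringToSign.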
Request Method: POST
Request Body:
| Field Name | Required/Optional | Type | Description |
|---|---|---|---|
| text | Required | String | Text to synthesize. It must not be empty after leading and trailing spaces are trimmed. The default supported length range is [1, 2000] |
| language | Optional | String | Language of the text. Passing it is recommended; if omitted, the language is detected automatically. For supported languages, see Language List |
| voice | Required | VoiceSetting | Voice configuration. If empty, the system default voice for the language is used |
| output | Required | OutputSetting | Output configuration |
VoiceSetting:
| Field Name | Required/Optional | Type | Description |
|---|---|---|---|
| name | Required | String | Voice name. Contact customer service to obtain the available voices |
OutputSetting:
| Field Name | Required/Optional | Type | Description |
|---|---|---|---|
| format | Required | String | Output audio format. Supported values: pcm, wav, mp3, opus. Default: wav |
Request Sample
{
  "text": "Hello, welcome to LiveData.",
  "language": "zh-CN",
  "voice": {
    "name": "juvenile"
  },
  "output": {
    "format": "mp3"
  }
}
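Because the signature covers the exact bytes of the request body, it helps to build and serialize the body in one place. The sketch below is a minimal Python helper under stated assumptions: `build_body`, `SUPPORTED_FORMATS`, and `MAX_TEXT_LENGTH` are hypothetical names, the limits come from the tables above, and text length is assumed to be counted in characters (the unit is not specified). Compact JSON separators are an illustrative choice; any serialization works as long as the same string is both hashed and sent.

```python
import json
from typing import Optional

# Limits taken from the Request Body and OutputSetting tables.
SUPPORTED_FORMATS = {"pcm", "wav", "mp3", "opus"}
MAX_TEXT_LENGTH = 2000  # assumption: counted in characters


def build_body(text: str, voice_name: str, audio_format: str = "wav",
               language: Optional[str] = None) -> str:
    """Validate the fields and return the JSON string to hash and send (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty after trimming leading and trailing spaces")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text is longer than {MAX_TEXT_LENGTH} characters")
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {audio_format}")

    body: dict = {"text": text}
    if language is not None:
        body["language"] = language  # omitted means automatic language detection
    body["voice"] = {"name": voice_name}
    body["output"] = {"format": audio_format}
    # Compact separators; the returned string must be used verbatim for both
    # the signature hash and the HTTP request body.
    return json.dumps(body, separators=(",", ":"), ensure_ascii=False)


print(build_body("Hello, welcome to LiveData.", "juvenile", "mp3", "zh-CN"))
```

With the sample values it produces the same compact body that appears in the signature example.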
Request signing
Requests to the Speech Synthesis API are signed with the appId and secretKey. When the API receives a signed request, it recomputes the signature with the same algorithm. The request succeeds only if the two signatures match and the user corresponding to the appId has permission to access the requested resource; otherwise the API returns HTTP 401.
Send signature via HTTP request header
Method: Add a header named Authorization to the request, with the signature as its value. For example:
Authorization: Njl86M/jY6zZaZoGhZdGO+GI/8+yGFECusGH1yQHUFE=
Signature calculation method
- Canonicalized Query String:
Encode the request body JSON string as UTF-8, compute its SHA-256 digest, and convert the digest to a lowercase hexadecimal string (not Base64).
CanonicalizedQueryString = hex(sha256(jsonBody))
- Construct the string to sign
StringToSign ("\n" stands for ASCII newline character):
StringToSign = HTTPMethod + "\n" +
HostHeaderInLowercase + "\n" +
HTTPRequestURI + "\n" +
CanonicalizedQueryString <from the previous step> + "\n" +
"X-AppId:" + SAME_APPID_IN_HEADER + "\n" +
"X-TimeStamp:" + SAME_TIMESTAMP_IN_HEADER
HTTPRequestURI is the absolute path of the request URI, without the query string. If the path is empty, use a single forward slash (/).
- Calculate the signature: compute the hash-based message authentication code (HMAC) of StringToSign using HMAC-SHA256, with secretKey as the key.
For more information about HMAC, see: https://tools.ietf.org/html/rfc2104.
- Convert the HMAC result from the previous step to a Base64 string.
- Put the Base64 string in the Authorization header of the HTTP request.
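The signing steps can be sketched in Python with only the standard library. This is a non-authoritative sketch: the function names are hypothetical, and it assumes that secretKey is used as its raw UTF-8 bytes (the steps above do not say whether the key needs decoding first) and that the Base64 step encodes the raw 32-byte HMAC digest, which is consistent with the 44-character signatures shown in this document.

```python
import base64
import hashlib
import hmac


def canonical_hash(json_body: str) -> str:
    """Step 1: lowercase hex SHA-256 digest of the UTF-8 encoded request body."""
    return hashlib.sha256(json_body.encode("utf-8")).hexdigest()


def string_to_sign(method: str, host: str, path: str, json_body: str,
                   app_id: str, timestamp: str) -> str:
    """Step 2: join the six components with newline characters (no trailing newline)."""
    return "\n".join([
        method.upper(),
        host.lower(),
        path or "/",  # an empty path becomes a single forward slash
        canonical_hash(json_body),
        "X-AppId:" + app_id,  # must equal the X-AppId header value
        "X-TimeStamp:" + timestamp,  # must equal the X-TimeStamp header value
    ])


def sign(secret_key: str, to_sign: str) -> str:
    """Steps 3 and 4: HMAC-SHA256 keyed with secretKey, then Base64 of the raw digest."""
    # Assumption: the secretKey string is used as-is (UTF-8 bytes), not decoded first.
    digest = hmac.new(secret_key.encode("utf-8"), to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    body = ('{"text":"Hello, welcome to LiveData.","language":"zh-CN",'
            '"voice":{"name":"juvenile"},"output":{"format":"mp3"}}')
    to_sign = string_to_sign("POST", "tts.ilivedata.com",
                             "/api/v1/speech/synthesis/stream",
                             body, "81900001", "2024-11-01T07:59:59Z")
    # "YOUR_SECRET_KEY" is a placeholder; use the secretKey assigned in the console.
    print("Authorization:", sign("YOUR_SECRET_KEY", to_sign))
```

The body string passed to `string_to_sign` must be byte-for-byte the body that is sent; serializing the JSON a second time can change spacing or key order and break the signature.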
Signature example
The following appId and secretKey are used in this example (the secretKey is masked):
appId=81900001
secretKey=****
The request body used in this example:
{"text":"Hello, welcome to LiveData.","language":"zh-CN","voice":{"name":"juvenile"},"output":{"format":"mp3"}}
Generate CanonicalizedQueryString
4625d9e60b1fce4a6b4a01fb3ba8e7f33cd2751587ce79e4dc29e6d38b1fb1e9
Generate StringToSign
POST
tts.ilivedata.com
/api/v1/speech/synthesis/stream
4625d9e60b1fce4a6b4a01fb3ba8e7f33cd2751587ce79e4dc29e6d38b1fb1e9
X-AppId:81900001
X-TimeStamp:2024-11-01T07:59:59Z
Resulting signature (HMAC-SHA256 result, Base64-encoded)
1nNkKezG9XgkbCau9aENhDDRJhoTMHAI85NnjY+Mm4k=
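Since CanonicalizedQueryString is a hash of the exact body bytes, re-serializing the same JSON object (different spacing, key order, or escaping) yields a different hash and therefore a failed signature check. The sketch below is a debugging aid rather than part of the API: it compares the documented example body with a pretty-printed copy of the same object. The secretKey is masked in the example, so only the intermediate hash, not the final signature, can be compared this way.

```python
import hashlib
import json

# Copied from the signature example.
EXAMPLE_BODY = ('{"text":"Hello, welcome to LiveData.","language":"zh-CN",'
                '"voice":{"name":"juvenile"},"output":{"format":"mp3"}}')
EXAMPLE_HASH = "4625d9e60b1fce4a6b4a01fb3ba8e7f33cd2751587ce79e4dc29e6d38b1fb1e9"


def body_hash(body: str) -> str:
    """Lowercase hex SHA-256 of the UTF-8 encoded body (the CanonicalizedQueryString)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


# The same JSON object serialized with indentation: equal as data, different as bytes.
pretty_body = json.dumps(json.loads(EXAMPLE_BODY), indent=2)

print("compact body matches example hash:", body_hash(EXAMPLE_BODY) == EXAMPLE_HASH)
print("pretty body matches example hash:", body_hash(pretty_body) == EXAMPLE_HASH)
```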
HTTP Response
Content-Type: application/octet-stream
On success, the result is returned as a binary audio stream rather than JSON. The response headers and a sample are shown below.
Success Response Headers
| Header | Type | Description |
|---|---|---|
| Content-Type | String | application/octet-stream |
| Cache-Control | String | no-store |
| X-Task-Id | String | Unique task identifier |
| X-Audio-Format | String | Output audio format |
| Transfer-Encoding | String | chunked |
Response Sample
Sample Response Headers
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Cache-Control: no-store
X-Task-Id: bj_367a9314e0294cf1afcaee5c93b58129
X-Audio-Format: mp3
Transfer-Encoding: chunked
The response body is a chunked binary audio stream in the requested format (mp3, wav, pcm, or opus). The client should read it in binary mode and then play or save it.
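A minimal sketch of receiving the stream in binary mode: it copies the body chunk by chunk so that each chunk could be handed to a player as it arrives. `save_stream` is a hypothetical helper; any object with a `read(n)` method returning bytes works, including the response object returned by Python's `urllib.request.urlopen`. The offline demo uses `io.BytesIO` in place of a real response.

```python
import io
from typing import BinaryIO


def save_stream(response: BinaryIO, out: BinaryIO, chunk_size: int = 8192) -> int:
    """Copy a binary audio stream to `out` chunk by chunk and return the byte count."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty bytes object marks the end of the stream
            break
        out.write(chunk)  # a player could consume `chunk` here instead of saving it
        total += len(chunk)
    return total


# Offline demo: io.BytesIO stands in for the HTTP response body.
source, sink = io.BytesIO(b"\x00" * 20000), io.BytesIO()
print(save_stream(source, sink))  # prints 20000
```

For a live request, Python's HTTP client decodes the chunked transfer encoding, so the loop only sees audio bytes. A smaller `chunk_size` hands data to the player sooner at the cost of more read calls.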
Error Response
For scenarios such as invalid request parameters, nonexistent applications, unsupported languages, or unsupported output formats, the interface returns a non-200 response. In failure cases, the response body is JSON.
| Field Name | Type | Description |
|---|---|---|
| errorCode | Number | Error code |
| errorMessage | String | Error message |
Error Response Example
Sample Error Response
{
  "errorCode": 3003,
  "errorMessage": "Invalid voice name."
}
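A sketch of turning a non-200 response into an exception that carries errorCode and errorMessage. `SynthesisError` and `raise_for_error` are hypothetical names. Falling back to an empty payload when the body is not a JSON object is a defensive assumption (failure bodies are described as JSON), and the HTTP status 400 in the demo is illustrative, since this document does not map error codes to HTTP statuses.

```python
import json
from typing import Optional


class SynthesisError(Exception):
    """Hypothetical exception for a failed synthesis request."""

    def __init__(self, status: int, error_code: Optional[int], error_message: str):
        super().__init__(f"HTTP {status}, errorCode {error_code}: {error_message}")
        self.status = status
        self.error_code = error_code
        self.error_message = error_message


def raise_for_error(status: int, body: bytes) -> None:
    """Return on HTTP 200; otherwise parse the JSON error body and raise SynthesisError."""
    if status == 200:
        return
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError:  # covers both invalid UTF-8 and invalid JSON
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    raise SynthesisError(status, payload.get("errorCode"), payload.get("errorMessage", ""))


# Demo with the sample error body.
try:
    raise_for_error(400, b'{"errorCode": 3003, "errorMessage": "Invalid voice name."}')
except SynthesisError as err:
    print(err)  # prints HTTP 400, errorCode 3003: Invalid voice name.
```

With `urllib.request`, a non-2xx response raises `urllib.error.HTTPError`; its `code` attribute and `read()` method supply the two arguments.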
Call Example
curl -N -X POST 'https://tts.ilivedata.com/api/v1/speech/synthesis/stream' \
-H 'Content-Type: application/json' \
-H 'Accept: application/octet-stream' \
-H 'X-AppId: 81900001' \
-d '{"text":"Hello, this is a streaming speech synthesis example.","language":"zh-CN","output":{"format":"mp3"}}' \
--output stream_audio.mp3 -D headers.txt