WebSocket Streaming Speech Synthesis
Product Overview
- This API performs streaming speech synthesis over a WebSocket connection. The client sends JSON text frames, and the server returns task initialization, audio chunk, task completion, or error events.
- It is suitable for real-time scenarios where multiple synthesis requests are sent over the same connection and response events are grouped by task ID.
Apply for services
The Synthesis API uses a self-service application model. Sign up on the LiveData official website (https://www.ilivedata.com/), then create an application in the console. An appId and a service key will be assigned to you.
You can also activate other services on the Management Console - Overview Page.
Integration Flow
- Call the token issuing API, signing the request with appId and secretKey, to obtain a WebSocket token.
- Append the token query parameter to the returned wsUrl and establish the WebSocket connection.
- After the connection is established, send synthesis request JSON through the WebSocket.
- The server returns init, audio, done, and error events through the WebSocket.
Get WebSocket Token
Service Endpoint
https://tts.ilivedata.com/api/v1/speech/synthesis/ws-token
HTTP Request Headers
| Header | Value | Description |
|---|---|---|
| X-AppId | Example: 81900001 | Unique identifier of the project or application |
| X-TimeStamp | Example: 2024-07-01T07:59:59Z | UTC timestamp of the request. The timestamp must follow the W3C format |
| Authorization | Example: Njl86M/jY6zZaZoGhZdGO+GI/8+yGFECusGH1yQHUFE= | Signature token |
Request Method: GET
Request Signature
When requesting the token issuing API, use appId and secretKey to sign the request. The API verifies the signature with the same algorithm. If the signature is invalid, authentication fails.
Signature Calculation
- Construct StringToSign ("\n" stands for the ASCII newline character):
StringToSign = HTTPMethod + "\n" +
HostHeaderInLowercase + "\n" +
HTTPRequestURI + "\n" +
"X-AppId:" + SAME_APPID_IN_HEADER + "\n" +
"X-TimeStamp:" + SAME_TIMESTAMP_IN_HEADER
The token issuing API uses the GET method and usually has no request body. HTTPRequestURI is the absolute path of the request URI without the query string.
- Use StringToSign as the signed string, secretKey as the secret key, and HMAC-SHA256 as the hash algorithm.
- Convert the result to a Base64 string.
- Put the Base64 string into the Authorization HTTP request header.
Signature Example
GET
tts.ilivedata.com
/api/v1/speech/synthesis/ws-token
X-AppId:81900001
X-TimeStamp:2024-11-01T07:59:59Z
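A minimal Python sketch of the signing steps, using only the standard library. The helper names `build_string_to_sign` and `sign` are hypothetical, `your-secret-key` is a placeholder, and the sketch assumes secretKey is used directly as its UTF-8 bytes for the HMAC key (not Base64-decoded first); confirm this against your console credentials.

```python
import base64
import hashlib
import hmac


def build_string_to_sign(method: str, host: str, request_uri: str,
                         app_id: str, timestamp: str) -> str:
    """Assemble StringToSign: five fields joined by "\n", no trailing newline."""
    return "\n".join([
        method.upper(),              # HTTPMethod, e.g. "GET"
        host.lower(),                # HostHeaderInLowercase
        request_uri,                 # absolute path, without the query string
        f"X-AppId:{app_id}",         # same value as the X-AppId header
        f"X-TimeStamp:{timestamp}",  # same value as the X-TimeStamp header
    ])


def sign(string_to_sign: str, secret_key: str) -> str:
    """HMAC-SHA256 over StringToSign keyed with secretKey, then Base64-encoded."""
    digest = hmac.new(secret_key.encode("utf-8"),   # assumption: key used as UTF-8 bytes
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    sts = build_string_to_sign("GET", "tts.ilivedata.com",
                               "/api/v1/speech/synthesis/ws-token",
                               "81900001", "2024-11-01T07:59:59Z")
    print(sts)                            # the five lines of the signature example
    print(sign(sts, "your-secret-key"))   # placeholder secretKey
```

Run on the values from the signature example, the first print reproduces the five lines shown there; the second prints the value to place in the Authorization header.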
Request Sample
curl -X GET 'https://tts.ilivedata.com/api/v1/speech/synthesis/ws-token' \
-H 'X-AppId: 81900001' \
-H 'X-TimeStamp: 2024-11-01T07:59:59Z' \
-H 'Authorization: {signature}'
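The curl call can also be assembled in Python. This sketch builds the request without sending it, so the signing inputs are easy to inspect. `build_token_request` and `w3c_utc_timestamp` are hypothetical names, the W3C timestamp is assumed to be second precision with a trailing `Z` as in the examples, and secretKey is assumed to be used as UTF-8 bytes for the HMAC key.

```python
import base64
import hashlib
import hmac
import urllib.request
from datetime import datetime, timezone
from urllib.parse import urlsplit

# Endpoint from this page; pass a different url if your deployment differs.
TOKEN_URL = "https://tts.ilivedata.com/api/v1/speech/synthesis/ws-token"


def w3c_utc_timestamp(now: datetime) -> str:
    """Format an aware datetime as a W3C UTC timestamp, e.g. 2024-11-01T07:59:59Z."""
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_token_request(app_id: str, secret_key: str, now: datetime,
                        url: str = TOKEN_URL) -> urllib.request.Request:
    """Build, but do not send, the signed GET request for the token API."""
    parts = urlsplit(url)
    timestamp = w3c_utc_timestamp(now)
    string_to_sign = "\n".join([
        "GET",
        parts.netloc.lower(),      # Host header value, lowercased
        parts.path,                # HTTPRequestURI: path only, no query string
        f"X-AppId:{app_id}",
        f"X-TimeStamp:{timestamp}",
    ])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return urllib.request.Request(url, method="GET", headers={
        "X-AppId": app_id,
        "X-TimeStamp": timestamp,
        "Authorization": base64.b64encode(digest).decode("ascii"),
    })
```

Passing the result to `urllib.request.urlopen(req)` sends it; the response body is the JSON shown in the response sample. urllib stores header names as `X-appid` and `X-timestamp`, which is harmless because HTTP header names are case-insensitive, while StringToSign still uses the exact `X-AppId:` and `X-TimeStamp:` prefixes.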
Response Sample
{
"token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
"expiresIn": 60,
"expiresAt": 1782359144,
"wsUrl": "wss://tts.ilivedata.com/api/v1/speech/synthesis/ws"
}
Response Fields
| Field Name | Type | Description |
|---|---|---|
| token | String | WebSocket JWT signed with RS256 |
| expiresIn | Number | Token lifetime in seconds. The token must be used to establish the WebSocket connection before it expires |
| expiresAt | Number | Token expiration time in Unix seconds |
| wsUrl | String | TTS WebSocket connection URL without the token parameter |
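Because the token is short-lived (60 seconds in the sample), a client may want to check how much lifetime remains before opening the connection. A sketch, assuming the response body is available as a string; `WsToken`, `parse_token_response`, and the 5-second safety margin are illustrative choices, not part of the API.

```python
import json
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class WsToken:
    token: str
    expires_at: int   # Unix seconds, from "expiresAt"
    ws_url: str       # from "wsUrl"; does not yet contain the token parameter

    def seconds_left(self, now: Optional[float] = None) -> float:
        """Seconds remaining before the token can no longer open a connection."""
        current = time.time() if now is None else now
        return self.expires_at - current

    def is_usable(self, now: Optional[float] = None, margin: float = 5.0) -> bool:
        """True if more than `margin` seconds remain for the WebSocket handshake."""
        return self.seconds_left(now) > margin


def parse_token_response(body: str) -> WsToken:
    """Extract the fields a client needs from the token API response JSON."""
    data = json.loads(body)
    return WsToken(token=data["token"],
                   expires_at=int(data["expiresAt"]),
                   ws_url=data["wsUrl"])
```

The sketch relies on `expiresAt` and the local clock. If the client clock may be skewed, computing the deadline as the arrival time of the response plus `expiresIn` avoids that dependency.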
Establish WebSocket Connection
WebSocket URL
wss://tts.ilivedata.com/api/v1/speech/synthesis/ws?token={token}
The token is used only to authenticate the WebSocket handshake. Once the connection is established, it stays open after the token expires. If the connection is closed and the client needs to reconnect, it must obtain a new token.
TTS verifies the JWT signature, expiration time, aud, iss, scope, path, and appId. The appId in the request message must match the appId in the token.
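Building the handshake URL from `wsUrl` and `token` can be done with `urllib.parse`. This is a sketch: `build_ws_url` is a hypothetical helper, and URL-encoding the token is a defensive choice, since an RS256 JWT normally contains only URL-safe characters. Any WebSocket client library can then open the resulting URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_ws_url(ws_url: str, token: str) -> str:
    """Append token=... to wsUrl, URL-encoding it and keeping any existing query."""
    parts = urlsplit(ws_url)
    # Keep existing query parameters, but replace any stale token parameter.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "token"]
    query.append(("token", token))
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))
```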
WebSocket Request Parameters
After the connection is established, the client sends request JSON through WebSocket text messages.
Top-level Parameters
| Field Name | Required | Type | Description |
|---|---|---|---|
| appId | Conditionally required | Number | Required if request.appId is omitted. If both are set, the top-level appId takes precedence |
| sessionId | Optional | String | Business session ID. If omitted, requests on the same WebSocket connection share a connection-level default sessionId |
| request | Required | Object | Synthesis request body, with the same structure as SynthesisRequest |
request
| Field Name | Required | Type | Description |
|---|---|---|---|
| appId | Conditionally required | Number | Required if the top-level appId is omitted |
| text | Required | String | Text to synthesize. It must not be empty after leading and trailing spaces are trimmed |
| language | Optional | String | Language of the text. Passing it is recommended; if omitted, the language is detected automatically. For supported languages, see Language List |
| voice | Optional | VoiceSetting | Voice configuration |
| output | Optional | OutputSetting | Output audio configuration |
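The appId and text rules in the tables above can be checked on the client before a frame is sent. A minimal sketch with hypothetical helper names; the server remains the authority, the resolved appId must also match the appId in the token, and `str.strip()` trims all whitespace, which may be broader than the server's trimming of spaces.

```python
from typing import Any, Dict


def effective_app_id(message: Dict[str, Any]) -> int:
    """Return the appId that applies: top-level first, then request.appId."""
    top = message.get("appId")
    if top is not None:
        return int(top)                      # the top-level appId takes precedence
    inner = message.get("request", {}).get("appId")
    if inner is not None:
        return int(inner)
    raise ValueError("appId is required at the top level or in request")


def check_text(message: Dict[str, Any]) -> str:
    """Reject request.text values that are empty after trimming."""
    text = message.get("request", {}).get("text", "")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("request.text must not be empty")
    return text
```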
VoiceSetting
| Field Name | Required | Type | Description |
|---|---|---|---|
| name | Optional | String | Voice name from Prebuilt Voices or Voice Registration |
| audio | Optional | String | Audio file used for voice cloning when no voice name is specified |
| emotion | Optional | String | Emotional expression |
OutputSetting
| Field Name | Required | Type | Description |
|---|---|---|---|
| format | Optional | String | Output audio format: pcm, wav, mp3, or opus. Defaults to wav |
Request Sample
{
"appId": 81900001,
"request": {
"appId": 81900001,
"text": "Hello, this is a WebSocket streaming speech synthesis example.",
"language": "en",
"voice": {
"name": "juvenile"
},
"output": {
"format": "mp3"
}
}
}
If the client needs to specify a business session, pass sessionId explicitly:
{
"appId": 81900001,
"sessionId": "biz-session-001",
"request": {
"appId": 81900001,
"text": "The first message in the same business session.",
"voice": {
"name": "juvenile"
},
"output": {
"format": "mp3"
}
}
}
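One way to produce request frames like the samples above from Python. A sketch: `build_message` and its keyword arguments are hypothetical. It writes appId at both levels as the samples do, sends `format` explicitly even when it is the default `wav`, and never includes `taskId`, since the server assigns its own.

```python
import json
from typing import Optional

ALLOWED_FORMATS = {"pcm", "wav", "mp3", "opus"}


def build_message(app_id: int, text: str, *, language: Optional[str] = None,
                  voice_name: Optional[str] = None, output_format: str = "wav",
                  session_id: Optional[str] = None) -> str:
    """Serialize one synthesis request as the payload of a WebSocket text frame."""
    if not text.strip():
        raise ValueError("text must not be empty after trimming")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {output_format}")
    request = {"appId": app_id, "text": text, "output": {"format": output_format}}
    if language:
        request["language"] = language          # omitted: server auto-detects
    if voice_name:
        request["voice"] = {"name": voice_name}
    message = {"appId": app_id, "request": request}
    if session_id is not None:
        message["sessionId"] = session_id       # omitted: connection-level default
    return json.dumps(message, ensure_ascii=False)
```

Each returned string is sent as one WebSocket text message; passing `session_id` produces the shape of the second sample.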
WebSocket Response Events
init Event
Indicates that the server has accepted the task and returned task identifiers.
| Field Name | Type | Description |
|---|---|---|
| event | String | Fixed value: init |
| taskId | String | Task ID generated by the server |
| sessionId | String | Session ID. If omitted by the client, the server generates a connection-level session ID |
| status | String | Fixed value: init |
| taskStatus | Number | Task status |
{
"event": "init",
"taskId": "bj_ws_1b3d21549d3841d3b4400829403a4fff",
"sessionId": "bj_ws_5f45c8fa85814b159741c80620f705bf",
"status": "init",
"taskStatus": 1
}
audio Event
Indicates an audio chunk.
| Field Name | Type | Description |
|---|---|---|
| event | String | Fixed value: audio |
| taskId | String | Task ID |
| sessionId | String | Session ID |
| seq | Number | Audio chunk sequence number |
| itemIndex | Number | Text chunk index |
| itemDone | Boolean | Whether the text chunk identified by itemIndex is complete |
| sampleRate | Number | Sample rate of the current audio chunk |
| durationMs | Number | Duration of the current audio chunk in milliseconds |
| audioBase64 | String | Base64 string of the audio binary chunk |
| status | String | Fixed value: streaming |
{
"event": "audio",
"taskId": "bj_ws_1b3d21549d3841d3b4400829403a4fff",
"sessionId": "bj_ws_5f45c8fa85814b159741c80620f705bf",
"seq": 12,
"itemIndex": 0,
"itemDone": false,
"sampleRate": 22050,
"durationMs": 120,
"audioBase64": "...",
"status": "streaming"
}
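A sketch of reassembling streamed audio for one task from collected `audio` events. It assumes `seq` increases monotonically within a task and simply joins the decoded bytes. That is directly usable for raw `pcm`, but this page does not say whether `wav` or other container chunks can be concatenated this way, so treat the result as a preview and rely on the `done` URL for the complete file. `assemble_audio` is a hypothetical helper.

```python
import base64
from typing import Dict, Iterable, Tuple


def assemble_audio(events: Iterable[dict], task_id: str) -> Tuple[bytes, int]:
    """Join decoded audio chunks of one task in seq order.

    Returns the concatenated bytes and the summed durationMs.
    """
    chunks: Dict[int, dict] = {}
    for ev in events:
        if ev.get("event") == "audio" and ev.get("taskId") == task_id:
            chunks[ev["seq"]] = ev          # a repeated seq replaces the earlier one
    ordered = [chunks[s] for s in sorted(chunks)]
    audio = b"".join(base64.b64decode(ev["audioBase64"]) for ev in ordered)
    duration_ms = sum(int(ev.get("durationMs", 0)) for ev in ordered)
    return audio, duration_ms
```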
done Event
Indicates that the task is completed and the full audio file has been uploaded.
| Field Name | Type | Description |
|---|---|---|
| event | String | Fixed value: done |
| taskId | String | Task ID |
| sessionId | String | Session ID |
| status | String | Fixed value: done |
| url | String | URL of the uploaded audio file |
{
"event": "done",
"taskId": "bj_ws_1b3d21549d3841d3b4400829403a4fff",
"sessionId": "bj_ws_5f45c8fa85814b159741c80620f705bf",
"status": "done",
"url": "https://xxx.cos.accelerate.myqcloud.com/tts/.../bj_ws_1b3d21549d3841d3b4400829403a4fff.mp3"
}
error Event
Indicates that the task failed.
| Field Name | Type | Description |
|---|---|---|
| event | String | Fixed value: error |
| taskId | String | Task ID. It may be empty if the request has not entered the task initialization phase |
| sessionId | String | Session ID. It may be empty if the request has not entered the task initialization phase |
| status | String | Fixed value: error |
| errorCode | Number | Error code |
| errorMessage | String | Error message |
{
"event": "error",
"taskId": "bj_ws_1b3d21549d3841d3b4400829403a4fff",
"sessionId": "bj_ws_5f45c8fa85814b159741c80620f705bf",
"status": "error",
"errorCode": 3003,
"errorMessage": "Invalid voice name."
}
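Because several requests can share one connection, a client typically routes every incoming frame by `taskId`. The sketch below is illustrative, not an official SDK: `EventRouter` and `TaskState` are hypothetical, audio chunks are stored in arrival order, and errors with an empty `taskId` are collected separately because they cannot be attributed to a task.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaskState:
    """What the client knows about one server-side task."""
    session_id: Optional[str] = None
    chunks: List[bytes] = field(default_factory=list)  # decoded audio, arrival order
    url: Optional[str] = None        # set by the done event
    error: Optional[str] = None      # set by an error event
    finished: bool = False


class EventRouter:
    """Route server events to per-task state, keyed by the server-generated taskId."""

    def __init__(self) -> None:
        self.tasks: Dict[str, TaskState] = {}
        self.unattributed_errors: List[str] = []  # errors with an empty taskId

    def handle(self, frame: str) -> None:
        ev = json.loads(frame)
        kind, task_id = ev.get("event"), ev.get("taskId")
        if not task_id:
            # An error raised before task initialization may have an empty taskId.
            if kind == "error":
                self.unattributed_errors.append(f"{ev.get('errorCode')}: {ev.get('errorMessage')}")
            return
        state = self.tasks.setdefault(task_id, TaskState())
        state.session_id = ev.get("sessionId") or state.session_id
        if kind == "audio":
            state.chunks.append(base64.b64decode(ev["audioBase64"]))
        elif kind == "done":
            state.url, state.finished = ev.get("url"), True
        elif kind == "error":
            state.error = f"{ev.get('errorCode')}: {ev.get('errorMessage')}"
            state.finished = True
        # "init" needs no extra work: taskId and sessionId are recorded above.
```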
taskId and sessionId Rules
- The client does not need to pass taskId. The server generates a unique taskId for each request message and returns it in init, audio, done, and error events.
- If the client passes taskId, the server ignores it and uses the server-generated taskId.
- If the client does not pass sessionId, the server generates a connection-level default sessionId when the WebSocket connection is established. All requests on that connection that omit sessionId return this same sessionId.
- If the client explicitly passes sessionId, the server uses the client-provided sessionId.
Disconnection Handling
- After the WebSocket connection is closed, the client needs to establish a new connection and resend the synthesis request.
- The server does not use a client-provided taskId for task-level recovery.
- If the business needs to associate requests before and after reconnection, the client may pass the same sessionId as the business session identifier.
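The reconnection rules can be sketched as a function that reopens the connection with a new token and resends unfinished requests under the same sessionId. Everything here is an assumption for illustration: `resend_after_disconnect` and its three callables are hypothetical hooks, and deciding which requests are unfinished (no `done` or `error` received) is left to the caller.

```python
import json
from typing import Any, Callable, Dict, Iterable


def resend_after_disconnect(
    fetch_ws_url: Callable[[], str],
    connect: Callable[[str], Any],
    send: Callable[[Any, str], None],
    pending: Iterable[Dict[str, Any]],
    session_id: str,
) -> Any:
    """Open a new connection and resend requests that never reached done or error.

    Hypothetical hooks supplied by the caller:
    fetch_ws_url() returns wsUrl with a freshly issued token appended,
    connect(url) opens the WebSocket, and send(conn, text) writes one text frame.
    """
    conn = connect(fetch_ws_url())     # a closed connection always needs a new token
    for message in pending:
        frame = dict(message)
        frame.pop("taskId", None)        # ignored by the server; a new taskId is issued
        frame["sessionId"] = session_id  # same business session across the reconnect
        send(conn, json.dumps(frame))
    return conn
```

Injecting the network operations keeps the sketch independent of any particular WebSocket library and makes it easy to test with fakes.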
Client Handling Recommendations
- The token has an expiration time. Establish the WebSocket connection immediately after obtaining the token.
- After receiving init, record the returned taskId and sessionId for troubleshooting and event grouping.
- Decode audioBase64 before playing or caching audio chunks.
- Use the done event as the authoritative source of the uploaded audio file URL.
- If the connection is closed, obtain a new token, establish a new WebSocket connection, and resend the request. The resent request receives a new taskId.