WebSocket Streaming Speech Synthesis

Product Overview

This API performs streaming speech synthesis over a WebSocket connection. The client sends JSON text frames, and the server returns task initialization, audio chunk, task completion, or error events.
It is suitable for real-time scenarios where multiple synthesis requests are sent over the same connection and response events are grouped by task ID.

Apply for services

The Synthesis API is self-service. Sign up on the LiveData official website (https://www.ilivedata.com/), then create an application in the console. An appId and a secret key (secretKey) will be assigned to the application.

You can also activate other services on the Management Console - Overview Page.

Integration Flow

Call the token issuing API and complete auth verification with appId and secretKey to obtain a WebSocket Token.
Append the token query parameter to the returned wsUrl and establish the WebSocket connection.
After the connection is established, send synthesis request JSON through WebSocket.
The server returns init, audio, done, and error events through WebSocket.

Get WebSocket Token

Service Endpoint

https://tts.ilivedata.com/api/v1/speech/synthesis/ws-token

HTTP Request Headers

Header	Value	Description
X-AppId	Example: 81900001	Unique identifier of the project or application
X-TimeStamp	Example: 2024-07-01T07:59:59Z	UTC timestamp of the request, in the W3C date-time format (YYYY-MM-DDThh:mm:ssZ)
Authorization	Example: Njl86M/jY6zZaZoGhZdGO+GI/8+yGFECusGH1yQHUFE=	Signature token

Request Method: GET

Request Signature

When requesting the token issuing API, use appId and secretKey to sign the request. The API verifies the signature with the same algorithm. If the signature is invalid, authentication fails.

Signature Calculation

Construct StringToSign ("\n" stands for ASCII newline character):

StringToSign = HTTPMethod + "\n" +
               HostHeaderInLowercase + "\n" +
               HTTPRequestURI + "\n" +
               "X-AppId:" + SAME_APPID_IN_HEADER + "\n" +
               "X-TimeStamp:" + SAME_TIMESTAMP_IN_HEADER

The token issuing API uses the GET method and usually has no request body. HTTPRequestURI is the absolute path of the request URI without the query string.

Compute the HMAC-SHA256 of StringToSign, using secretKey as the key.
Base64-encode the resulting digest.
Put the Base64 string into the Authorization HTTP request header.
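
The signing steps above can be sketched in Python as follows. This is a minimal illustration, not an official SDK: the function names are hypothetical, the secret key used in any call is a placeholder, and UTF-8 encoding of the key and message is an assumption, since the encoding is not specified here.

```python
import base64
import hashlib
import hmac


def build_string_to_sign(method: str, host: str, uri: str,
                         app_id: str, timestamp: str) -> str:
    """Join the five signed components with ASCII newlines, in the documented order."""
    return "\n".join([
        method,
        host.lower(),                 # HostHeaderInLowercase
        uri,                          # absolute path, no query string
        "X-AppId:" + app_id,
        "X-TimeStamp:" + timestamp,
    ])


def sign(string_to_sign: str, secret_key: str) -> str:
    """HMAC-SHA256 over StringToSign keyed with secretKey, then Base64."""
    # Assumption: key and message are both encoded as UTF-8.
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def build_headers(app_id: str, timestamp: str, secret_key: str) -> dict:
    """Build the three request headers for the token issuing API."""
    string_to_sign = build_string_to_sign(
        "GET", "tts.ilivedata.com",
        "/api/v1/speech/synthesis/ws-token", app_id, timestamp)
    return {
        "X-AppId": app_id,
        "X-TimeStamp": timestamp,
        "Authorization": sign(string_to_sign, secret_key),
    }
```

The X-AppId and X-TimeStamp values passed to the signer must be the same strings that are sent in the headers, otherwise the server-side verification will not reproduce the same signature.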

Signature Example

GET
tts.ilivedata.com
/api/v1/speech/synthesis/ws-token
X-AppId:81900001
X-TimeStamp:2024-11-01T07:59:59Z

Request Sample

curl -X GET 'https://tts.ilivedata.com/api/v1/speech/synthesis/ws-token' \
  -H 'X-AppId: 81900001' \
  -H 'X-TimeStamp: 2024-11-01T07:59:59Z' \
  -H 'Authorization: {signature}'

Response Sample

{
  "token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expiresIn": 60,
  "expiresAt": 1782359144,
  "wsUrl": "wss://tts.ilivedata.com/api/v1/speech/synthesis/ws"
}

Response Fields

Field Name	Type	Description
token	String	WebSocket JWT signed with RS256
expiresIn	Number	Token lifetime in seconds. The token must be used to establish the WebSocket connection before it expires
expiresAt	Number	Token expiration time in Unix seconds
wsUrl	String	TTS WebSocket connection URL without the token parameter
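
As a rough sketch of how a client might use these fields, the Python below appends the token as a query parameter to wsUrl and checks expiresAt before connecting. The helper names are hypothetical, and comparing against the local clock assumes the client clock is reasonably in sync with the server.

```python
import time
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_ws_url(token_response: dict) -> str:
    """Append the token query parameter to the returned wsUrl."""
    parts = urlsplit(token_response["wsUrl"])
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", token_response["token"]))
    return urlunsplit(parts._replace(query=urlencode(query)))


def token_is_usable(token_response: dict, now: Optional[float] = None) -> bool:
    """True while the current time is before expiresAt (Unix seconds)."""
    now = time.time() if now is None else now
    return now < token_response["expiresAt"]
```

For example, calling build_ws_url on a response whose token is abc.def.ghi and whose wsUrl is wss://tts.ilivedata.com/api/v1/speech/synthesis/ws yields that URL followed by ?token=abc.def.ghi.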

Establish WebSocket Connection

WebSocket URL

wss://tts.ilivedata.com/api/v1/speech/synthesis/ws?token={token}

The token is used only for WebSocket handshake authentication. After the connection is established, the current connection will not be closed automatically when the token expires. If the connection is closed and the client needs to reconnect, obtain a new token.

TTS verifies the JWT signature, expiration time, aud, iss, scope, path, and appId. The appId in the request message must match the appId in the token.

WebSocket Request Parameters

After the connection is established, the client sends request JSON through WebSocket text messages.

Top-level Parameters

Field Name	Requirement	Type	Description
appId	Conditionally required	Number	Required if `request.appId` is omitted. The top-level `appId` has priority
sessionId	Optional	String	Business session ID. If omitted, the same WebSocket connection reuses a connection-level default `sessionId`
request	Required	Object	Synthesis request body, same structure as `SynthesisRequest`

request

Field Name	Requirement	Type	Description
appId	Conditionally required	Number	Required if the top-level `appId` is omitted
text	Required	String	Text to synthesize. It must not be empty after trimming leading and trailing spaces
language	Optional	String	Text content language. It is recommended to pass this parameter. If omitted, the language will be automatically detected. For supported languages, see Language List
voice	Optional	VoiceSetting	Synthetic voice related configuration
output	Optional	OutputSetting	Output audio related configuration

VoiceSetting

Field Name	Requirement	Type	Description
name	Optional	String	Voice name from Prebuilt Voices or Voice Registration
audio	Optional	String	Audio file used for voice cloning when the voice name is not specified
emotion	Optional	String	Emotional expression

OutputSetting

Field Name	Requirement	Type	Description
format	Optional	String	Output audio format. Candidates are `pcm`, `wav`, `mp3`, and `opus`. The default value is `wav`
loudnessLufs	Optional	Number	Target loudness in LUFS. Valid range: -30.0 to -6.0. Omit it to disable loudness processing
speed	Optional	Number	Speech speed ratio. A value `<=0` or `1.0` leaves the speed unchanged; any other value must be in the range 0.5 to 2.0
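
A client may want to check OutputSetting values before sending them. The following Python is a minimal sketch of such a check based only on the ranges listed in the table; the function name is hypothetical, and treating the range endpoints as inclusive is an assumption.

```python
ALLOWED_FORMATS = {"pcm", "wav", "mp3", "opus"}


def validate_output(output: dict) -> list:
    """Return a list of problems found in an OutputSetting dict (empty if none)."""
    problems = []

    fmt = output.get("format", "wav")  # documented default is wav
    if fmt not in ALLOWED_FORMATS:
        problems.append("format must be one of pcm, wav, mp3, opus")

    loudness = output.get("loudnessLufs")
    # Omitting loudnessLufs disables loudness processing, so None is fine.
    if loudness is not None and not (-30.0 <= loudness <= -6.0):
        problems.append("loudnessLufs must be between -30.0 and -6.0")

    speed = output.get("speed")
    # A speed <= 0 or equal to 1.0 means unchanged; other values need 0.5..2.0.
    if speed is not None and speed > 0 and not (0.5 <= speed <= 2.0):
        problems.append("speed must be between 0.5 and 2.0")

    return problems
```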

Request Sample

{
  "appId": 81900001,
  "request": {
    "appId": 81900001,
    "text": "Hello, this is a WebSocket streaming speech synthesis example.",
    "language": "en",
    "voice": {
      "name": "juvenile"
    },
    "output": {
      "format": "mp3",
      "loudnessLufs": -18,
      "speed": 1.25
    }
  }
}

If the client needs to specify a business session, pass sessionId explicitly:

{
  "appId": 81900001,
  "sessionId": "biz-session-001",
  "request": {
    "appId": 81900001,
    "text": "The first message in the same business session.",
    "voice": {
      "name": "juvenile"
    },
    "output": {
      "format": "mp3",
      "loudnessLufs": -18,
      "speed": 1.25
    }
  }
}
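
The request samples above could be assembled with a small helper such as the following Python sketch. The function name and signature are hypothetical. It sets appId only at the top level, which the parameter table allows because request.appId is required only when the top-level appId is omitted. Note that str.strip() removes all leading and trailing whitespace, which is slightly broader than spaces only.

```python
import json
from typing import Optional


def build_message(app_id: int, text: str,
                  session_id: Optional[str] = None,
                  language: Optional[str] = None,
                  voice: Optional[dict] = None,
                  output: Optional[dict] = None) -> str:
    """Serialize one synthesis request as a JSON text frame."""
    if not text.strip():
        raise ValueError("text must not be empty after trimming")

    request = {"text": text}
    if language is not None:
        request["language"] = language
    if voice is not None:
        request["voice"] = voice
    if output is not None:
        request["output"] = output

    message = {"appId": app_id, "request": request}
    if session_id is not None:
        # Omitting sessionId makes the server use its connection-level default.
        message["sessionId"] = session_id
    return json.dumps(message, ensure_ascii=False)
```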

WebSocket Response Events

init Event

Indicates that the server has accepted the task and returned task identifiers.

Field Name	Type	Description
event	String	Fixed value: `init`
taskId	String	Task ID generated by the server
sessionId	String	Session ID. If omitted by the client, the server generates a connection-level session ID
status	String	Fixed value: `init`
taskStatus	Number	Task status

{
  "event": "init",
  "taskId": "bj_ws_1b3d21549d3841d3b4400829403a4fff",
  "sessionId": "bj_ws_5f45c8fa85814b159741c80620f705bf",
  "status": "init",
  "taskStatus": 1
}

audio Event

Indicates an audio chunk.

Field Name	Type	Description
event	String	Fixed value: `audio`
taskId	String	Task ID
sessionId	String	Session ID
seq	Number	Audio chunk sequence number
itemIndex	Number	Text chunk index
itemDone	Boolean	Whether the current `itemIndex` is completed
sampleRate	Number	Sample rate of the current audio chunk
durationMs	Number	Duration of the current audio chunk in milliseconds
audioBase64	String	Base64 string of the audio binary chunk
status	String	Fixed value: `streaming`

{
  "event": "audio",
  "taskId": "bj_ws_1b3d21549d3841d3b4400829403a4fff",
  "sessionId": "bj_ws_5f45c8fa85814b159741c80620f705bf",
  "seq": 12,
  "itemIndex": 0,
  "itemDone": false,
  "sampleRate": 22050,
  "durationMs": 120,
  "audioBase64": "...",
  "status": "streaming"
}

done Event

Indicates that the task is completed and the full audio file has been uploaded.

Field Name	Type	Description
event	String	Fixed value: `done`
taskId	String	Task ID
sessionId	String	Session ID
status	String	Fixed value: `done`
url	String	URL of the uploaded audio file

{
  "event": "done",
  "taskId": "bj_ws_1b3d21549d3841d3b4400829403a4fff",
  "sessionId": "bj_ws_5f45c8fa85814b159741c80620f705bf",
  "status": "done",
  "url": "https://xxx.cos.accelerate.myqcloud.com/tts/.../bj_ws_1b3d21549d3841d3b4400829403a4fff.mp3"
}

error Event

Indicates that the task failed.

Field Name	Type	Description
event	String	Fixed value: `error`
taskId	String	Task ID. It may be empty if the request has not entered the task initialization phase
sessionId	String	Session ID. It may be empty if the request has not entered the task initialization phase
status	String	Fixed value: `error`
errorCode	Number	Error code
errorMessage	String	Error message

{
  "event": "error",
  "taskId": "bj_ws_1b3d21549d3841d3b4400829403a4fff",
  "sessionId": "bj_ws_5f45c8fa85814b159741c80620f705bf",
  "status": "error",
  "errorCode": 3003,
  "errorMessage": "Invalid voice name."
}
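
Because several tasks can share one connection, a client typically groups incoming events by taskId. The Python below is a minimal sketch of such a collector; the class names are hypothetical, and it assumes seq is unique within a task. It joins decoded chunks in seq order, but whether the joined bytes are directly playable for every output format is not stated here, so treat that as unverified.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TaskState:
    session_id: str = ""
    status: str = "init"
    chunks: Dict[int, bytes] = field(default_factory=dict)  # seq -> decoded audio
    url: Optional[str] = None
    error: Optional[str] = None

    def audio(self) -> bytes:
        """Decoded chunks joined in ascending seq order."""
        return b"".join(self.chunks[seq] for seq in sorted(self.chunks))


class EventCollector:
    def __init__(self) -> None:
        self.tasks: Dict[str, TaskState] = {}

    def handle(self, raw: str) -> TaskState:
        """Apply one JSON event to the state of its task and return that state."""
        event = json.loads(raw)
        # taskId may be empty on early errors; such events share the "" key.
        task = self.tasks.setdefault(event.get("taskId") or "", TaskState())
        task.session_id = event.get("sessionId") or task.session_id

        kind = event["event"]
        if kind == "audio":
            task.chunks[event["seq"]] = base64.b64decode(event["audioBase64"])
        elif kind == "done":
            task.url = event["url"]
        elif kind == "error":
            task.error = f'{event["errorCode"]}: {event["errorMessage"]}'
        task.status = event["status"]
        return task
```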

taskId and sessionId Rules

The client does not need to pass taskId. The server generates a unique taskId for each message and returns it in init, audio, done, and error events.
If the client passes taskId, the server ignores it and still uses the server-generated taskId.
If the client does not pass sessionId, the server generates a connection-level default sessionId when the WebSocket connection is established. Multiple requests on the same connection return the same sessionId.
If the client explicitly passes sessionId, the server uses the client-provided sessionId.

Disconnection Handling

After the WebSocket connection is closed, the client needs to establish a new connection and resend the synthesis request.
The server does not use the client-provided taskId for task-level recovery.
If the business needs to associate requests before and after reconnection, the client may pass the same sessionId as the business session identifier.
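
One way to follow these rules is to remember every request that has not finished and resend it after reconnecting, reusing the same sessionId. The Python below is a sketch under assumptions: the class is hypothetical, the key is a purely client-side label that is never sent to the server, and the caller decides when a request counts as finished (for example, on its done or error event).

```python
import copy
from typing import Dict, List


class ResendTracker:
    """Remembers unfinished requests so they can be resent after a reconnect."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self._pending: Dict[str, dict] = {}  # client-side key -> outgoing message

    def track(self, key: str, message: dict) -> dict:
        """Prepare a message for sending and remember it under a local key."""
        outgoing = copy.deepcopy(message)
        # The server ignores a client-provided taskId, so do not send one.
        outgoing.pop("taskId", None)
        # Reuse one business sessionId before and after reconnection.
        outgoing["sessionId"] = self.session_id
        self._pending[key] = outgoing
        return outgoing

    def finish(self, key: str) -> None:
        """Forget a request once the caller considers it finished."""
        self._pending.pop(key, None)

    def to_resend(self) -> List[dict]:
        """Messages to send again on a new connection, in tracking order."""
        return list(self._pending.values())
```

Each resent message starts a new task, so the taskId values seen after reconnection will differ from the earlier ones.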

Client Handling Recommendations

The token has an expiration time. It is recommended to establish the WebSocket connection immediately after obtaining the token.
After receiving init, record the returned taskId and sessionId for log troubleshooting and event grouping.
Decode audioBase64 before playing or caching audio chunks.
Use the done event as the final source of the uploaded audio file URL.
If the connection is closed, establish a new WebSocket connection and resend the request. The new request will generate a new taskId.