Text To Speech

Product Overview

Convert text into natural, smooth audio streamed in real time using LiveData AI technology. This interface suits low-latency scenarios such as playing audio while it is still being synthesized.

Apply for services

The Synthesis API is fully self-service. Sign up on the LiveData official website (https://www.ilivedata.com/), then create an application in the console. An appId and service key will be assigned to you.

You can also activate other services on the Management Console - Overview Page.

Access Method

Service Endpoint

https://tts.ilivedata.com/api/v1/speech/synthesis/stream

HTTP Request headers

Header	Value	Description
Content-Type	application/json;charset=UTF-8	Request body type
Accept	application/octet-stream	Response body type. It is recommended to explicitly specify binary audio streaming
X-AppId	Example: 81900001	Unique identifier of the project or application. It can be passed through the header; if omitted, `appId` must be provided in the request body
X-TimeStamp	Example: 2024-07-01T07:59:59Z	UTC timestamp of the request, in W3C dateTime format (http://www.w3.org/TR/xmlschema-2/#dateTime)
Authorization	Example: Njl86M/jY6zZaZoGhZdGO+GI/8+yGFECusGH1yQHUFE=	Signature Token
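
A minimal Python sketch for producing an X-TimeStamp value in the format shown above. The function name `utc_timestamp` is illustrative and not part of the API; second precision is assumed because the documented examples use it.

```python
from datetime import datetime, timezone


def utc_timestamp(now=None):
    """Return a UTC timestamp such as 2024-07-01T07:59:59Z.

    `now` should be a timezone-aware datetime; it defaults to the current time.
    """
    moment = now if now is not None else datetime.now(timezone.utc)
    # Convert to UTC first so aware datetimes from other zones are normalized.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(utc_timestamp())
```

The same string has to be used for both the X-TimeStamp header and the signature input, so it is worth generating once per request and reusing.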

Request Method: POST

Request Body:

Field Name	Required/Optional	Type	Description
text	Required	String	Text content. It must not be empty after trimming leading and trailing spaces. The default supported length range is [1,2000]
language	Optional	String	Language of the text. Passing this parameter is recommended; if omitted, the language is detected automatically. For supported languages, see Language List
voice	Required	VoiceSetting	Voice configuration. If empty, the system default voice for the language is used
output	Required	OutputSetting	Output configuration

VoiceSetting:

Field Name	Required/Optional	Type	Description
name	Required	String	Voice name. Contact customer service to obtain available voice options

OutputSetting:

Field Name	Required/Optional	Type	Description
format	Required	String	Output audio format. Supported values: `pcm`, `wav`, `mp3`, `opus`. Default: `wav`

Request Sample

{
    "text": "Hello, welcome to LiveData.",
    "language": "zh-CN",
    "voice": {
        "name": "juvenile"
    },
    "output": {
        "format": "mp3"
    }
}
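
A hedged Python sketch of building this request body as a compact JSON string. The helper name `build_body` and the choice to count the length limit in characters of the untrimmed text are assumptions, not documented behavior. Serializing once and sending exactly that string matters because the signature is computed over the body.

```python
import json

# Formats listed in the OutputSetting table.
ALLOWED_FORMATS = {"pcm", "wav", "mp3", "opus"}


def build_body(text, voice_name, audio_format="wav", language=None):
    """Validate the documented constraints and return a compact JSON string."""
    if not text.strip():
        raise ValueError("text must not be empty after trimming")
    if len(text) > 2000:  # assumption: the limit is counted in characters
        raise ValueError("text exceeds the default limit of 2000")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError("unsupported output format: " + audio_format)

    body = {"text": text}
    if language is not None:
        body["language"] = language
    body["voice"] = {"name": voice_name}
    body["output"] = {"format": audio_format}
    # Compact separators and no ASCII escaping keep the string stable.
    return json.dumps(body, ensure_ascii=False, separators=(",", ":"))


if __name__ == "__main__":
    print(build_body("Hello, welcome to LiveData.", "juvenile", "mp3", "zh-CN"))
```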

Request Signing

Requests to the speech synthesis API are signed with the appId and secretKey. When the API receives a signed request, it recomputes the signature with the same algorithm. If the signatures do not match, the API returns 401.

If the signatures match and the user associated with the appId has permission to operate on the requested resource, the request succeeds; otherwise the API returns 401.

Send signature via HTTP request header

Add a header named Authorization to the request and set its value to the signature. For example:

Authorization: Njl86M/jY6zZaZoGhZdGO+GI/8+yGFECusGH1yQHUFE=

Signature calculation method

Canonicalized Query String:

Compute the SHA-256 hash of the request body JSON string (encoded as UTF-8) and express the digest as a hexadecimal string (not Base64).

CanonicalizedQueryString = hex(sha256(jsonBody))
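
As an illustration, the hashing step could look like this in Python using only the standard library. The function name is hypothetical; the input must be the exact JSON string that is sent as the request body.

```python
import hashlib


def canonicalized_query_string(json_body):
    """Return hex(sha256(jsonBody)) for the exact body string being sent."""
    # Encode as UTF-8 first; hexdigest() yields 64 lowercase hex characters.
    return hashlib.sha256(json_body.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(canonicalized_query_string('{"text":"Hello"}'))
```

Any change to the body, including whitespace or key order, changes the hash, so the body should be serialized once and reused for both hashing and sending.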

Construct the string to sign, StringToSign ("\n" stands for the ASCII newline character):

StringToSign = HTTPMethod + "\n" +
               HostHeaderInLowercase + "\n" +
               HTTPRequestURI + "\n" +
               CanonicalizedQueryString <from the previous step> + "\n" +
               "X-AppId:" + SAME_APPID_IN_HEADER + "\n" +
               "X-TimeStamp:" + SAME_TIMESTAMP_IN_HEADER

HTTPRequestURI is the absolute path of the request URI, without the query string. If the path is empty, use a single forward slash (/).
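
The construction above can be sketched in Python as follows. The function and parameter names are illustrative only; the field order and separators follow the StringToSign definition.

```python
def build_string_to_sign(method, host, path, body_hash, app_id, timestamp):
    """Join the six StringToSign components with ASCII newlines."""
    return "\n".join([
        method.upper(),
        host.lower(),               # HostHeaderInLowercase
        path or "/",                # an empty path is represented as "/"
        body_hash,                  # hex(sha256(jsonBody))
        "X-AppId:" + app_id,        # same value as the X-AppId header
        "X-TimeStamp:" + timestamp  # same value as the X-TimeStamp header
    ])


if __name__ == "__main__":
    print(build_string_to_sign(
        "POST",
        "tts.ilivedata.com",
        "/api/v1/speech/synthesis/stream",
        "4625d9e60b1fce4a6b4a01fb3ba8e7f33cd2751587ce79e4dc29e6d38b1fb1e9",
        "81900001",
        "2024-11-01T07:59:59Z",
    ))
```

Note that there is no trailing newline after the X-TimeStamp line.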

Compute the hash-based message authentication code (HMAC) of StringToSign using HMAC-SHA256, with secretKey as the key.

For more information about HMAC, see: https://tools.ietf.org/html/rfc2104.

Convert the result of the previous step to a Base64 string.
Put the Base64 string into the Authorization header of the HTTP request.
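
A minimal Python sketch of the HMAC and Base64 steps, using the standard `hmac`, `hashlib`, and `base64` modules. The function name `sign` is hypothetical, and treating secretKey as a UTF-8 string used directly as the HMAC key is an assumption that the source does not state explicitly.

```python
import base64
import hashlib
import hmac


def sign(string_to_sign, secret_key):
    """Return Base64(HMAC-SHA256(secretKey, StringToSign))."""
    digest = hmac.new(
        secret_key.encode("utf-8"),      # assumption: key used as UTF-8 bytes
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()                           # raw 32 bytes, not the hex form
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Placeholder inputs, for illustration only.
    print(sign("POST\ntts.ilivedata.com\n/", "hypothetical-secret"))
```

The Base64 encoding is applied to the raw digest bytes, which is why the resulting signature is always 44 characters long.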

Example of signature

Below is an example of appId and secretKey.

appId=81900001
secretKey=****

Below is an example of request body.

{"text":"Hello, welcome to LiveData.","language":"zh-CN","voice":{"name":"juvenile"},"output":{"format":"mp3"}}

Generate CanonicalizedQueryString

4625d9e60b1fce4a6b4a01fb3ba8e7f33cd2751587ce79e4dc29e6d38b1fb1e9

Generate StringToSign

POST
tts.ilivedata.com
/api/v1/speech/synthesis/stream
4625d9e60b1fce4a6b4a01fb3ba8e7f33cd2751587ce79e4dc29e6d38b1fb1e9
X-AppId:81900001
X-TimeStamp:2024-11-01T07:59:59Z

Signatures from HMAC calculations

1nNkKezG9XgkbCau9aENhDDRJhoTMHAI85NnjY+Mm4k=

HTTP Response

Content-Type: application/octet-stream

The result is returned as a binary audio stream instead of JSON. Please refer to the following headers and examples.

Success Response Headers

Header	Type	Description
Content-Type	String	application/octet-stream
Cache-Control	String	no-store
X-Task-Id	String	Unique task identifier
X-Audio-Format	String	Output audio format
Transfer-Encoding	String	chunked

Response Sample

Sample Response Headers

HTTP/1.1 200 OK
Content-Type: application/octet-stream
Cache-Control: no-store
X-Task-Id: bj_367a9314e0294cf1afcaee5c93b58129
X-Audio-Format: mp3
Transfer-Encoding: chunked

The response body is a binary audio stream, such as chunked mp3, wav, pcm, or opus data. The client should receive it in binary mode and then play or save it.
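
One way to receive the stream in binary mode is to read it in fixed-size chunks and write each chunk as it arrives. The sketch below works with any file-like object that has a `read` method, such as the response returned by `urllib.request.urlopen`. The helper name `save_stream` and the chunk size are illustrative choices.

```python
import io


def save_stream(response, path, chunk_size=4096):
    """Copy a binary audio stream to `path` chunk by chunk; return bytes written."""
    total = 0
    with open(path, "wb") as audio_file:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # an empty read marks the end of the stream
                break
            audio_file.write(chunk)
            total += len(chunk)
    return total


if __name__ == "__main__":
    # Stand-in for a real HTTP response body.
    fake_response = io.BytesIO(b"\x00\x01" * 1000)
    print(save_stream(fake_response, "stream_audio.bin"))
```

For playback while synthesizing, the write call could be swapped for a hand-off to an audio player buffer; the loop structure stays the same.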

Error Response

For scenarios such as invalid request parameters, nonexistent applications, unsupported languages, or unsupported output formats, the interface returns a non-200 response. In failure cases, the response body is JSON.

Field Name	Type	Description
errorCode	Number	Error code
errorMessage	String	Error message

Error Response Example

Sample Error Response

{
    "errorCode": 3003,
    "errorMessage": "Invalid voice name."
}
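
Because a successful response is binary and a failed one is JSON, a client needs to branch on the status code before touching the body. A hedged Python sketch, where the function name `parse_error` and the fallback for non-JSON bodies are assumptions:

```python
import json


def parse_error(status_code, body):
    """Return None on 200, else an (errorCode, errorMessage) tuple.

    `body` is the raw response body as bytes.
    """
    if status_code == 200:
        return None
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError:  # covers both invalid UTF-8 and invalid JSON
        payload = None
    if not isinstance(payload, dict):
        # Fallback for bodies that are not the documented JSON object.
        return (None, body.decode("utf-8", errors="replace"))
    return (payload.get("errorCode"), payload.get("errorMessage"))


if __name__ == "__main__":
    sample = b'{"errorCode": 3003, "errorMessage": "Invalid voice name."}'
    print(parse_error(400, sample))
```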

Call Example

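curl -N -X POST 'https://tts.ilivedata.com/api/v1/speech/synthesis/stream' \
  -H 'Content-Type: application/json;charset=UTF-8' \
  -H 'Accept: application/octet-stream' \
  -H 'X-AppId: 81900001' \
  -H 'X-TimeStamp: 2024-11-01T07:59:59Z' \
  -H 'Authorization: <signature>' \
  -d '{"text":"Hello, this is a streaming speech synthesis example.","language":"zh-CN","output":{"format":"mp3"}}' \
  --output stream_audio.mp3 -D headers.txt

Replace <signature> with the Base64 signature computed for this exact request body, appId, and timestamp.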
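
For comparison, a rough Python equivalent of the call using only the standard library. This is a sketch under assumptions: `build_request` is a hypothetical helper, the secret key is assumed to be used as UTF-8 bytes, and the request is only constructed here, not sent.

```python
import base64
import hashlib
import hmac
import urllib.request

HOST = "tts.ilivedata.com"
PATH = "/api/v1/speech/synthesis/stream"


def build_request(json_body, app_id, secret_key, timestamp):
    """Build a signed POST request for the streaming synthesis endpoint."""
    body_bytes = json_body.encode("utf-8")
    body_hash = hashlib.sha256(body_bytes).hexdigest()
    string_to_sign = "\n".join([
        "POST", HOST, PATH, body_hash,
        "X-AppId:" + app_id,
        "X-TimeStamp:" + timestamp,
    ])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    return urllib.request.Request(
        "https://" + HOST + PATH,
        data=body_bytes,  # send the same bytes that were hashed
        method="POST",
        headers={
            "Content-Type": "application/json;charset=UTF-8",
            "Accept": "application/octet-stream",
            "X-AppId": app_id,
            "X-TimeStamp": timestamp,
            "Authorization": base64.b64encode(digest).decode("ascii"),
        },
    )


if __name__ == "__main__":
    req = build_request('{"text":"Hello"}', "81900001",
                        "hypothetical-secret", "2024-11-01T07:59:59Z")
    print(req.get_method(), req.full_url)
    # To send: urllib.request.urlopen(req), then read the body in binary mode.
```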
