Speech Synthesis API

Text To Speech

Product Overview

  • Convert text into a natural and smooth real-time audio stream through LiveData AI technology. This interface is suitable for low-latency scenarios such as synthesis-while-playing.

Apply for services

The Speech Synthesis API is self-service. Sign up on the LiveData official website (https://www.ilivedata.com/), then create an application in the console. An appId and a secret key (secretKey) will be assigned to you.

You can also activate other services on the Management Console - Overview Page.

Access Method

Service Endpoint

https://tts.ilivedata.com/api/v1/speech/synthesis/stream

HTTP Request headers

Header | Value | Description
Content-Type | application/json;charset=UTF-8 | Request body type
Accept | application/octet-stream | Response body type. It is recommended to explicitly request a binary audio stream
X-AppId | Example: 81900001 | Unique identifier of the project or application. It can be passed in the header; if omitted, appId must be provided in the request body
X-TimeStamp | Example: 2024-07-01T07:59:59Z | UTC timestamp of the request, in W3C dateTime format (http://www.w3.org/TR/xmlschema-2/#dateTime)
Authorization | Example: Njl86M/jY6zZaZoGhZdGO+GI/8+yGFECusGH1yQHUFE= | Signature token
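
As a minimal sketch, the X-TimeStamp value could be generated in Python as shown below. The helper name utc_timestamp is illustrative and not part of the API, and the second-level precision with a trailing Z is an assumption based on the example value above.

```python
from datetime import datetime, timezone


def utc_timestamp(now=None):
    """Format a datetime as a UTC timestamp such as 2024-07-01T07:59:59Z.

    utc_timestamp is a hypothetical helper name, not part of the API.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert to UTC and drop sub-second precision (assumed from the documented example).
    return now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

The same string must be used both in the X-TimeStamp header and in the signed string, so it is safest to generate it once per request.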

Request Method: POST

Request Body:

Field Name | Requirement | Type | Description
text | Required | String | Text content. It must not be empty after trimming leading and trailing spaces. The default supported length range is [1, 2000]
language | Optional | String | Language of the text. Passing this parameter is recommended; if omitted, the language is detected automatically. For supported languages, see Language List
voice | Required | VoiceSetting | Voice configuration. If empty, the system default voice for the language is used
output | Required | OutputSetting | Output configuration

VoiceSetting:

Field Name | Requirement | Type | Description
name | Required | String | Voice name. Contact customer service to obtain the available voice options

OutputSetting:

Field Name | Requirement | Type | Description
format | Required | String | Output audio format. Supported values: pcm, wav, mp3, opus. Default: wav

Request Sample

{
    "text": "Hello, welcome to LiveData.",
    "language": "zh-CN",
    "voice": {
        "name": "juvenile"
    },
    "output": {
        "format": "mp3"
    }
}
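
The field rules above can be checked on the client before sending. The following is a minimal sketch, not an official client: build_request_body and SUPPORTED_FORMATS are hypothetical names, and counting the [1, 2000] range in characters is an assumption, since the unit is not stated.

```python
import json

# Output formats listed in the OutputSetting table.
SUPPORTED_FORMATS = {"pcm", "wav", "mp3", "opus"}


def build_request_body(text, voice_name, audio_format="wav", language=None):
    """Validate the documented constraints and return the JSON body as a string."""
    if not text.strip():
        raise ValueError("text must not be empty after trimming")
    # Assumption: the length limit counts characters; the unit is not documented.
    if len(text) > 2000:
        raise ValueError("text exceeds the default maximum length of 2000")
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError("unsupported output format: " + audio_format)

    body = {"text": text}
    if language is not None:
        body["language"] = language
    body["voice"] = {"name": voice_name}
    body["output"] = {"format": audio_format}
    # Compact separators; ensure_ascii=False keeps non-ASCII text unescaped.
    return json.dumps(body, separators=(",", ":"), ensure_ascii=False)
```

Because the signature is computed over the request body, the exact string returned here should be the one that is hashed and sent.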

Request Signature

Requests to the speech synthesis API are signed with the appId and secretKey. When the API receives a signed request, it recomputes the signature with the same algorithm. If the signatures do not match, the API returns 401.

If the signatures match and the user that owns the appId has permission to operate on the requested resource, the request succeeds; otherwise the API returns 401.

Send signature via HTTP request header

Add a header named Authorization to the request and set its value to the signature. For example:

Authorization: Njl86M/jY6zZaZoGhZdGO+GI/8+yGFECusGH1yQHUFE=

Signature calculation method

  1. Compute the CanonicalizedQueryString:

Compute the SHA-256 digest of the request body JSON string (UTF-8 encoded) and encode the digest as a hexadecimal string (not Base64).

CanonicalizedQueryString = hex(sha256(jsonBody))

  2. Construct the string to sign, StringToSign ("\n" stands for the ASCII newline character):

StringToSign = HTTPMethod + "\n" +
               HostHeaderInLowercase + "\n" +
               HTTPRequestURI + "\n" +
               CanonicalizedQueryString <from the previous step> + "\n" +
               "X-AppId:" + SAME_APPID_IN_HEADER + "\n" +
               "X-TimeStamp:" + SAME_TIMESTAMP_IN_HEADER

HTTPRequestURI is the absolute path of the request URI, without the query string. If the path is empty, use a single forward slash (/).

  3. Compute the hash-based message authentication code (HMAC) with HMAC-SHA256, using StringToSign as the message and secretKey as the key.

For more information about HMAC, see: https://tools.ietf.org/html/rfc2104.

  4. Encode the result of the previous step as a Base64 string.

  5. Put the Base64 string in the Authorization header of the HTTP request.
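
The steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions rather than an official SDK: the function names are hypothetical, and encoding secretKey as UTF-8 bytes before keying the HMAC is an assumption that the source does not confirm.

```python
import base64
import hashlib
import hmac


def body_sha256_hex(json_body):
    """Hex-encoded SHA-256 digest of the UTF-8 request body (CanonicalizedQueryString)."""
    return hashlib.sha256(json_body.encode("utf-8")).hexdigest()


def build_string_to_sign(method, host, uri, body_hash, app_id, timestamp):
    """Join the signed fields with ASCII newlines, in the documented order."""
    return "\n".join([
        method,
        host.lower(),          # HostHeaderInLowercase
        uri or "/",            # an empty path is replaced by "/"
        body_hash,
        "X-AppId:" + app_id,
        "X-TimeStamp:" + timestamp,
    ])


def sign_request(secret_key, method, host, uri, json_body, app_id, timestamp):
    """Return the Base64 signature to place in the Authorization header."""
    string_to_sign = build_string_to_sign(
        method, host, uri, body_sha256_hex(json_body), app_id, timestamp
    )
    # Assumption: secretKey is used as UTF-8 bytes for the HMAC key.
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode("ascii")
```

The json_body argument must be the exact string sent as the request body, and app_id and timestamp must match the X-AppId and X-TimeStamp headers, otherwise the server-side signature will differ.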

Example of signature

Below is an example of appId and secretKey.

appId=81900001
secretKey=****

Below is an example of request body.

{"text":"Hello, welcome to LiveData.","language":"zh-CN","voice":{"name":"juvenile"},"output":{"format":"mp3"}}

Generate CanonicalizedQueryString

4625d9e60b1fce4a6b4a01fb3ba8e7f33cd2751587ce79e4dc29e6d38b1fb1e9

Generate StringToSign

POST
tts.ilivedata.com
/api/v1/speech/synthesis/stream
4625d9e60b1fce4a6b4a01fb3ba8e7f33cd2751587ce79e4dc29e6d38b1fb1e9
X-AppId:81900001
X-TimeStamp:2024-11-01T07:59:59Z

Compute the signature (Base64 of the HMAC-SHA256 result)

1nNkKezG9XgkbCau9aENhDDRJhoTMHAI85NnjY+Mm4k=

HTTP Response

Content-Type: application/octet-stream

The result is returned as a binary audio stream instead of JSON. Please refer to the following headers and examples.

Success Response Headers

Header | Type | Description
Content-Type | String | application/octet-stream
Cache-Control | String | no-store
X-Task-Id | String | Unique task identifier
X-Audio-Format | String | Output audio format
Transfer-Encoding | String | chunked

Response Sample

Sample Response Headers

HTTP/1.1 200 OK
Content-Type: application/octet-stream
Cache-Control: no-store
X-Task-Id: bj_367a9314e0294cf1afcaee5c93b58129
X-Audio-Format: mp3
Transfer-Encoding: chunked

The response body is a binary audio stream, such as chunked mp3, wav, pcm, or opus data. The client should receive it in binary mode and then play or save it.
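
A minimal sketch of receiving the stream in binary mode is shown below. The function name copy_audio_stream and the chunk size are illustrative choices; the source can be any object with a read(size) method, for example the response object returned by urllib.request.urlopen, and the sink can be a file opened in "wb" mode or a buffer feeding a player.

```python
import io


def copy_audio_stream(source: io.BufferedIOBase, sink: io.BufferedIOBase, chunk_size: int = 4096) -> int:
    """Copy binary audio from source to sink chunk by chunk; return the bytes written."""
    written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            # An empty read means the stream has ended.
            break
        sink.write(chunk)
        written += len(chunk)
    return written
```

Writing each chunk as it arrives, instead of reading the whole body first, lets playback or saving start before synthesis has finished.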

Error Response

For scenarios such as invalid request parameters, nonexistent applications, unsupported languages, or unsupported output formats, the interface returns a non-200 response. In failure cases, the response body is JSON.

Field Name | Type | Description
errorCode | Number | Error code
errorMessage | String | Error message

Error Response Example

Sample Error Response

{
    "errorCode": 3003,
    "errorMessage": "Invalid voice name."
}
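
Since success responses are binary and failures are JSON, a client needs to branch on the status code before touching the body. The sketch below turns a failure body into an exception; SynthesisError and parse_error_response are hypothetical names, and the HTTP status passed in is whatever the server returned (the source does not list the status used for each error code).

```python
import json


class SynthesisError(Exception):
    """Carries the HTTP status plus the documented errorCode and errorMessage fields."""

    def __init__(self, status, error_code, error_message):
        super().__init__("HTTP {}: [{}] {}".format(status, error_code, error_message))
        self.status = status
        self.error_code = error_code
        self.error_message = error_message


def parse_error_response(status, body):
    """Build a SynthesisError from a non-200 status and its response body (bytes)."""
    try:
        payload = json.loads(body.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON; fall back to empty fields.
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return SynthesisError(status, payload.get("errorCode"), payload.get("errorMessage"))
```

A caller would typically raise the returned exception when the status is not 200 and otherwise read the body as audio.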

Call Example

curl -N -X POST 'https://tts.ilivedata.com/api/v1/speech/synthesis/stream' \
  -H 'Content-Type: application/json;charset=UTF-8' \
  -H 'Accept: application/octet-stream' \
  -H 'X-AppId: 81900001' \
  -H 'X-TimeStamp: 2024-07-01T07:59:59Z' \
  -H 'Authorization: <signature>' \
  -d '{"text":"Hello, this is a streaming speech synthesis example.","language":"zh-CN","output":{"format":"mp3"}}' \
  --output stream_audio.mp3 -D headers.txt