iOS
Overview
This document guides developers through using the LiveData real-time voice translation SDK (RTVT) to implement translation services in real-time voice scenarios.
Integration
Service Activation
To integrate the LiveData RTVT SDK, you need to register a personal/enterprise account on the LiveData official website (https://www.ilivedata.com/), create a real-time voice translation service project, and update the corresponding project parameters in the SDK.
Version Support
iOS 12.0 and above; visionOS 1.0 and above.
Requirements
- Audio format: PCM or Opus
- Sampling rate: 16 kHz
- Encoding: 16-bit depth
- Channels: mono
Configuring build settings
- Setting Linker Flags
- Go to the “Build Settings” tab under your project’s “TARGETS”.
- Add the “-ObjC” flag to the “Other Linker Flags” section.
- Make sure to do this in the “ALL” configurations view.
- Ensure that the “O” and “C” in “-ObjC” are capitalized and the preceding hyphen “-” is included.
- Ensuring Support for Objective-C++
- Your project needs to have at least one source file with a “.mm” extension to support Objective-C++.
- If not present, you can rename an existing “.m” file to “.mm”.
- Adding Libraries
- Add the “libresolv.9.tbd” library to your project.
- This is typically done in the “Link Binary With Libraries” section.
Initialization
+ (nullable instancetype)clientWithEndpoint:(nonnull NSString * )endpoint
projectId:(int64_t)projectId
delegate:(id <RTVTProtocol>)delegate;
Parameter | Type | M/O | Description |
---|---|---|---|
endpoint | string | M | Endpoint (see the LiveData Console service configuration) |
projectId | int64 | M | Project ID |
delegate | - | - | For the delegate, refer to the RTVTProtocol content below |
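A minimal initialization sketch. The endpoint and project ID values here are placeholders (take the real ones from your LiveData Console service configuration), the client class name `RTVTClient` is an assumption, and `self` is assumed to conform to `RTVTProtocol`:

```objectivec
// Hypothetical values — substitute the endpoint and project ID
// from your own LiveData Console service configuration.
RTVTClient *client = [RTVTClient clientWithEndpoint:@"rtvt.ilivedata.com:14001"
                                          projectId:80000001
                                           delegate:self];
if (client == nil) {
    NSLog(@"RTVT client creation failed — check endpoint and projectId");
}
```

Note the initializer is `nullable`, so always check the result before proceeding to login.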
RTVTProtocol delegate
/// translatedResult
/// - Parameters:
/// - streamId: translation stream id
/// - startTs: ms timestamp of voice starts
/// - endTs: ms timestamp of voice ends
/// - result: translated text
/// - language: target language
/// - recTs: ms timestamp of voice recognition
-(void)translatedResultWithStreamId:(int64_t)streamId
startTs:(int64_t)startTs
endTs:(int64_t)endTs
result:(NSString * _Nullable)result
language:(NSString * _Nullable)language
recTs:(int64_t)recTs
taskId:(int64_t)taskId;
/// recognizedResult
/// - Parameters:
/// - streamId: translation stream id
/// - startTs: ms timestamp of voice starts
/// - endTs: ms timestamp of voice ends
/// - result: recognized text
/// - language: source language
/// - recTs: ms timestamp of voice recognition
-(void)recognizedResultWithStreamId:(int64_t)streamId
startTs:(int64_t)startTs
endTs:(int64_t)endTs
result:(NSString * _Nullable)result
language:(NSString * _Nullable)language
recTs:(int64_t)recTs
taskId:(int64_t)taskId;
/// translatedTempResult
/// - Parameters:
/// - streamId: translation stream id
/// - startTs: ms timestamp of voice starts
/// - endTs: ms timestamp of voice ends
/// - result: translated temporary text
/// - language: target language
/// - recTs: ms timestamp of voice recognition
-(void)translatedTmpResultWithStreamId:(int64_t)streamId
startTs:(int64_t)startTs
endTs:(int64_t)endTs
result:(NSString * _Nullable)result
language:(NSString * _Nullable)language
recTs:(int64_t)recTs
taskId:(int64_t)taskId;
/// recognizedTempResult
/// - Parameters:
/// - streamId: translation stream id
/// - startTs: ms timestamp of voice starts
/// - endTs: ms timestamp of voice ends
/// - result: recognized temporary text
/// - language: source language
/// - recTs: ms timestamp of voice recognition
-(void)recognizedTmpResultWithStreamId:(int64_t)streamId
startTs:(int64_t)startTs
endTs:(int64_t)endTs
result:(NSString * _Nullable)result
language:(NSString * _Nullable)language
recTs:(int64_t)recTs
taskId:(int64_t)taskId;
/// voiceTranslation
/// - Parameters:
/// - streamId: translation stream id
/// - text: translation result
/// - data: result audio data (MP3, mono, 16000 Hz)
/// - language: target language
-(void)ttsResultWithStreamId:(int64_t)streamId
text:(NSString * _Nullable)text
data:(NSData*)data
language:(NSString * _Nullable)language;
Notice: RTVTProtocol also provides an automatic reconnection method, so you do not need to handle connection loss yourself.
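A minimal delegate sketch, assuming a view-controller class that adopts RTVTProtocol (only two of the callbacks above are shown; the class name is hypothetical):

```objectivec
@interface TranslatorViewController () <RTVTProtocol>
@end

@implementation TranslatorViewController

// Final recognized text in the source language.
- (void)recognizedResultWithStreamId:(int64_t)streamId
                             startTs:(int64_t)startTs
                               endTs:(int64_t)endTs
                              result:(NSString * _Nullable)result
                            language:(NSString * _Nullable)language
                               recTs:(int64_t)recTs
                              taskId:(int64_t)taskId {
    NSLog(@"[ASR %@] %@", language, result);
}

// Final translated text in the target language.
- (void)translatedResultWithStreamId:(int64_t)streamId
                             startTs:(int64_t)startTs
                               endTs:(int64_t)endTs
                              result:(NSString * _Nullable)result
                            language:(NSString * _Nullable)language
                               recTs:(int64_t)recTs
                              taskId:(int64_t)taskId {
    NSLog(@"[MT %@] %@", language, result);
}

@end
```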
Login
- (void)loginWithToken:(nonnull NSString *)token
ts:(int64_t)ts
success:(RTVTLoginSuccessCallBack)loginSuccess
connectFail:(RTVTLoginFailCallBack)loginFail;
Parameter | Type | M/O | Description |
---|---|---|---|
token | string | M | Token generated with the key from the LiveData Console service configuration |
ts | int64 | M | Token reference timestamp |
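A login sketch. The token is issued by your own server using the project key; the callback block signatures and the timestamp unit (seconds) are assumptions here — confirm both against the SDK header and the LiveData Console documentation:

```objectivec
// serverIssuedToken is obtained from your backend, which signs it
// with the project key from the LiveData Console.
int64_t ts = (int64_t)[[NSDate date] timeIntervalSince1970]; // assuming seconds
[client loginWithToken:serverIssuedToken
                    ts:ts
               success:^{
                   NSLog(@"RTVT login succeeded");
               }
           connectFail:^(NSError *error) {
               NSLog(@"RTVT login failed: %@", error);
           }];
```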
Start translate
-(void)startStreamTranslateWithAsrResult:(BOOL)asrResult
transResult:(BOOL)transResult
tempResult:(BOOL)tempResult
ttsResult:(BOOL)ttsResult
ttsSpeaker:(NSString * _Nullable)ttsSpeaker
userId:(NSString * _Nullable)userId
srcLanguage:(nonnull NSString *)srcLanguage
destLanguage:(nonnull NSString *)destLanguage
srcAltLanguage:(NSArray <NSString*> * _Nullable) srcAltLanguage
codecType:(RTVTAudioDataCodecType)codecType
attribute:(NSString * _Nullable)attribute
success:(void(^)(int64_t streamId))successCallback
fail:(RTVTAnswerFailCallBack)failCallback;
Parameter | Type | M/O | Description |
---|---|---|---|
asrResult | bool | M | Set whether the speech recognition result is needed |
transResult | bool | M | Set whether the translation result is needed |
tempResult | bool | M | Set whether temporary results are needed |
ttsResult | bool | M | Set whether translated speech (TTS) is needed |
ttsSpeaker | string | O | Set the voice style |
userId | string | O | User ID; the business side can pass one as needed |
srcLanguage | string | M | Source language |
destLanguage | string | M | Target language; if only transcription is needed, an empty string can be passed |
srcAltLanguage | array | O | Alternative language range for the source language; supports up to 3 languages |
codecType | int | M | Encoding type of the uploaded data |
attribute | string | O | Custom attribute |
callback | - | - | On success, the RTVT server generates a streamId and returns it to the SDK via the callback |
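A sketch of starting a stream for English-to-Chinese translation. The language codes and the `RTVTAudioDataCodecTypePCM` enum case are assumptions — use the codes enabled for your project and the codec cases defined in the SDK header:

```objectivec
[client startStreamTranslateWithAsrResult:YES
                              transResult:YES
                               tempResult:NO
                                ttsResult:NO
                               ttsSpeaker:nil
                                   userId:@"user-123"
                              srcLanguage:@"en"
                             destLanguage:@"zh"
                           srcAltLanguage:nil
                                codecType:RTVTAudioDataCodecTypePCM  // assumed enum case
                                attribute:nil
                                  success:^(int64_t streamId) {
                                      // Keep streamId for sendVoice / endTranslate calls.
                                  }
                                     fail:^(NSError *error) {
                                      NSLog(@"startStreamTranslate failed: %@", error);
                                  }];
```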
Start translate (Multiple Language)
-(void)multi_startTranslateWithAsrResult:(BOOL)asrResult
tempResult:(BOOL)tempResult
userId:(NSString * _Nullable)userId
srcLanguage:(nonnull NSString *)srcLanguage
srcAltLanguage:(NSArray <NSString*> * _Nullable) srcAltLanguage
success:(void(^)(int64_t streamId))successCallback
fail:(RTVTAnswerFailCallBack)failCallback;
Parameter | Type | M/O | Description |
---|---|---|---|
asrResult | bool | M | Set whether the final voice recognition result is needed |
tempResult | bool | M | Set whether temporary results are needed |
userId | string | O | User ID; input as needed |
srcLanguage | string | M | Source language |
srcAltLanguage | array | O | Alternative language range for the source language; supports up to three languages |
callback | - | - | On success, the RTVT server generates a streamId and returns it to the SDK via the callback |
Notice:
1. In scenarios where recognition results must be called back, asrResult should be set to true; srcLanguage is mandatory and srcAltLanguage is optional.
2. In scenarios where translation results must be called back, transResult should be set to true; destLanguage is mandatory and cannot be an empty string.
3. If temporary recognition results and temporary translation results are needed, tempResult should be set to true.
4. If any language is passed in srcAltLanguage, RTVT first runs a language recognition step: the beginning of the voice (about 3 seconds) is used to identify the language, after which recognition/translation results are returned normally.
5. If the language passed is not within the range of supported languages, an error message indicating “language not supported” is returned; if the language passed is not enabled in the project, a “project does not support” message is returned.
Send voice clip
-(void)sendVoiceWithStreamId:(int64_t)streamId
voiceData:(nonnull NSData*)voiceData
seq:(int64_t)seq
ts:(int64_t)ts
success:(RTVTAnswerSuccessCallBack)successCallback
fail:(RTVTAnswerFailCallBack)failCallback;
Parameter | Type | M/O | Description |
---|---|---|---|
streamId | int64_t | M | Stream ID |
seq | int64_t | M | Audio segment sequence number (preferably in order) |
voiceData | byte | M | Audio data; 640 bytes by default |
ts | int64_t | M | Audio frame reference timestamp |
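A sketch of splitting a captured PCM buffer into 640-byte chunks (20 ms at 16 kHz / 16-bit / mono) and sending them in order. `pcmData` and `streamId` are assumed to come from your capture pipeline and the start-translate success callback; passing `nil` for the success block is also an assumption:

```objectivec
static const NSUInteger kChunkBytes = 640;  // default chunk size per the table above
int64_t seq = 0;
for (NSUInteger off = 0; off < pcmData.length; off += kChunkBytes) {
    NSUInteger len = MIN(kChunkBytes, pcmData.length - off);
    NSData *chunk = [pcmData subdataWithRange:NSMakeRange(off, len)];
    [client sendVoiceWithStreamId:streamId
                        voiceData:chunk
                              seq:seq++  // keep sequence numbers in order
                               ts:(int64_t)([[NSDate date] timeIntervalSince1970] * 1000)
                          success:nil
                             fail:^(NSError *error) {
                              NSLog(@"sendVoice failed: %@", error);
                          }];
}
```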
Send voice clip (Multiple Language)
-(void)multi_sendVoiceWithStreamId:(int64_t)streamId
voiceData:(nonnull NSData*)voiceData
destLanguages:(NSArray<NSString*>*)destLanguages
seq:(int64_t)seq
ts:(int64_t)ts
success:(RTVTAnswerSuccessCallBack)successCallback
fail:(RTVTAnswerFailCallBack)failCallback;
Parameter | Type | M/O | Description |
---|---|---|---|
streamId | int64_t | M | Stream ID |
seq | int64_t | M | Audio segment sequence number (preferably in order) |
destLanguages | array | M | Target languages |
voiceData | byte | M | Audio data; 640 bytes by default |
ts | int64_t | M | Audio frame reference timestamp |
Notice: If no voice data is sent for a certain period of time, RTVT times the stream out. At that point, you must call the startStreamTranslateWithAsrResult method or the multi_startTranslateWithAsrResult method again to obtain a new streamId.
Stop translate
-(void)endTranslateWithStreamId:(int)streamId
lastSeq:(int)lastSeq
success:(RTVTAnswerSuccessCallBack)successCallback
fail:(RTVTAnswerFailCallBack)failCallback;
Parameter | Type | M/O | Description |
---|---|---|---|
streamId | int | M | ID of the stream to stop |
lastSeq | int | M | Sequence number of the final audio frame |
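A sketch of finishing a session: end the stream once the last chunk has been sent (`lastSeq` should match the `seq` of the final sendVoice call), then close the connection when RTVT is no longer needed. The success block signature is an assumption:

```objectivec
[client endTranslateWithStreamId:(int)streamId
                         lastSeq:(int)lastSeq
                         success:^{
                             NSLog(@"stream ended");
                         }
                            fail:^(NSError *error) {
                             NSLog(@"endTranslate failed: %@", error);
                         }];
[client closeConnect];  // tear down RTVT entirely (see "Close RTVT" below)
```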
Close RTVT
- (BOOL)closeConnect;
Error code
Error code | Description |
---|---|
800000 | Unknown error |
800002 | Unverified link |
800003 | Invalid parameter |
800101 | Invalid system time |
800102 | Invalid token, invalid encoding |
800103 | Invalid pid |
800105 | Unsupported language |
800106 | Too many alternative languages |
800107 | Translation stream reached the limit |
800200 | StreamId does not exist |
More information
For the SDK download and more information, please go to GitHub.