iOS

Overview

This document guides developers through integrating the LiveData real-time voice translation SDK (RTVT) to implement translation services in real-time voice scenarios.

Integration

Service Activation

To integrate the LiveData RTVT SDK, register a personal or enterprise account on the LiveData official website (https://www.ilivedata.com/), create a real-time voice translation service project, and update the corresponding project parameters in the SDK.

Version Support

iOS 12.0 and above; visionOS 1.0 and above.

Requirements

  • Audio format: PCM or OPUS
  • Sampling rate: 16 kHz
  • Encoding: 16-bit depth
  • Channels: mono
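The requirements above determine the clip size used later in this guide: with 16 kHz, 16-bit mono PCM, the 640-byte default clip accepted by sendVoice corresponds to 20 ms of audio. A quick sanity check (the constant names are illustrative, not SDK symbols):

```objectivec
// Frame sizing for the required PCM format (names are illustrative).
static const int kSampleRate     = 16000; // 16 kHz sampling rate
static const int kBytesPerSample = 2;     // 16-bit depth, mono
static const int kClipBytes      = 640;   // default clip size in sendVoice

// 640 bytes / 2 bytes per sample = 320 samples
// 320 samples / 16000 samples per second = 0.02 s = 20 ms per clip
```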

Configuring build settings

  • Setting Linker Flags
    • Go to the “Build Settings” tab under your project’s “TARGETS”.
    • Add the “-ObjC” flag to the “Other Linker Flags” section.
    • Make sure to do this in the “ALL” configurations view.
    • Ensure that the “O” and “C” in “-ObjC” are capitalized and the preceding hyphen “-” is included.
  • Ensuring Support for Objective-C++
    • Your project needs to have at least one source file with a “.mm” extension to support Objective-C++.
    • If not present, you can rename an existing “.m” file to “.mm”.
  • Adding Libraries
    • Add the “libresolv.9.tbd” library to your project.
    • This is typically done in the “Link Binary With Libraries” section.

Initialization

+ (nullable instancetype)clientWithEndpoint:(nonnull NSString *)endpoint
                                   projectId:(int64_t)projectId
                                    delegate:(id <RTVTProtocol>)delegate;
Parameter Type M/O Description
endpoint string M endpoint (see the service configuration in the LiveData Console)
projectId int64 M project ID
delegate - - delegate; see the RTVTProtocol delegate section below
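A minimal construction sketch. The client class name and header name are assumptions; substitute the actual names shipped with the SDK, and fill in the endpoint and project ID from your LiveData Console:

```objectivec
#import "RTVT.h" // header name is an assumption; use the SDK's actual header

// Placeholder values — replace with the endpoint and project ID shown in
// the LiveData Console service configuration.
int64_t projectId = 0; // your project ID
RTVTClient *client = [RTVTClient clientWithEndpoint:@"<your-endpoint>"
                                          projectId:projectId
                                           delegate:self]; // self implements RTVTProtocol
```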

RTVTProtocol delegate


/// translatedResult
/// - Parameters:
///   - streamId: translation stream id
///   - startTs: ms timestamp of voice starts
///   - endTs: ms timestamp of voice ends
///   - result: translated text
///   - language: target language
///   - recTs: ms timestamp of voice recognition
///   - taskId: task id
-(void)translatedResultWithStreamId:(int64_t)streamId
                            startTs:(int64_t)startTs
                              endTs:(int64_t)endTs
                             result:(NSString * _Nullable)result
                           language:(NSString * _Nullable)language
                              recTs:(int64_t)recTs
                             taskId:(int64_t)taskId;

 /// recognizedResult
 /// - Parameters:
 ///   - streamId: translation stream id                            
 ///   - startTs: ms timestamp of voice starts
 ///   - endTs: ms timestamp of voice ends
 ///   - result: recognized text
 ///   - language: source language
 ///   - recTs: ms timestamp of voice recognition
 ///   - taskId: task id
-(void)recognizedResultWithStreamId:(int64_t)streamId
                            startTs:(int64_t)startTs
                              endTs:(int64_t)endTs
                             result:(NSString * _Nullable)result
                           language:(NSString * _Nullable)language
                              recTs:(int64_t)recTs
                             taskId:(int64_t)taskId;


 /// translatedTempResult
 /// - Parameters:
 ///   - streamId: translation stream id                            
 ///   - startTs: ms timestamp of voice starts
 ///   - endTs: ms timestamp of voice ends
 ///   - result: translated temporary text
 ///   - language: target language
 ///   - recTs: ms timestamp of voice recognition
 ///   - taskId: task id
 -(void)translatedTmpResultWithStreamId:(int64_t)streamId
                               startTs:(int64_t)startTs
                                 endTs:(int64_t)endTs
                                result:(NSString * _Nullable)result
                              language:(NSString * _Nullable)language
                                 recTs:(int64_t)recTs
                                taskId:(int64_t)taskId;

 /// recognizedTempResult
 /// - Parameters:
 ///   - streamId: translation stream id                               
 ///   - startTs: ms timestamp of voice starts
 ///   - endTs: ms timestamp of voice ends
 ///   - result: recognized temporary text
 ///   - language: source language
 ///   - recTs: ms timestamp of voice recognition
 ///   - taskId: task id
-(void)recognizedTmpResultWithStreamId:(int64_t)streamId
                               startTs:(int64_t)startTs
                                 endTs:(int64_t)endTs
                                result:(NSString * _Nullable)result
                              language:(NSString * _Nullable)language
                                 recTs:(int64_t)recTs
                                taskId:(int64_t)taskId;

 /// voiceTranslation
 /// - Parameters:
 ///   - streamId: translation stream id                               
 ///   - text: translation result
 ///   - data: result audio data mp3 mono 16000hz
 ///   - language: target language
-(void)ttsResultWithStreamId:(int64_t)streamId
                        text:(NSString * _Nullable)text
                        data:(NSData*)data
                    language:(NSString * _Nullable)language;

Notice: RTVTProtocol also includes automatic reconnection methods, so there is no need to handle loss of connection yourself.
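A minimal delegate sketch that logs the two final-result callbacks; the temporary-result and TTS callbacks follow the same pattern:

```objectivec
// Log final recognition (ASR) results as they arrive.
- (void)recognizedResultWithStreamId:(int64_t)streamId
                             startTs:(int64_t)startTs
                               endTs:(int64_t)endTs
                              result:(NSString * _Nullable)result
                            language:(NSString * _Nullable)language
                               recTs:(int64_t)recTs
                              taskId:(int64_t)taskId {
    NSLog(@"[ASR stream %lld] (%@) %@", streamId, language, result);
}

// Log final translation results as they arrive.
- (void)translatedResultWithStreamId:(int64_t)streamId
                             startTs:(int64_t)startTs
                               endTs:(int64_t)endTs
                              result:(NSString * _Nullable)result
                            language:(NSString * _Nullable)language
                               recTs:(int64_t)recTs
                              taskId:(int64_t)taskId {
    NSLog(@"[MT stream %lld] (%@) %@", streamId, language, result);
}
```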

Login

 - (void)loginWithToken:(nonnull NSString *)token
                     ts:(int64_t)ts
                success:(RTVTLoginSuccessCallBack)loginSuccess
            connectFail:(RTVTLoginFailCallBack)loginFail;
Parameter Type M/O Description
token string M token generated with the key from the LiveData Console service configuration
ts int64 M token reference timestamp
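A login sketch. The token is generated on your own server with the project key from the LiveData Console; the callback block shapes below are assumptions based on the typedef names:

```objectivec
// `token` and `ts` come from your token-generation service.
[client loginWithToken:token
                    ts:ts
               success:^{
                   // connected and authenticated; start a translation stream next
               }
           connectFail:^{
                   // login or connection failed; inspect the failure and retry
               }];
```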

Start translate

 -(void)startStreamTranslateWithAsrResult:(BOOL)asrResult
                            transResult:(BOOL)transResult
                             tempResult:(BOOL)tempResult
                              ttsResult:(BOOL)ttsResult
                             ttsSpeaker:(NSString * _Nullable)ttsSpeaker
                                 userId:(NSString * _Nullable)userId
                            srcLanguage:(nonnull NSString *)srcLanguage
                           destLanguage:(nonnull NSString *)destLanguage
                         srcAltLanguage:(NSArray <NSString*> * _Nullable) srcAltLanguage
                              codecType:(RTVTAudioDataCodecType)codecType
                              attribute:(NSString * _Nullable)attribute
                                success:(void(^)(int64_t streamId))successCallback
                                   fail:(RTVTAnswerFailCallBack)failCallback;
Parameter Type M/O Description
asrResult bool M Set whether the speech recognition result is needed
transResult bool M Set whether the translation result is needed
tempResult bool M Set whether temporary results are needed
ttsResult bool M Set whether translated speech (TTS) is needed
ttsSpeaker string O Set the voice style
userId string O User ID; the business side can pass it as needed
srcLanguage string M Source language
destLanguage string M Target language; if only transcription is needed, an empty string can be passed
srcAltLanguage array O Alternative language range for the source language; supports up to 3 languages
codecType int M Encoding type of the uploaded data
attribute string O Custom attribute
success - - on success, the RTVT server generates a streamId and returns it to the SDK via this callback
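A start-stream sketch requesting final recognition and translation text only. The language codes, userId, codecType value, and fail-callback variable are illustrative assumptions; use the values enabled in your project:

```objectivec
__block int64_t activeStreamId = 0;
[client startStreamTranslateWithAsrResult:YES   // want final ASR text
                              transResult:YES   // want final translation text
                               tempResult:NO
                                ttsResult:NO
                               ttsSpeaker:nil
                                   userId:@"user-1001"  // illustrative
                              srcLanguage:@"zh"         // illustrative
                             destLanguage:@"en"         // illustrative
                           srcAltLanguage:nil
                                codecType:codecType // RTVTAudioDataCodecType value matching your upload format
                                attribute:nil
                                  success:^(int64_t streamId) {
                                      activeStreamId = streamId; // keep for sendVoice calls
                                  }
                                     fail:failCallback];
```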

Start translate (Multiple Language)

-(void)multi_startTranslateWithAsrResult:(BOOL)asrResult
                             tempResult:(BOOL)tempResult
                                 userId:(NSString * _Nullable)userId
                            srcLanguage:(nonnull NSString *)srcLanguage
                         srcAltLanguage:(NSArray <NSString*> * _Nullable) srcAltLanguage
                                success:(void(^)(int64_t streamId))successCallback
                                   fail:(RTVTAnswerFailCallBack)failCallback;                                
Parameter Type M/O Description
asrResult bool M Set whether the final result of voice recognition is needed.
tempResult bool M Set whether temporary results are needed.
userId string O user id, input as needed.
srcLanguage string M source language
srcAltLanguage array O Alternative language range for the source language; supports up to three languages.
success - - on success, the RTVT server generates a streamId and returns it to the SDK via this callback.
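A multi-language start sketch: recognition runs on the source language, and target languages are chosen per clip in multi_sendVoice. The language code, userId, and fail-callback variable are illustrative:

```objectivec
[client multi_startTranslateWithAsrResult:YES  // want final ASR text
                               tempResult:NO
                                   userId:@"user-1001"  // illustrative
                              srcLanguage:@"zh"         // illustrative
                           srcAltLanguage:nil
                                  success:^(int64_t streamId) {
                                      // keep streamId for multi_sendVoice calls
                                  }
                                     fail:failCallback];
```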

Notice:
1. In scenarios where recognition-result callbacks are needed, asrResult should be set to true; srcLanguage is mandatory and srcAltLanguage is optional.
2. In scenarios where translation-result callbacks are needed, transResult should be set to true; destLanguage is mandatory and cannot be an empty string.
3. If temporary recognition results and temporary translation results are needed, tempResult should be set to true.
4. If a language is passed into srcAltLanguage, RTVT first runs a language-recognition step by default. The beginning of the voice (about 3 seconds) is used for language recognition, and the subsequent recognition/translation results are returned normally.
5. If the language passed is not within the range of supported languages, an error message indicating “language not supported” is returned; if the language passed is not enabled in the project, a message indicating “project does not support” is returned.

Send voice clip

 -(void)sendVoiceWithStreamId:(int64_t)streamId
                    voiceData:(nonnull NSData*)voiceData
                          seq:(int64_t)seq
                           ts:(int64_t)ts
                      success:(RTVTAnswerSuccessCallBack)successCallback
                         fail:(RTVTAnswerFailCallBack)failCallback;
Parameter Type M/O Description
streamId int64_t M stream ID
seq int64_t M audio segment sequence number (preferably in order)
voiceData byte M audio data, 640 bytes by default
ts int64_t M audio frame reference timestamp
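A sending sketch that slices captured PCM into 640-byte (20 ms) clips with an increasing sequence number. `pcmBuffer`, `streamId`, and the callback variables are assumptions standing in for your capture pipeline:

```objectivec
static int64_t seq = 0;
const NSUInteger clipBytes = 640; // default clip size
int64_t nowMs = (int64_t)([[NSDate date] timeIntervalSince1970] * 1000.0);

for (NSUInteger off = 0; off + clipBytes <= pcmBuffer.length; off += clipBytes) {
    NSData *clip = [pcmBuffer subdataWithRange:NSMakeRange(off, clipBytes)];
    [client sendVoiceWithStreamId:streamId
                        voiceData:clip
                              seq:seq++   // keep clips in order
                               ts:nowMs
                          success:successCallback
                             fail:failCallback];
}
```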

Send voice clip (Multiple Language)

 -(void)multi_sendVoiceWithStreamId:(int64_t)streamId
                          voiceData:(nonnull NSData*)voiceData
                      destLanguages:(NSArray<NSString*>*)destLanguages
                                seq:(int64_t)seq
                                 ts:(int64_t)ts
                            success:(RTVTAnswerSuccessCallBack)successCallback
                               fail:(RTVTAnswerFailCallBack)failCallback;
Parameter Type M/O Description
streamId int64_t M stream ID
seq int64_t M audio segment sequence number (preferably in order)
destLanguages array M target languages
voiceData byte M audio data, 640 bytes by default
ts int64_t M audio frame reference timestamp
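The multi-language variant sends the same 640-byte clips but names the target languages per call. `clip`, `seq`, `nowMs`, `streamId`, and the callbacks are assumptions carried over from your sending loop; the language codes are illustrative:

```objectivec
[client multi_sendVoiceWithStreamId:streamId
                          voiceData:clip             // 640-byte PCM clip
                      destLanguages:@[@"en", @"ja"]  // illustrative targets
                                seq:seq++
                                 ts:nowMs
                            success:successCallback
                               fail:failCallback];
```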

Notice: If no voice data is sent for a certain period of time, RTVT performs a timeout process. At that point, it is necessary to call the startStreamTranslateWithAsrResult method or multi_startTranslateWithAsrResult method again to obtain a new streamId.

Stop translate

 -(void)endTranslateWithStreamId:(int)streamId
                        lastSeq:(int)lastSeq
                        success:(RTVTAnswerSuccessCallBack)successCallback
                           fail:(RTVTAnswerFailCallBack)failCallback;
Parameter Type M/O Description
streamId int M ID of the stream to stop
lastSeq int M sequence number of the final audio frame
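A finishing sketch: after the last clip has been sent, end the stream with the final sequence number. `streamId`, `lastSeq`, and the callbacks are carried over from your sending code:

```objectivec
[client endTranslateWithStreamId:streamId
                         lastSeq:lastSeq   // seq of the final audio frame sent
                         success:successCallback
                            fail:failCallback];
```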

Close RTVT

- (BOOL)closeConnect;

Error code

Error code Description
800000 Unknown error
800002 Unauthenticated connection
800003 Invalid parameter
800101 Invalid system time
800102 Invalid token, invalid encoding
800103 Invalid pid
800105 Unsupported language
800106 Too many alternative languages
800107 Translation stream reached the limit
800200 StreamId does not exist

More information

For SDK downloads and more information, please go to GitHub.