iOS

Overview

This document guides developers through integrating the LiveData real-time voice translation SDK (RTVT) to implement translation services in real-time voice scenarios.

Integration

Service Activation

To integrate the LiveData RTVT SDK, register a personal or enterprise account on the LiveData official website (https://www.ilivedata.com/), create a real-time voice translation service project, and update the corresponding project parameters in the SDK.

Version Support

iOS 12.0 and above; visionOS 1.0 and above.

Requirements

  • Audio format: PCM or OPUS
  • Sampling rate: 16 kHz
  • Encoding: 16-bit depth
  • Channels: mono
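The requirements above determine the clip size used later in this guide: with 16 kHz, 16-bit mono PCM, the 640-byte default clip accepted by sendVoice corresponds to 20 ms of audio. A quick sanity check (the constant names are illustrative, not SDK symbols):

```objectivec
// Frame sizing for the required PCM format (names are illustrative).
static const int kSampleRate     = 16000; // 16 kHz sampling rate
static const int kBytesPerSample = 2;     // 16-bit depth, mono
static const int kClipBytes      = 640;   // default clip size in sendVoice

// 640 bytes / 2 bytes per sample = 320 samples
// 320 samples / 16000 samples per second = 0.02 s = 20 ms per clip
```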

Configuring build settings

  • Setting Linker Flags
    • Go to the “Build Settings” tab under your project’s “TARGETS”.
    • Add the “-ObjC” flag to the “Other Linker Flags” section.
    • Make sure to do this in the “ALL” configurations view.
    • Ensure that the “O” and “C” in “-ObjC” are capitalized and the preceding hyphen “-” is included.
  • Ensuring Support for Objective-C++
    • Your project needs to have at least one source file with a “.mm” extension to support Objective-C++.
    • If not present, you can rename an existing “.m” file to “.mm”.
  • Adding Libraries
    • Add the “libresolv.9.tbd” library to your project.
    • This is typically done in the “Link Binary With Libraries” section.

Initialization

+ (nullable instancetype)clientWithEndpoint:(nonnull NSString *)endpoint
                                   projectId:(int64_t)projectId
                                    delegate:(id <RTVTProtocol>)delegate;
Parameter Type M/O Description
endpoint string M endpoint (see the service configuration in the LiveData Console)
projectId int64 M project ID
delegate - - delegate; see the RTVTProtocol delegate section below
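A minimal construction sketch. The client class name and header name are assumptions; substitute the actual names shipped with the SDK, and fill in the endpoint and project ID from your LiveData Console:

```objectivec
#import "RTVT.h" // header name is an assumption; use the SDK's actual header

// Placeholder values — replace with the endpoint and project ID shown in
// the LiveData Console service configuration.
int64_t projectId = 0; // your project ID
RTVTClient *client = [RTVTClient clientWithEndpoint:@"<your-endpoint>"
                                          projectId:projectId
                                           delegate:self]; // self implements RTVTProtocol
```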

RTVTProtocol delegate


/// translatedResult
/// - Parameters:
///   - streamId: translation stream id
///   - startTs: ms timestamp of voice starts
///   - endTs: ms timestamp of voice ends
///   - result: translated text
///   - language: target language
///   - recTs: ms timestamp of voice recognition
///   - taskId: task id
-(void)translatedResultWithStreamId:(int64_t)streamId
                            startTs:(int64_t)startTs
                              endTs:(int64_t)endTs
                             result:(NSString * _Nullable)result
                           language:(NSString * _Nullable)language
                              recTs:(int64_t)recTs
                             taskId:(int64_t)taskId;

 /// recognizedResult
 /// - Parameters:
 ///   - streamId: translation stream id                            
 ///   - startTs: ms timestamp of voice starts
 ///   - endTs: ms timestamp of voice ends
 ///   - result: recognized text
 ///   - language: source language
 ///   - recTs: ms timestamp of voice recognition
 ///   - taskId: task id
-(void)recognizedResultWithStreamId:(int64_t)streamId
                            startTs:(int64_t)startTs
                              endTs:(int64_t)endTs
                             result:(NSString * _Nullable)result
                           language:(NSString * _Nullable)language
                              recTs:(int64_t)recTs
                             taskId:(int64_t)taskId;


 /// translatedTempResult
 /// - Parameters:
 ///   - streamId: translation stream id                            
 ///   - startTs: ms timestamp of voice starts
 ///   - endTs: ms timestamp of voice ends
 ///   - result: translated temporary text
 ///   - language: target language
 ///   - recTs: ms timestamp of voice recognition
 ///   - taskId: task id
 -(void)translatedTmpResultWithStreamId:(int64_t)streamId
                               startTs:(int64_t)startTs
                                 endTs:(int64_t)endTs
                                result:(NSString * _Nullable)result
                              language:(NSString * _Nullable)language
                                 recTs:(int64_t)recTs
                                taskId:(int64_t)taskId;

 /// recognizedTempResult
 /// - Parameters:
 ///   - streamId: translation stream id                               
 ///   - startTs: ms timestamp of voice starts
 ///   - endTs: ms timestamp of voice ends
 ///   - result: recognized temporary text
 ///   - language: source language
 ///   - recTs: ms timestamp of voice recognition
 ///   - taskId: task id
-(void)recognizedTmpResultWithStreamId:(int64_t)streamId
                               startTs:(int64_t)startTs
                                 endTs:(int64_t)endTs
                                result:(NSString * _Nullable)result
                              language:(NSString * _Nullable)language
                                 recTs:(int64_t)recTs
                                taskId:(int64_t)taskId;

 /// voiceTranslation
 /// - Parameters:
 ///   - streamId: translation stream id                               
 ///   - text: translation result
 ///   - data: result audio data mp3 mono 16000hz
 ///   - language: target language
-(void)ttsResultWithStreamId:(int64_t)streamId
                        text:(NSString * _Nullable)text
                        data:(NSData*)data
                    language:(NSString * _Nullable)language;

Notice: RTVTProtocol also includes automatic reconnection methods, so there is no need to handle loss of connection yourself.
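A minimal delegate sketch that logs the two final-result callbacks; the temporary-result and TTS callbacks follow the same pattern:

```objectivec
// Log final recognition (ASR) results as they arrive.
- (void)recognizedResultWithStreamId:(int64_t)streamId
                             startTs:(int64_t)startTs
                               endTs:(int64_t)endTs
                              result:(NSString * _Nullable)result
                            language:(NSString * _Nullable)language
                               recTs:(int64_t)recTs
                              taskId:(int64_t)taskId {
    NSLog(@"[ASR stream %lld] (%@) %@", streamId, language, result);
}

// Log final translation results as they arrive.
- (void)translatedResultWithStreamId:(int64_t)streamId
                             startTs:(int64_t)startTs
                               endTs:(int64_t)endTs
                              result:(NSString * _Nullable)result
                            language:(NSString * _Nullable)language
                               recTs:(int64_t)recTs
                              taskId:(int64_t)taskId {
    NSLog(@"[MT stream %lld] (%@) %@", streamId, language, result);
}
```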

Login

 - (void)loginWithToken:(nonnull NSString *)token
                     ts:(int64_t)ts
                success:(RTVTLoginSuccessCallBack)loginSuccess
            connectFail:(RTVTLoginFailCallBack)loginFail;
Parameter Type M/O Description
token string M token generated with the key from the LiveData Console service configuration
ts int64 M token reference timestamp
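A login sketch. The token is generated on your own server with the project key from the LiveData Console; the callback block shapes below are assumptions based on the typedef names:

```objectivec
// `token` and `ts` come from your token-generation service.
[client loginWithToken:token
                    ts:ts
               success:^{
                   // connected and authenticated; start a translation stream next
               }
           connectFail:^{
                   // login or connection failed; inspect the failure and retry
               }];
```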

Start translate

 -(void)startStreamTranslateWithAsrResult:(BOOL)asrResult
                            transResult:(BOOL)transResult
                             tempResult:(BOOL)tempResult
                              ttsResult:(BOOL)ttsResult
                             ttsSpeaker:(NSString * _Nullable)ttsSpeaker
                                 userId:(NSString * _Nullable)userId
                            srcLanguage:(nonnull NSString *)srcLanguage
                           destLanguage:(nonnull NSString *)destLanguage
                         srcAltLanguage:(NSArray <NSString*> * _Nullable) srcAltLanguage
                              codecType:(RTVTAudioDataCodecType)codecType
                              attribute:(NSString * _Nullable)attribute
                                success:(void(^)(int64_t streamId))successCallback
                                   fail:(RTVTAnswerFailCallBack)failCallback;
Parameter Type M/O Description
asrResult bool M Set whether the speech recognition result is needed
transResult bool M Set whether the translation result is needed
tempResult bool M Set whether temporary results are needed
ttsResult bool M Set whether translated speech (TTS) is needed
ttsSpeaker string O Set the voice style
userId string O User ID; the business side can pass it as needed
srcLanguage string M Source language
destLanguage string M Target language; if only transcription is needed, an empty string can be passed
srcAltLanguage array O Alternative language range for the source language; supports up to 3 languages
codecType int M Encoding type of the uploaded data
attribute string O Custom attribute
success - - on success, the RTVT server generates a streamId and returns it to the SDK via this callback
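A start-stream sketch requesting final recognition and translation text only. The language codes, userId, codecType value, and fail-callback variable are illustrative assumptions; use the values enabled in your project:

```objectivec
__block int64_t activeStreamId = 0;
[client startStreamTranslateWithAsrResult:YES   // want final ASR text
                              transResult:YES   // want final translation text
                               tempResult:NO
                                ttsResult:NO
                               ttsSpeaker:nil
                                   userId:@"user-1001"  // illustrative
                              srcLanguage:@"zh"         // illustrative
                             destLanguage:@"en"         // illustrative
                           srcAltLanguage:nil
                                codecType:codecType // RTVTAudioDataCodecType value matching your upload format
                                attribute:nil
                                  success:^(int64_t streamId) {
                                      activeStreamId = streamId; // keep for sendVoice calls
                                  }
                                     fail:failCallback];
```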

Start translate (Multiple Language)

-(void)multi_startTranslateWithAsrResult:(BOOL)asrResult
                             tempResult:(BOOL)tempResult
                                 userId:(NSString * _Nullable)userId
                            srcLanguage:(nonnull NSString *)srcLanguage
                         srcAltLanguage:(NSArray <NSString*> * _Nullable) srcAltLanguage
                                success:(void(^)(int64_t streamId))successCallback
                                   fail:(RTVTAnswerFailCallBack)failCallback;                                
Parameter Type M/O Description
asrResult bool M Set whether the final result of voice recognition is needed.
tempResult bool M Set whether temporary results are needed.
userId string O user id, input as needed.
srcLanguage string M source language
srcAltLanguage array O Alternative language range for the source language; supports up to three languages.
success - - on success, the RTVT server generates a streamId and returns it to the SDK via this callback.
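A multi-language start sketch: recognition runs on the source language, and target languages are chosen per clip in multi_sendVoice. The language code, userId, and fail-callback variable are illustrative:

```objectivec
[client multi_startTranslateWithAsrResult:YES  // want final ASR text
                               tempResult:NO
                                   userId:@"user-1001"  // illustrative
                              srcLanguage:@"zh"         // illustrative
                           srcAltLanguage:nil
                                  success:^(int64_t streamId) {
                                      // keep streamId for multi_sendVoice calls
                                  }
                                     fail:failCallback];
```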

Notice:
1. In scenarios where recognition-result callbacks are needed, asrResult should be set to true; srcLanguage is mandatory and srcAltLanguage is optional.
2. In scenarios where translation-result callbacks are needed, transResult should be set to true; destLanguage is mandatory and cannot be an empty string.
3. If temporary recognition results and temporary translation results are needed, tempResult should be set to true.
4. If a language is passed into srcAltLanguage, RTVT first runs a language-recognition step by default. The beginning of the voice (about 3 seconds) is used for language recognition, and the subsequent recognition/translation results are returned normally.
5. If the language passed is not within the range of supported languages, an error message indicating “language not supported” is returned; if the language passed is not enabled in the project, a message indicating “project does not support” is returned.

Send voice clip

 -(void)sendVoiceWithStreamId:(int64_t)streamId
                    voiceData:(nonnull NSData*)voiceData
                          seq:(int64_t)seq
                           ts:(int64_t)ts
                      success:(RTVTAnswerSuccessCallBack)successCallback
                         fail:(RTVTAnswerFailCallBack)failCallback;
Parameter Type M/O Description
streamId int64_t M stream ID
seq int64_t M audio segment sequence number (preferably in order)
voiceData byte M audio data, 640 bytes by default
ts int64_t M audio frame reference timestamp
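A sending sketch that slices captured PCM into 640-byte (20 ms) clips with an increasing sequence number. `pcmBuffer`, `streamId`, and the callback variables are assumptions standing in for your capture pipeline:

```objectivec
static int64_t seq = 0;
const NSUInteger clipBytes = 640; // default clip size
int64_t nowMs = (int64_t)([[NSDate date] timeIntervalSince1970] * 1000.0);

for (NSUInteger off = 0; off + clipBytes <= pcmBuffer.length; off += clipBytes) {
    NSData *clip = [pcmBuffer subdataWithRange:NSMakeRange(off, clipBytes)];
    [client sendVoiceWithStreamId:streamId
                        voiceData:clip
                              seq:seq++   // keep clips in order
                               ts:nowMs
                          success:successCallback
                             fail:failCallback];
}
```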

Send voice clip (Multiple Language)

 -(void)multi_sendVoiceWithStreamId:(int64_t)streamId
                          voiceData:(nonnull NSData*)voiceData
                      destLanguages:(NSArray<NSString*>*)destLanguages
                                seq:(int64_t)seq
                                 ts:(int64_t)ts
                            success:(RTVTAnswerSuccessCallBack)successCallback
                               fail:(RTVTAnswerFailCallBack)failCallback;
Parameter Type M/O Description
streamId int64_t M stream ID
seq int64_t M audio segment sequence number (preferably in order)
destLanguages array M target languages
voiceData byte M audio data, 640 bytes by default
ts int64_t M audio frame reference timestamp
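The multi-language variant sends the same 640-byte clips but names the target languages per call. `clip`, `seq`, `nowMs`, `streamId`, and the callbacks are assumptions carried over from your sending loop; the language codes are illustrative:

```objectivec
[client multi_sendVoiceWithStreamId:streamId
                          voiceData:clip             // 640-byte PCM clip
                      destLanguages:@[@"en", @"ja"]  // illustrative targets
                                seq:seq++
                                 ts:nowMs
                            success:successCallback
                               fail:failCallback];
```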

Notice: If no voice data is sent for a certain period of time, RTVT performs a timeout process. At that point, it is necessary to call the startStreamTranslateWithAsrResult method or multi_startTranslateWithAsrResult method again to obtain a new streamId.

Stop translate

 -(void)endTranslateWithStreamId:(int)streamId
                        lastSeq:(int)lastSeq
                        success:(RTVTAnswerSuccessCallBack)successCallback
                           fail:(RTVTAnswerFailCallBack)failCallback;
Parameter Type M/O Description
streamId int M ID of the stream to stop
lastSeq int M sequence number of the final audio frame
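A finishing sketch: after the last clip has been sent, end the stream with the final sequence number. `streamId`, `lastSeq`, and the callbacks are carried over from your sending code:

```objectivec
[client endTranslateWithStreamId:streamId
                         lastSeq:lastSeq   // seq of the final audio frame sent
                         success:successCallback
                            fail:failCallback];
```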

Close RTVT

- (BOOL)closeConnect;

Error code

Error code Description
800000 Unknown error
800002 Unauthenticated connection
800003 Invalid parameter
800101 Invalid system time
800102 Invalid token, invalid encoding
800103 Invalid pid
800105 Unsupported language
800106 Too many alternative languages
800107 Translation stream reached the limit
800200 StreamId does not exist

More information

For SDK downloads and more information, please go to GitHub.