Using Google Speech API in Your iOS Application


Pham Van Hoang, hoangk55cd@gmail.com, is the author of this article and he contributes to RobustTechHouse Blog

Introduction

Unfortunately, if you want to build an iOS application with speech recognition, there are no official APIs from Apple at the time of writing, unlike Android, whose native development kit is supported by Google.

If you have used Google services before, you will know that Google’s speech recognition is very accurate. It handles short utterances online with no language model or vocabulary configuration. Sadly, there is no official Google Speech API support for iOS, but there are workarounds we can use. Note that this API is available for development and personal use only.

Today, I am going to show you how to integrate the Google Speech API in your application. In this article you’ll learn:

What the Google Speech API is and how to request a credentials key from Google.
How to integrate the Google Speech API in your application.

[Video and Source Code]

Video: https://www.youtube.com/watch?v=O_i54tj7jv8

Source code: SpeechAPIExample

[Google Speech API]

Host: https://www.google.com/speech-api/v2/recognize

Method: POST

Input:

lang: any valid locale (en-us, nl-be, fr-fr, etc.)

key: credentials key from Google. You’ll see how to get a key below.

app: optional

output: json

Data:

FLAC

16-bit PCM

Headers: Content-Type (Ex: Content-Type: audio/x-flac; rate=44100;). Make sure the rate in your header matches the sample rate you used for your audio capture.

You can find more about this API in the notes by gillesdemey.
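As a rough illustration of the request described above, the following Swift sketch assembles the POST request with Foundation. The function name makeRecognizeRequest and its parameter names are made up for this example and are not part of the SpeechToTextModule class used later; only the host, the query parameters, and the Content-Type header come from the list above.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Builds the recognize request. `makeRecognizeRequest` is a hypothetical
/// helper for illustration; `apiKey` is your own credentials key.
func makeRecognizeRequest(apiKey: String,
                          language: String = "en-us",
                          sampleRate: Int = 44100,
                          flacAudio: Data) -> URLRequest? {
    var components = URLComponents(string: "https://www.google.com/speech-api/v2/recognize")
    components?.queryItems = [
        URLQueryItem(name: "output", value: "json"),
        URLQueryItem(name: "lang", value: language),
        URLQueryItem(name: "key", value: apiKey),
    ]
    guard let url = components?.url else { return nil }

    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    // The rate must match the sample rate used when the audio was captured.
    request.setValue("audio/x-flac; rate=\(sampleRate)",
                     forHTTPHeaderField: "Content-Type")
    request.httpBody = flacAudio
    return request
}
```

The returned request could then be sent with URLSession, for example through URLSession.shared.dataTask(with:completionHandler:).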

Request credentials key from Google

First, make sure you are a member of chromium-dev@chromium.org. If you are not, you can subscribe to chromium-dev and choose not to receive mail. The APIs are only visible to people subscribed to that group.
Make sure you are logged in with the Google account associated with the email address you used to subscribe to chromium-dev.
Go to https://cloud.google.com/console
Click the blue Create Project button and create your own project.
In the search box, search for “Speech API” and enable the API.
Go to the Credentials screen, choose the iOS platform, and add credentials to your project.

Now you have your own credentials key.

Integrating Google Speech API in your application

Now that you have all the information needed to use the API, you just need to record the speech and send it to the service through the API.

You can write your own class to record the audio and handle the response. If you find that it takes too much of your time, you can look at an example in this repository by mzeeshanid. However, that repository is deprecated and contains some classes that are no longer needed. I have modified it to use only the SpeechToTextModule class. You can see the version I modified here.

Now, it’s time to integrate the module into your project.

Step 1: Add the SpeechToTextModule class and the speex SDK to your project.

Step 2: Because this class does not use ARC, add the “-fno-objc-arc” compiler flag to SpeechToTextModule.m (in Xcode, under Build Phases > Compile Sources).

Step 3: Set your credentials key on the GOOGLE_SPEECH_TO_TEXT_KEY line in the SpeechToTextModule.m file.

Step 4: Import the SpeechToTextModule class and the SpeechToTextModuleDelegate protocol, then create an instance.

Step 5: Create the UI. In this project I use just one button to record and stop, with a background image behind the button that animates while the user's speech is being recorded (I have used a UIImage category to display the GIF file).

Step 6: Handle the record/stop action. When the user taps the record button, start recording, change the button background, and start the animation to show the user that your app is recording speech.

Step 7: When the user taps the button again, stop recording and change the button background. The SpeechToTextModule class will send the data to the Google server.
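The record/stop behaviour in Steps 6 and 7 boils down to a two-state toggle. Here is a minimal Swift sketch of that logic. The SpeechRecorder protocol and the RecordButtonController class are hypothetical stand-ins invented for this example; they do not reproduce the actual method names of SpeechToTextModule, so adapt the calls to the real class.

```swift
/// Hypothetical stand-in for the recording module (not the real API).
protocol SpeechRecorder {
    func beginRecording()
    func stopRecording()
}

/// Keeps track of whether the app is recording and flips the state on tap.
final class RecordButtonController {
    private let recorder: SpeechRecorder
    private(set) var isRecording = false

    init(recorder: SpeechRecorder) {
        self.recorder = recorder
    }

    /// Call this from the button's tap handler. It returns the new state so
    /// the UI can switch the button background and start or stop the animation.
    @discardableResult
    func toggle() -> Bool {
        if isRecording {
            recorder.stopRecording()
        } else {
            recorder.beginRecording()
        }
        isRecording.toggle()
        return isRecording
    }
}
```

Keeping the state in one small object makes it easy to drive both the button image and the animation from the single value returned by toggle().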

Step 8: Handle the response data in the SpeechToTextModuleDelegate method - (BOOL)didReceiveVoiceResponse:(NSDictionary *)data as your application requires.
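To give an idea of what handling the response might look like, here is a small Swift sketch that pulls the first transcript out of a response dictionary. The function name bestTranscript is made up for this example, and the key names (result, alternative, transcript) are an assumption based on the JSON this endpoint has commonly returned; the dictionary you receive may be shaped differently, so log it first and adjust the keys.

```swift
import Foundation

/// Returns the first transcript found in the response, or nil when the
/// expected keys are missing. The key names are assumed, not guaranteed.
func bestTranscript(from response: [String: Any]) -> String? {
    guard let results = response["result"] as? [[String: Any]] else {
        return nil
    }
    for result in results {
        // Each result is assumed to carry a list of alternatives,
        // with the most likely one first.
        if let alternatives = result["alternative"] as? [[String: Any]],
           let transcript = alternatives.first?["transcript"] as? String {
            return transcript
        }
    }
    return nil
}
```

Returning nil instead of crashing on a missing key matters here, because an utterance that was not recognized can come back with an empty result list.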

You can see more in my example here: SpeechAPIExample. I hope you find this post useful. If you have any questions, please leave a comment below. Thanks for reading.