On Friday I stumbled upon a good article about the Web Speech API and decided to play with it, as I had never used this API before.
The Web Speech API is still an Unofficial Draft and is implemented in Chrome and Safari only. The API consists of two parts: speech recognition and speech synthesis.
Speech Recognition (speech-to-text)
The speech recognition service is provided by the SpeechRecognition interface. In Google Chrome, you need to use the prefixed version, webkitSpeechRecognition. The basic usage of the API is quite trivial:
// creating an instance of the SpeechRecognition interface
var recognition = new webkitSpeechRecognition();
recognition.onresult = function(event) {
  // event.results contains the results of recognition
  // each result has the transcript property
  // which is a textual representation of the speech
  if (event.results.length > 0) {
    alert(event.results[0][0].transcript);
  }
};
// start listening (the browser will ask for microphone permission)
recognition.start();
The methods of the webkitSpeechRecognition interface are:
- start() - starts the recognition
- stop() - stops the recognition
- abort() - aborts the recognition immediately
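As a small illustration, recognition could be started and then stopped after a few seconds; the five-second timeout below is my own example, not part of the API:

var recognition = new webkitSpeechRecognition();
recognition.onresult = function(event) {
  console.log(event.results[0][0].transcript);
};
recognition.start(); // begin listening
setTimeout(function() {
  // stop() still delivers the results collected so far,
  // abort() would discard them immediately
  recognition.stop();
}, 5000);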
Three important handlers:
- onresult - the recognition API passes the results (both interim and final) to this handler. The value of the isFinal attribute equals true for final results.
- onerror - is called when an error happens.
- onend - is called when the recognition ends.
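Wiring all three handlers together could look roughly like this (a minimal sketch; the log messages are my own):

var recognition = new webkitSpeechRecognition();
recognition.onresult = function(event) {
  // the last result in the list is the most recent one
  var last = event.results[event.results.length - 1];
  console.log('Heard: ' + last[0].transcript + ' (final: ' + last.isFinal + ')');
};
recognition.onerror = function(event) {
  console.log('Recognition error: ' + event.error);
};
recognition.onend = function() {
  console.log('Recognition ended');
};
recognition.start();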
The webkitSpeechRecognition interface has several interesting parameters:
- lang - string, the language used for recognition. If not set, the language of the HTML document root is used.
- continuous - boolean, defines how many results are provided. If continuous == false, only one result will be provided; to get more results, the recognition has to be started again.
- interimResults - boolean, defines whether interim results are returned. Interim results are not final and may not be accurate.
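For example, a continuous recognition with interim results might be configured like this (the German language code is just an arbitrary choice for illustration):

var recognition = new webkitSpeechRecognition();
recognition.lang = 'de-DE';        // recognise German instead of the document language
recognition.continuous = true;     // keep delivering results until stop() is called
recognition.interimResults = true; // also deliver interim (not yet final) results
recognition.onresult = function(event) {
  var last = event.results[event.results.length - 1];
  if (last.isFinal) {
    console.log('Final: ' + last[0].transcript);
  } else {
    console.log('Interim: ' + last[0].transcript);
  }
};
recognition.start();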
More event handlers and methods are described in the Web Speech API specification.
Speech Synthesis (text-to-speech)
The other half of the API provides the text-to-speech feature. It works like this:
speechSynthesis.speak(new SpeechSynthesisUtterance('Javascript for Ninja'));
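The utterance can also be configured before speaking; the snippet below is a small sketch using the standard SpeechSynthesisUtterance properties (the concrete values are arbitrary):

var utterance = new SpeechSynthesisUtterance('Javascript for Ninja');
utterance.lang = 'en-US'; // language of the speech
utterance.rate = 1;       // speaking rate, 1 is the default
utterance.pitch = 1;      // voice pitch, 1 is the default
utterance.onend = function() {
  console.log('Finished speaking');
};
speechSynthesis.speak(utterance);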
Ember.js Speech Recognition Component
I think speech recognition should be disabled by default: it requires the user's approval, so activating it in the background may look suspicious. Still, I would like to give users the option to use speech recognition if they want to. I see this as a widget that lets the user explicitly activate the speech recognition.
First, I will write a simple app:
App = Ember.Application.create();
App.Router.map(function() {
  // put your routes here
});
App.IndexController = Ember.ObjectController.extend({
  speakBack: '', // a property to hold the recognized speech
  actions: {
    // this will speak the recognized text
    // using another part of the Web Speech API
    // https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html
    speak: function() {
      var utterance = new SpeechSynthesisUtterance(this.get('speakBack'));
      utterance.onerror = function() {
        console.log(arguments);
      };
      speechSynthesis.speak(utterance);
    },
    onResult: function(result) {
      alert('Search this: ' + result);
      this.set('speakBack', result);
    }
  }
});
The HTML for the app:
<!DOCTYPE html>
<html>
<body>
<script type="text/x-handlebars"></script>
<script type="text/x-handlebars" data-template-name="index"></script>
</body>
</html>
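The two Handlebars templates could, for example, contain something like this (the exact markup is my assumption, not the original): the application template renders the outlet, and the index template wires up the component, shows the recognized text, and triggers the speak action:

<!-- application template body (assumed) -->
{{outlet}}

<!-- index template body (assumed) -->
{{voice-control onResult="onResult"}}
<p>Recognized: {{speakBack}}</p>
<button {{action 'speak'}}>Speak it back</button>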
Now, the implementation of the component that controls the Web Speech API:
/**
 * VoiceControlComponent uses the Web Speech API to recognize speech
 * Usage:
 * {{voice-control onResult="onResult"}}
 */
App.VoiceControlComponent = Ember.Component.extend({
  enabled: false, // whether recognition is enabled
  speechRecognition: null, // the instance of webkitSpeechRecognition
  language: 'en', // language to recognise
  startRecognition: function() {
    // prefixed SpeechRecognition object because it only works in Chrome
    var speechRecognition = new webkitSpeechRecognition();
    // not continuous to avoid delays
    speechRecognition.continuous = false;
    // only the final result
    speechRecognition.interimResults = false;
    // the recognition language
    speechRecognition.lang = this.get('language');
    // binding various handlers
    speechRecognition.onresult = Ember.run.bind(this, this.onRecognitionResult);
    speechRecognition.onerror = Ember.run.bind(this, this.onRecognitionError);
    speechRecognition.onend = Ember.run.bind(this, this.onRecognitionEnd);
    // keeping a reference to the instance
    this.set('speechRecognition', speechRecognition);
    // starting the recognition
    speechRecognition.start();
  },
  onRecognitionEnd: function() {
    this.set('enabled', false);
  },
  onRecognitionError: function() {
    alert('Recognition error');
  },
  /**
   * e is a SpeechRecognitionEvent
   * https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html#speechreco-event
   */
  onRecognitionResult: function(e) {
    var result = '';
    var resultNo = 0;
    var alternativeNo = 0;
    // we get the first alternative of the first result
    result = e.results[resultNo][alternativeNo].transcript;
    // report the result to the outside
    this.sendAction('onResult', result);
  },
  onEnabledChange: function() {
    if (this.get('enabled')) {
      this.startRecognition();
    }
  }.observes('enabled'),
  actions: {
    toggle: function() {
      this.toggleProperty('enabled');
    }
  }
});
And a simple template for the component:
<!-- VoiceControlComponent's template -->
<script type="text/x-handlebars" data-template-name="components/voice-control"></script>
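The template body could be as simple as a single button that toggles the enabled property (this markup is an assumption, not the original):

<button {{action 'toggle'}}>
  {{#if enabled}}Stop listening{{else}}Start voice control{{/if}}
</button>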
The result looks like this:
Additional remarks
- since the Web Speech API is available in WebKit-based browsers only, it's a good idea to check whether the user's browser supports the API (a minimal feature-detection sketch follows below)
- when the website is accessed over HTTP (rather than HTTPS), the browser will not remember the permission to use the microphone, so the user will be asked to allow it every time.
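The support check mentioned above could be as simple as this (a minimal sketch):

if ('webkitSpeechRecognition' in window) {
  // the API is available, the voice control widget can be shown
} else {
  // hide the widget or fall back to plain text input
  console.log('Speech recognition is not supported in this browser');
}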
Thanks for reading. Please comment and subscribe!