Tuesday 5 September 2017

Watson Speech to Text with Node.js

Talking to a computer makes you feel like you're living in the future. In this post I will show how to use Node.js to expand on the Watson Speech to Text (STT) example and improve the accuracy of the transcription.
As a base I will take the cognitive car demo and try to write an STT system for it. Once Speech to Text can correctly interpret what a person asks the car, we will want to send those commands to the Watson Conversation Service (WCS) car demo, but I will deal with connecting STT to WCS in another post.
Log into Bluemix and set up an STT service. Create new credentials and save them, as this username and password are used in all the following programs.
Next, assuming you already have Node installed, you also need the Watson Developer Cloud SDK
npm install watson-developer-cloud --save
and the speech-to-text-utils
npm install watson-speech-to-text-utils -g

Get an audio file of the sorts of things you would say to the car demo. I asked questions like the ones below:

turn on the lights
where is the nearest restaurant
wipe the windscreen
When is the next petrol station
When is the next gas station
does the next gas station have rest rooms
How far are the next rest rooms
please play pop music
play rock music
play country music


I have made an audio file of that (I had a cold). You can get it here or record your own version.
In speech_to_text.v1.js from the SDK examples, put in the username and password from the credentials you created above, and point it at the sound file of commands from the previous step.


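// Credentials from your Bluemix STT service
const speech_to_text = new SpeechToTextV1({
  username: 'INSERT YOUR USERNAME FOR THE SERVICE HERE',
  password: 'INSERT YOUR PASSWORD FOR THE SERVICE HERE'
});

// Pipe the recording of the car commands into the example's recognize stream
fs.createReadStream(__dirname + '/DavidCarDemo.wav').pipe(recognizeStream);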

Now look at transcription.txt. We want to improve the accuracy of this transcription, so next we create a custom model that we will train so that STT understands our speech better.

Running node createmodel.js gives you the customization_id needed from here on.

{ "customization_id": "06da5480-915c-11e7-bed0-ef2634fd8461" }
Next we tell STT to take in the car demo corpus. From this it learns the sort of words to expect the user to say to the system.
watson-speech-to-text-utils set-credentials
and give it the same username and password as above.
watson-speech-to-text-utils customization-add-corpus
This will ask you for a conversation workspace. Give it car_workspace.json, downloaded from the car demo. Adding a conversation workspace as a corpus improves accuracy on the words that are unusually common in our conversation.
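To get a feel for what text a workspace can contribute, here is a rough sketch that pulls the example utterances and entity values out of a Conversation workspace export. This is only an illustration and not how the utility itself is implemented. The function name workspaceToCorpus is hypothetical, and it assumes the usual export layout with intents[].examples[].text and entities[].values[] holding value and synonyms.

```javascript
'use strict';

// Collect the human-written phrases from a Conversation workspace
// export and return them as plain text, one phrase per line.
// ASSUMPTION: the workspace uses the standard export field names.
function workspaceToCorpus(workspace) {
  const lines = [];

  // Example utterances attached to each intent
  (workspace.intents || []).forEach(function (intent) {
    (intent.examples || []).forEach(function (example) {
      if (example.text) {
        lines.push(example.text);
      }
    });
  });

  // Entity values and their synonyms
  (workspace.entities || []).forEach(function (entity) {
    (entity.values || []).forEach(function (v) {
      if (v.value) {
        lines.push(v.value);
      }
      (v.synonyms || []).forEach(function (synonym) {
        lines.push(synonym);
      });
    });
  });

  // Drop duplicates while keeping the original order
  return Array.from(new Set(lines)).join('\n');
}

// Tiny made-up workspace to show the shape of the input
const sample = {
  intents: [
    { intent: 'turn_on', examples: [{ text: 'turn on the lights' }] }
  ],
  entities: [
    { entity: 'genre', values: [{ value: 'rock', synonyms: ['rock music'] }] }
  ]
};
console.log(workspaceToCorpus(sample));
```

With the real workspace you would load the file first, for example with JSON.parse(fs.readFileSync('car_workspace.json', 'utf8')).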
Now we want to improve accuracy on words that are not common in the corpus. For example, "Does the next" is currently heard as "dostinex".
watson-speech-to-text-utils corpus-add-words -i 06da5480-915c-11e7-bed0-ef2634fd8461
and give it words.json, which looks like this:


{
  "words": [{
    "display_as": "could",
    "sounds_like": [ "could" ],
    "word": "culd"
  }, {
    "display_as": "closeby",
    "sounds_like": [ "closeby" ],
    "word": "closeby"
  }, {
    "display_as": "does the next",
    "sounds_like": [ "does the next", "dostinex" ],
    "word": "does_the_next"
  }
]
}
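Before uploading a words file, a quick sanity check can catch malformed entries. The sketch below is a hypothetical helper, not part of the utils package. It checks only a few things: that there is a top-level words array, that each word is a non-empty string without spaces (the service expects compound words to be joined with a dash or underscore, as in does_the_next), and that sounds_like and display_as have the right types when present.

```javascript
'use strict';

// Return a list of human-readable problems found in a words document.
// An empty list means the basic checks passed.
function validateWords(doc) {
  if (!doc || !Array.isArray(doc.words)) {
    return ['top-level "words" array is missing'];
  }
  const problems = [];
  doc.words.forEach(function (entry, i) {
    if (typeof entry.word !== 'string' || entry.word.length === 0) {
      problems.push('entry ' + i + ': "word" must be a non-empty string');
    } else if (/\s/.test(entry.word)) {
      problems.push('entry ' + i + ': "word" must not contain spaces');
    }
    // sounds_like is optional, but when given it must be a list of strings
    if (entry.sounds_like !== undefined) {
      const ok = Array.isArray(entry.sounds_like) &&
        entry.sounds_like.every(function (s) { return typeof s === 'string'; });
      if (!ok) {
        problems.push('entry ' + i + ': "sounds_like" must be an array of strings');
      }
    }
    // display_as is optional, but when given it must be a string
    if (entry.display_as !== undefined && typeof entry.display_as !== 'string') {
      problems.push('entry ' + i + ': "display_as" must be a string');
    }
  });
  return problems;
}

// The third entry from the words.json above passes the checks
const good = {
  words: [{
    display_as: 'does the next',
    sounds_like: ['does the next', 'dostinex'],
    word: 'does_the_next'
  }]
};
console.log(validateWords(good));
```

To check a file on disk, parse it with JSON.parse and pass the result to validateWords.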
The speech utils allow you to see what words you have in your model. Added words do not overwrite old ones, so sometimes you need to delete old values.
watson-speech-to-text-utils customization-list-words -i 06da5480-915c-11e7-bed0-ef2634fd8461
watson-speech-to-text-utils corpus-delete-word -i 06da5480-915c-11e7-bed0-ef2634fd8461 -w "word to remove"

Finally, in transcribe.js you need to tell it to use the model we have just made:


const params = {
  content_type: 'audio/wav',
  customization_id: '06da5480-915c-11e7-bed0-ef2634fd8461'
};
The code I have used is up on GitHub here.
Now we get a more accurate transcription of the voice commands, first by training on the Conversation corpus and then by fixing other common errors using a words.json file.
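If you want to put a number on "more accurate", one common option is word error rate (WER): the word-level edit distance between what was said and what was transcribed, divided by the number of words said. The sketch below is my own illustration of that measure and was not used in the post. The function names are hypothetical.

```javascript
'use strict';

// Lower-case the text, strip punctuation and split into words.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9'\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

// Word error rate: (substitutions + deletions + insertions) divided by
// the number of words in the reference, using Levenshtein distance
// computed over words rather than characters.
function wordErrorRate(reference, hypothesis) {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error('reference must contain at least one word');
  }

  // prev[j] holds the distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev = [];
  for (let j = 0; j <= hyp.length; j++) {
    prev.push(j);
  }
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // word missing from the transcript
        curr[j - 1] + 1,    // extra word in the transcript
        prev[j - 1] + cost  // match or substitution
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// The mishearing mentioned earlier: three of eight words are wrong
console.log(wordErrorRate(
  'does the next gas station have rest rooms',
  'dostinex gas station have rest rooms'
));
```

Running this on transcription.txt before and after training, against a hand-typed list of the commands, gives a simple before and after comparison.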

























