Wednesday, 20 September 2017

Adding a speech interface to the Watson Conversation Service

The IBM Watson Conversation Service does a great job of providing an interface that closely resembles a conversation with a real human being. However, with the advent of products like the Amazon Echo, Microsoft Cortana and the Google Home, people increasingly prefer to interact with services by speaking rather than typing. Luckily IBM Watson also has Text to Speech and Speech to Text services. In this post we show how to hook these services together to provide a unified speech interface to Watson's capabilities.

In this blog we will build upon the existing SpeechToSpeech sample which takes text spoken in one language and then leverages Watson's machine translation service to speak it back to you in another language. You can try the application described here on Bluemix or access the code on GitHub to see how you can customise the code and/or deploy on your own server.

This application has only one page and it is quite simple from the user's point of view.
  • At the top there is some header text introducing the sample and telling users how to use it. 
  • The sample uses some browser audio interfaces that are only available in recent browser versions. If we detect that these features are not present we put up a message telling the user that they need to choose a more modern browser. Hopefully you won't ever see this message.
  • In the original sample there are two drop down selection boxes which allow you to specify the source and target language. We removed these drop downs since they are not relevant to our modified use case.
  • The next block of the UI gives the user a number of different ways to enter speech samples:
    • There is a button which allows you to start capturing audio directly from the microphone. Whatever you say will be buffered and then passed directly to the transcription service. While capturing audio, the button changes colour to red and the icon changes; this is a visual indication that recording is in progress. When you are finished talking, click the button again to stop audio capture.
    • If you are working in a noisy environment or if you don't have a good quality microphone, it might be difficult for you to speak clearly to Watson. To help solve this problem we have provided some sample files hosted in the web app. Click on one of the sample buttons to play the associated file and use it as input.
    • If you have your own recording, you can click on the "Select File" button and select the file containing the audio input that you want to send to the speech-to-text service.
    • Last, but not least, you can drag and drop an audio file onto the page to have it instantly uploaded.
  • The transcribed text is displayed in an input box (so you can see if Watson is hearing properly) and sent to either the translation service (in the original version) or the conversation service (in our updated version). If there is a problem with the way your voice is being transcribed, see this previous article on how to improve it.
  • When we get a response from the conversation or translation service we place the received text in an output text box and we also call the text-to-speech service to read out the response and save you the bother of having to read.
I know that you want to understand what is going on under the covers so here is a brief overview:
  • The app.js file is the core of the web application. It implements the connections between the front end code that runs in the browser and the various Watson services. This involves establishing 3 back-end REST services. This indirection is needed because you don't want to include your service credentials in the code sent to the browser and because your browser's cross site script protections will prohibit you from making a direct call to the Watson service from your browser. The services are
    • /message - this REST service implements the interface to the Watson Conversation service. Every time we have a text utterance transcribed, we do a POST on this URL with a JSON payload like {"context":{...},"input":{"text":"<transcribed_text>"}}. The first time we call the service we specify an empty context {} and in each subsequent call we supply the context object that the server sent back to us the last time. This allows the server to keep track of the state of the conversation.
      Most conversation flows are programmed to give a trite greeting in response to the first message. To avoid spending time on this, the client code sends an initial blank message when the page loads to get this out of the way.
    • /synthesize - this REST service is used to convert the response into audio. All that this service does is convert a GET on http://localhost:3000/synthesize?voice=en-US_MichaelVoice&text=Some%20response into a GET on the corresponding URL of the Watson text-to-speech service. This will return a .wav file with the text "some response" being spoken in US English by the voice "Michael".
    • /token - the speech to text transcription is an exception to the normal rule that your browser shouldn't connect directly to the Watson service. For performance reasons we chose to use the websocket interface to the speech to text service. At page load time, the browser will do a GET on this /token REST service and it will respond with a token code that can then be included in the URL used to open the websocket. After this, all sound information captured from the microphone (or read from a sample file) is sent via the websocket directly from the browser to the Watson speech to text service.
  • The index.html file is the UI that the user sees. 
    • As well as defining the main UI elements which appear on the page, it also includes main.js, which is the client side code that handles all interaction in your browser.
    • It also includes the jQuery and Bootstrap modules, but I won't cover these in detail.
  • You might want to have a closer look at the client side code which is contained in a file public/js/main.js:
    • The first 260 lines of code are concerned with how to capture audio from the client's microphone (if the user allows it - there are tight controls on when/if browser applications are allowed to capture audio). Some of the complexity of this code is due to the different ways that different browsers deal with audio. Hopefully it will become easier in the future. 
    • Regardless of what quality audio your computer is capable of capturing, we downsample it to 16-bit mono at 16 kHz because this is what the speech recognition is expecting.
    • Next we declare which language model we want to use for speech recognition. We have hardcoded this to a model named "en-GB_BroadbandModel" which is a model tuned to work with high fidelity captures of speakers of UK English (sadly there is no language model available for Irish English). However, we have left in a few other language models commented out to make it easy for you if you want to change to another language. Consult the Watson documentation for a full list of language models available.
    • The handleFileUpload function deals with file uploads, whether they happen as a result of explicitly clicking on the "Select File" button or as a result of a drag-and-drop event.
    • The initSocket function manages the interface to the websocket that we use to communicate to/from the speech_to_text service. It declares that the showResult function should be called when a response is received. Since it is not always clear when a speaker is finished talking, the speech-to-text service can return several results. As a result the msg.results[0].final variable is used to determine if the current transcription is final. If it is an intermediate result, we just update the resultsText field with what we heard. If it is the final result, the msg.results[0].alternatives[0].transcript variable is used as the most likely transcription of what the user said and it is passed on to the converse function.
    • The converse function handles sending the detected text to the Watson Conversation Service (WCS) via the /message REST interface which was described above. When the service gives a response to the question, we pass it to the text-to-speech service via the TTS function and we write it on the response textarea so it can be read as well as listened to.
  • In addition there are many other files which control the look and feel of the web page, but won't be described in detail here e.g. 
    • Style sheets in the /public/css directory
    • Audio sample files in the /public/audio directory
    • Images in the public/images directory
    • etc.
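
The context round trip on /message and the handling of interim versus final transcripts described above can be sketched roughly as follows. This is a minimal illustration and not the code from main.js: the function names buildMessagePayload and extractTranscript are hypothetical, and the message shape is assumed to match the msg.results[0] fields mentioned above.

```javascript
// Hypothetical helpers for illustration; the names are not taken from main.js.

// Build the JSON payload for a POST on /message.
// `context` is whatever the server sent back last time, or undefined on the first call.
function buildMessagePayload(text, context) {
  return { context: context || {}, input: { text: text } };
}

// Inspect one message received on the speech-to-text websocket.
// Returns { final, transcript }, or null when the message carries no usable result.
function extractTranscript(msg) {
  if (!msg || !msg.results || msg.results.length === 0) {
    return null;
  }
  var result = msg.results[0];
  if (!result.alternatives || result.alternatives.length === 0) {
    return null;
  }
  return {
    final: Boolean(result.final),
    transcript: result.alternatives[0].transcript
  };
}

// Example: the initial blank message, then an interim transcription
var firstPayload = buildMessagePayload('', undefined);
console.log(JSON.stringify(firstPayload));

var interim = extractTranscript({
  results: [{ final: false, alternatives: [{ transcript: 'turn on' }] }]
});
console.log(interim.final, interim.transcript);
```

Only a result whose final flag is true would be passed on to the conversation service; interim results would just update the display.
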
Anyone with a knowledge of how web applications work should be able to figure out how it works. If you have any trouble, post your question as a comment on this blog.
At the time of writing, there is an instance of this application running on Bluemix so you can see it running even if you are having trouble with your local deployment. However, I can't guarantee that this instance will stay running due to limits on my personal Bluemix account.

Friday, 15 September 2017

Translating a Chatbot

You have a trained chatbot built in English and your boss wants it working in German next week. Here is what I would do.

Tell her it is impossible. Building a new chatbot in a different language involves starting the whole process from scratch. Linguistic, cultural and company process reasons mean a translated chatbot won't work.

Then I would build it anyway.

Create a language translation service in Bluemix. Note the username and password this service has.

Get the codes of the language you want to translate between. From English 'en' in Sept 2017 you can translate to Arabic 'ar', Brazilian Portuguese 'pt', French 'fr', German 'de', Italian 'it', Japanese 'ja', Korean 'ko', and Spanish 'es'.

Take your current ground truth of questions and intents and put them in a spreadsheet. Sort the list by intent so that all the questions about a topic are together.
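
If you would rather do the sorting in code than in a spreadsheet, a minimal JavaScript sketch might look like this. The sample rows and the function name sortByIntent are made up for illustration.

```javascript
// Rows as they might come out of the spreadsheet: [question, intent].
// The sample data is invented for this example.
var rows = [
  ['Do you speak German?', 'languages'],
  ['Where is my parcel?', 'parcel'],
  ['What languages can I talk to you in?', 'languages']
];

// Return a new array sorted by intent, keeping the original order within each intent.
function sortByIntent(rows) {
  return rows
    .map(function (row, index) { return { row: row, index: index }; })
    .sort(function (a, b) {
      if (a.row[1] < b.row[1]) return -1;
      if (a.row[1] > b.row[1]) return 1;
      return a.index - b.index; // tie-break on original position
    })
    .map(function (item) { return item.row; });
}

sortByIntent(rows).forEach(function (row) {
  console.log(row[1] + ',' + row[0]);
});
```
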

The Python code to translate the questions into German is below. You need to use the username and password you set up earlier.

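import csv
import json
from watson_developer_cloud import LanguageTranslatorV2 as LanguageTranslator

# Credentials of the language translation service created in Bluemix
language_translator = LanguageTranslator(
  username= "",
  password= "")

with open('myfile.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in spamreader:
        # Print the English question, then its translation on the same line
        print(row[0], end=',')
        # The remaining arguments were cut off in the original listing;
        # source='en' and target='de' are assumed from the surrounding text
        translation = language_translator.translate(text=row[0],
                                                    source='en', target='de')
        print(json.dumps(translation, indent=2, ensure_ascii=False))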
myfile.csv here is
Am I allowed to speak in german,
Do you speak German?,
Do you speak English?,
What languages can I talk to you in?,
And this gives the output
davids-mbp:translate davidcur$ python
Am I allowed to speak in german,Bin ich konnte in Deutsch zu sprechen
Do you speak German?,Möchten Sie Deutsch sprechen?
Do you speak English?,Wollen Sie Englisch sprechen?
What languages can I talk to you in?,Welche Sprachen kann ich zu Ihnen sprechen?

You will want to translate certain phrases and words in a non-standard way. If the chatbot talks about a company called 'Exchange' you will want to warn the translator that it should not translate that into the German word for exchange. To do this you load a glossary file into your translator. glossary.tmx looks like this

and then run code to tell your translation service to use this glossary. In node.js this is

var watson = require('watson-developer-cloud');
var fs = require('fs');

// Credentials of the language translation service
var language_translator = watson.language_translator({
  version: 'v2',
  url: "",
  username: "",
  password: ""
});

// Custom model based on en-de that forces the glossary terms
var params = {
  name: 'custom-english-to-german',
  base_model_id: 'en-de',
  forced_glossary: fs.createReadStream('glossary.tmx')
};

language_translator.createModel(params,
  function(err, model) {
    if (err)
      console.log('error:', err);
    else
      console.log(JSON.stringify(model, null, 2));
  }
);

The terms in your glossary are likely to be in your Entities as most of your common proper nouns, industry terms and abbreviations end up in there.
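
As a rough sketch of how you might pull candidate glossary terms out of your entities, the function below walks an exported workspace and collects every entity value and synonym. It assumes the export has an entities array whose items carry values with optional synonyms; the function name glossaryCandidates and the sample data are hypothetical.

```javascript
// Collect candidate glossary terms from the entities of an exported workspace.
// Assumed shape: { entities: [{ entity: 'name', values: [{ value: '...', synonyms: [...] }] }] }
function glossaryCandidates(workspace) {
  var seen = new Set();
  var terms = [];
  (workspace.entities || []).forEach(function (entity) {
    (entity.values || []).forEach(function (v) {
      [v.value].concat(v.synonyms || []).forEach(function (term) {
        if (term && !seen.has(term)) {
          seen.add(term); // keep each term once, in first-seen order
          terms.push(term);
        }
      });
    });
  });
  return terms;
}

// Made-up workspace fragment
var workspace = {
  entities: [
    { entity: 'company', values: [{ value: 'Exchange', synonyms: ['The Exchange'] }] },
    { entity: 'product', values: [{ value: 'Exchange' }, { value: 'WCS', synonyms: [] }] }
  ]
};
console.log(glossaryCandidates(workspace));
```

The resulting list still needs a human to decide which terms should be forced and how.
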

Now get someone who speaks both English and the target language fluently. Put these translated questions in the spreadsheet and have them go through each question and humanify the translation. This is a pretty quick process. They will occasionally come across words that will have to be added to the glossary.tmx file and the translations rerun.

At the end of this you have an attempt at a ground truth in another language for a chatbot. There are several reasons why this won't be perfect. Germans do not speak like translated British people. They have different weather, culture and laws. And the differences between Germany and Britain are smaller than between many countries.

Their questions are likely to be different. But as a first cut it gets a chatbot up to the point where it might recognise a fair chunk of what people are saying. This can at least be used to help you gather new questions or to help you classify new questions you collect from actual Germans.

Manufacturing questions never really works well, as Simon pointed out here, and translation of questions is close to that. But it can be a useful tool to quickly get a chatbot up to the point where it can be bootstrapped into a full system.

Tuesday, 5 September 2017

Watson Speech to Text with Node.js

Talking to a computer makes you feel like you're living in the future. In this post I will show how to use node.js to expand on the Watson Speech to Text (STT) example to improve the accuracy of the transcription.

As a base I will take the cognitive car demo and try to write an STT system for that. Once Speech to Text can correctly interpret what a person asks the car, we will want to send those commands to the Watson Conversation Service car demo. But I will deal with connecting STT to WCS in another post.

Log into bluemix and set up a STT service. Create new credentials and save them as you will use them in all the following programs. This username and password are used everywhere below.

Next, assuming you already have node installed, you also need the watson developer cloud SDK

npm install watson-developer-cloud --save

and the speech-to-text-utils

npm install watson-speech-to-text-utils -g

Get an audio file of the sorts of things you would say to this Car demo. I asked questions like the ones below

turn on the lights
where is the nearest restaurant
wipe the windscreen
When is the next petrol station
When is the next gas station
does the next gas station have rest rooms
How far are the next rest rooms
please play pop music
play rock music
play country music

I have made an audio file of that (I had a cold). You can get it here or record your own version.

In speech_to_text.v1.js in the SDK examples folder, put in the username and password you created above and point it at the sound file of commands you recorded.

const speech_to_text = new SpeechToTextV1({
  username: '',
  password: ''
});

fs.createReadStream(__dirname + '/DavidCarDemo.wav').pipe(recognizeStream);

Now look at transcription.txt. We want to improve the accuracy of this transcription.

Now create a model that we will train so that the STT understands our speech better.

Running node createmodel.js gives you the customization_id needed from here on.

{ "customization_id": "06da5480-915c-11e7-bed0-ef2634fd8461" }

Next we tell STT to take in the car demo corpus, from which it learns the sort of words to expect the user to say to the system. First run

watson-speech-to-text-utils set-credentials

and give it the same username and password as above. Then run

watson-speech-to-text-utils customization-add-corpus

This will ask you for a conversation workspace. Give it car_workspace.json downloaded from the car demo. Adding a conversation workspace as a corpus improves the accuracy on words that are unusually common in our conversation.
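
To get a feel for what a workspace contributes as a corpus, the sketch below flattens the intent examples of an exported workspace into one utterance per line. This is not the implementation used by watson-speech-to-text-utils; it only assumes the export has an intents array whose items carry examples with a text field, and the name workspaceToCorpus is hypothetical.

```javascript
// Turn the intent examples of an exported workspace into corpus text,
// one utterance per line.
// Assumed shape: { intents: [{ intent: 'name', examples: [{ text: '...' }] }] }
function workspaceToCorpus(workspace) {
  var lines = [];
  (workspace.intents || []).forEach(function (intent) {
    (intent.examples || []).forEach(function (example) {
      var text = (example.text || '').trim();
      if (text) {
        lines.push(text); // skip empty examples
      }
    });
  });
  return lines.join('\n');
}

// Made-up fragment in the style of the car demo
var sample = {
  intents: [
    { intent: 'turn_on', examples: [{ text: 'turn on the lights' }] },
    { intent: 'music', examples: [{ text: 'play rock music' }, { text: '' }] }
  ]
};
console.log(workspaceToCorpus(sample));
```
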

Now we want to improve accuracy on words that are not common in the corpus. For example, "Does the next" is currently heard as "dostinex".

watson-speech-to-text-utils corpus-add-words -i 06da5480-915c-11e7-bed0-ef2634fd8461

and give it words.json, which looks like

{
  "words": [{
    "display_as": "could",
    "sounds_like": [ "could" ],
    "word": "culd"
  }, {
    "display_as": "closeby",
    "sounds_like": [ "closeby" ],
    "word": "closeby"
  }, {
    "display_as": "does the next",
    "sounds_like": [ "does the next", "dostinex" ],
    "word": "does_the_next"
  }]
}

The speech utils allow you to see what words you have in your model. Words added do not overwrite old ones, so sometimes you need to delete old values.

watson-speech-to-text-utils customization-list-words -i 06da5480-915c-11e7-bed0-ef2634fd8461

watson-speech-to-text-utils corpus-delete-word -i 06da5480-915c-11e7-bed0-ef2634fd8461 -w "word to remove"
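
If the list of corrections grows, a words.json file in the format shown earlier can be generated rather than typed by hand. A minimal sketch, where the helper name makeWord is hypothetical and only the three fields from the example file are produced:

```javascript
// Build one entry of the "words" array.
// When soundsLike or displayAs are omitted, the word itself is used.
function makeWord(word, soundsLike, displayAs) {
  return {
    display_as: displayAs || word,
    sounds_like: soundsLike && soundsLike.length ? soundsLike : [word],
    word: word
  };
}

var wordsFile = {
  words: [
    makeWord('closeby'),
    makeWord('does_the_next', ['does the next', 'dostinex'], 'does the next')
  ]
};

// Write this string to words.json to use it with the utilities
console.log(JSON.stringify(wordsFile, null, 2));
```
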

Finally, in transcribe.js you need to tell it to use the model we have just made

const params = {
  content_type: 'audio/wav',
  customization_id: '06da5480-915c-11e7-bed0-ef2634fd8461'
};

The code I have used is up on github here.

Now we get a more accurate transcription of the voice commands, first by training on the Conversation corpus and then by fixing other common errors using a words.json file.

Thursday, 31 August 2017

Combining other services with IBM Watson Conversation Service

It is becoming increasingly popular to offer an interface to computer applications which resembles the way that we converse with a fellow human. The IBM Watson Conversation Service is an excellent way to program such an interface because it allows the developer an easy way to specify the conversation flow and is also very good at doing fuzzy matching on input text to guess what the user is really trying to find out. However, the graphical way that conversation flows are specified doesn't allow the user to make calls to external services in order to get information to be included in the reply.

People often need to call external services to get the information that their users are looking for. In this article I describe a simple sample application, written by myself and my colleague David Curran, which shows a common pattern: the conversation service provides a template response along with parameters, which the calling application uses to retrieve the information needed to give the end user the answer that they are looking for.

This pattern is useful in a lot of different situations, but we will use a fictitious application where people want to use a conversational interface to track their parcels. We will leverage the simple conversation application as a starting point to minimise the amount of work. You can either download that sample and follow the steps below to add the interface to the conversation agent, or if you prefer you can download the completed example from our GitHub repository.

Adding the Parcel Intent to the conversation

In order to modify the conversation agent to handle parcel requests, you first need to add a parcel intent to the list of intents. The original sample contains 25 intents which is the maximum allowed with the free plan, so you will need to delete one of the existing intents. I deleted the weather intent since it is not being used and then I added a parcel intent with a few sample inputs.

The next step is to add a node to the dialog to specify how parcel queries are to be dealt with. Our logic is quite simple. If a number is detected in the input we assume that this is the parcel number so we set a context variable parcel_num with this value and then we send back a response message with placeholders where the parcel location should be inserted. However, if no number is detected in the input stream, we simply reply saying that they need to supply us with a parcel number. For simplicity's sake we won't consider holding context from one question to the next.
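
The decision made by this dialog node can be mimicked outside of WCS for illustration. In the real application the number detection happens inside the conversation service; the sketch below is only a stand-in, with a made-up function name and reply wording. The {0} placeholder is the one the application code later replaces with the location.

```javascript
// Stand-in for the dialog node's logic, purely for illustration.
// The reply texts are invented; only parcel_num and the {0} placeholder
// correspond to what the post describes.
function parcelDialog(inputText) {
  var match = /\d+/.exec(inputText);
  if (!match) {
    // No number found: ask the user for one and set no context variable
    return { context: {}, text: 'Please tell me your parcel number.' };
  }
  return {
    context: { parcel_num: parseInt(match[0], 10) },
    text: 'Parcel ' + match[0] + ' is currently at {0}.'
  };
}

console.log(parcelDialog('where is my parcel number 543210'));
console.log(parcelDialog('where is my parcel'));
```
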

Implementing the dummy parcel lookup service

We don't want to use a real parcel lookup service for this sample, because when testing we won't know the parcel number for parcels in transit. Instead we will implement a very simple lookup service.

To implement the parcel lookup service you need to add the following function near the end of app.js. What this does is respond to GET requests on /api/parcel with one of the sample location names, e.g. requesting http://localhost:3000/api/parcel?parcel_num=6 will return the string "Buckingham Palace". Just to illustrate how we should deal with errors, we have implemented the rule that if the parcel number is divisible by 13 it will return a status code 404 and an error message saying that the parcel number is unlucky.

 /**
  * A dummy parcel tracking service
  */
 app.get('/api/parcel', function(req, res) {
   var parcel_num = parseInt(req.query.parcel_num, 10);
   if (!req.query.parcel_num || isNaN(parcel_num)) {
     return res.status(400).end("Not a valid parcel number "
                                  +req.query.parcel_num);
   }
   // Parcel numbers divisible by 13 are treated as not found
   if (0 == (parcel_num %13)) {
     return res.status(404).end("We can't find parcel number "
                                  +parcel_num+" it is unlucky!");
   }

   var locations = [
     'Anfield', 'Stamford Bridge', 'Old Trafford', 'Parkhead',
     'Heathrow Airport', 'Westminster, London', 'Buckingham Palace',
     'Lands End, Cornwall', 'John O\'Groats'
   ];

   // Map the parcel number onto one of the sample locations
   parcel_num = parcel_num % locations.length;
   var location  = locations[parcel_num];
   res.end(location);
 });

You should experiment with this service and/or customise it before moving on to the next steps.
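
To experiment without starting the server, the lookup rule can be pulled out into a plain function. This is a sketch that mirrors the route described above rather than code from the sample; the name lookupParcel and the returned { status, body } shape are assumptions made for the example.

```javascript
var locations = [
  'Anfield', 'Stamford Bridge', 'Old Trafford', 'Parkhead',
  'Heathrow Airport', 'Westminster, London', 'Buckingham Palace',
  'Lands End, Cornwall', 'John O\'Groats'
];

// Same rules as the /api/parcel route, returned as { status, body }.
// Negative numbers are not handled specially here.
function lookupParcel(raw) {
  var parcel_num = parseInt(raw, 10);
  if (!raw || isNaN(parcel_num)) {
    return { status: 400, body: 'Not a valid parcel number ' + raw };
  }
  if (parcel_num % 13 === 0) {
    return {
      status: 404,
      body: "We can't find parcel number " + parcel_num + ' it is unlucky!'
    };
  }
  return { status: 200, body: locations[parcel_num % locations.length] };
}

console.log(lookupParcel('6'));
console.log(lookupParcel('26'));
console.log(lookupParcel('abc'));
```
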

Recognising a parcel location request and filling in the details

The main code modification we need to make is in the '/api/message' function in app.js. However, we first need to do some housekeeping changes because we will be using the requestify library.

Add the following line to the dependencies section of package.json:
    "requestify": "^0.2.5",

Then add this line near the top of app.js
var requestify = require('requestify');

The nub of the code is contained in the function below. You should paste this into app.js to replace the call to conversation.message which is around line 56 of the original file.

  // Send the input to the conversation service
  conversation.message(payload, function(err, data) {
    if (err) {
      // the conversation service returned an error
      return res.status(err.code || 500).json(err);
    }
    var parcel_num = data.context.parcel_num;
    if (data.intents && (data.intents.length>0) && data.intents[0].intent
                  && (data.intents[0].intent === 'parcel') && parcel_num) {
      // look up the parcel location using our dummy service
      var server = 'localhost';
      var port = process.env.PORT || process.env.VCAP_APP_PORT || 3000;
      var url = 'http://' + server + ':' + port +'/api/parcel?parcel_num='+parcel_num;
      requestify.get(url)
        .then(function(response) {
          // fill the location into the template response
          var location = response.body;
          data.output.text[0] = data.output.text[0].replace( /\{0\}/g, location);
          return res.json(data);
        }, function(err) {
          // the parcel lookup returned an error status
          data.output.text[0] = "Parcel lookup service returned an error: "+err.body;
          return res.json(data);
        });
    } else {
      return res.json(data);
    }
  });

The original code did nothing other than calling the updateMessage function before passing the data received from the Conversation service back to the UI layer. However, the updateMessage function didn't do anything useful so we can delete it and instead we will call our dummy parcel location service to find the location of the parcel whose number appears in the context variable.

If the http call succeeds we assume that we have a good location and we replace any placeholder {0} strings in the message received from the conversation service with this location. If we get an error status from the http call, we replace the entire string received from the conversation service with a message saying we failed to locate the parcel.
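
The placeholder substitution on its own can be sketched as a small function. The name fillLocation is hypothetical. One design choice differs from the snippet above: the replacement is passed as a function, which stops special sequences such as $& in a location string from being interpreted by String.prototype.replace.

```javascript
// Replace every {0} placeholder in the template with the location.
// Passing a function as the replacement keeps "$" sequences in the
// location from being treated as replacement patterns.
function fillLocation(template, location) {
  return template.replace(/\{0\}/g, function () {
    return location;
  });
}

console.log(fillLocation('Your parcel is at {0}. I repeat: {0}.', 'Anfield'));
```
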


You now have a conversation service which can reply to questions such as "where is my parcel number 543210" with details of where the parcel is located. It is currently only using a toy implementation which pseudo-randomly selects locations in the UK. However, it should be relatively easy to extend it to any real parcel tracking service you want. In fact, the methods used can easily be applied to interfacing with any 3rd party service.

You can download the complete working sample from our GitHub repository.

Sending an Email from Watson Conversation Service

Sending emails is a simple way to connect a chatbot to a business process. This post shows how to extend the Watson Conversation Sample Application to get it to send email from a Gmail account. We will use the nodemailer library and a slightly modified version of the code from W3Schools to help us.

The first step is to download the original sample application from GitHub. Follow the instructions in the Readme file in the repository to get this application running on your local machine and/or on the Bluemix service. Make sure you have the original application working correctly before you go on to make any changes.

Modify the code below to reference the username and password of a Gmail account you have access to (or create a new account) and then add it near the top of the app.js file.

var nodemailer = require('nodemailer');

// Transport that sends mail through a Gmail account
var transporter = nodemailer.createTransport({
  service: 'Gmail',
  auth: {
    user: '',
    pass: 'secretpassword'
  }
});

// The email that will be sent
var mailOptions = {
  from: '',
  to: '',
  subject: 'Sending Email from the chatbot',
  text: 'That was easy!'
};

If you use two-factor authentication with your Gmail account, you have to get an app password for your Gmail address. Google are worried you will share your Gmail password with people, so they give you a special password just for your app that has limited powers.

Next, update the Watson application's package.json to say that you want to install the nodemailer package and force the server to use a version of node.js that supports nodemailer i.e.:

  "dependencies": {
    "nodemailer": ">0.3.x"
  },
  "engines": {
    "node": ">= 6.9.x",
    "npm": "> 5.30.x"
  }

Having done this, the next step is to change the conversation flow in the Watson Conversation Service so that when the user says 'send an email', we will set a context variable called 'email' to say an email should be sent. (A production version would probably send different emails to different addresses depending upon context, but we will stick to a simple example for now.)

Your WCS workspace needs an intent for requests to send an email. This contains example user utterances like

Can you email for me?
email dave important info
Ping dave an email to tell him to do important business things
Send an email to find my package
Send an email reminder message
Message Dave a mail to tell him the stuff he needs to do

In WCS dialog tab we want a node that recognises this intent and sets a context variable to signal to the node application to send an email.

In the JSON editor of the dialog node, add a context variable for the email.

{
  "context": {
    "email": "TRUE"
  },
  "output": {
    "text": {
      "values": [
        "I'll email david now"
      ],
      "selection_policy": "sequential"
    }
  }
}

Next, you need to change app.js to send an email when this 'email' context variable is set. At the start of the function updateMessage, add the following code to check whether the context variable email from WCS's response has been set, and send an email if it has.

if (response.context.email) {
  response.context.email = null; // set send variable to null so we don't send email every time
  transporter.sendMail(mailOptions, function(error, info){
    if (error) {
      console.log(error);
    } else {
      console.log('Email sent: ' + info.response);
    }
  });
}


You can download the complete sample from here

This code just sends the same email every time. In practice you would probably have to include some information from the conversation in the mail. Usually this would be something from the context variables. You would add in the information the user had told the chatbot, such as the value in response.context.query.
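
One way to include conversation details is to build the mail options per message instead of reusing a fixed object. The sketch below assumes the context carries a query field as mentioned above; the helper name buildMailOptions and the wording added to the mail body are made up for illustration.

```javascript
// Base options as in the earlier example; addresses left blank on purpose.
var baseMailOptions = {
  from: '',
  to: '',
  subject: 'Sending Email from the chatbot',
  text: 'That was easy!'
};

// Return a copy of the base options with the user's query appended to the body.
// The base object itself is left untouched.
function buildMailOptions(base, context) {
  var options = Object.assign({}, base);
  if (context && context.query) {
    options.text = base.text + '\n\nThe user asked: ' + context.query;
  }
  return options;
}

console.log(buildMailOptions(baseMailOptions, { query: 'find my package' }));
```

The result could then be passed to transporter.sendMail in place of the fixed mailOptions object.
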

I am not a node developer and this code needs some error correction and security checks added to it. But for a quick demo it should allow you to show your chatbot emailing.

Welcome to this Blog

IBM Watson services are amazing. Most people who saw Watson winning the Jeopardy game on television thought that they would love to have that advanced technology applied to their problems. Unfortunately advanced technology can sometimes be complex and people struggle to see how they can adapt the IBM Watson services to their needs. As a result we decided to establish this blog which will focus on tips and tricks to help you get the most out of the IBM Watson services.

It is not an official blog so there is no guarantee that the tips will continue to work. Treat the advice with caution and use the tips at your own risk.