Software Architect / Developer / Microsoft MVP (AI)

Bot Framework, Chatbots, Cognitive Services, Speech

Adding speech capability to your chatbot using Bot Framework and Azure Speech Services

Voice is one of the most natural ways for humans to communicate. Over the last few years, advances in natural language processing and human-to-machine comprehension have made it easier than ever to build innovative voice-first solutions.

In this blog post we’ll implement a chatbot using the Microsoft Bot Framework. The chatbot will be housed within the webchat component, accept voice commands, and speak back to you.

Required Resources

Implementing a voice-first chatbot requires the following services and settings to be created in Azure:

  • Azure Speech Service
  • Bot Channel Registration
  • Direct Line Channel

The Azure Speech Service will give our bot Text to Speech and Speech to Text capabilities.

The Bot Channel Registration is the endpoint in Azure where our deployed bot is hosted, and it lets us configure the Channels that clients connect to.

The Direct Line Channel is the glue that lets our client (a web page in our example) connect to our bot hosted in Azure.

Azure Speech Services

The first service to create is the Speech API. You can find this in the Azure Marketplace:

You create this like other APIs in Azure. After you’ve created the API, take a note of the Endpoint. It’ll look like the following: https://yourregion.api.cognitive.microsoft.com/sts/v1.0/issueToken.

You’ll also need the API key:

With a note of the Speech Endpoint and Key, the next thing we can do is create the Bot Channel Registration.
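To make the relationship between the region, the endpoint, and the key concrete, here is a minimal sketch of the token exchange. The helper names buildIssueTokenUrl and fetchSpeechToken are hypothetical and not part of any SDK; the URL pattern and the Ocp-Apim-Subscription-Key header are the ones used later in this post.

```javascript
// Build the issueToken endpoint for a given Azure region (e.g. 'uksouth').
function buildIssueTokenUrl(region) {
  return `https://${region}.api.cognitive.microsoft.com/sts/v1.0/issueToken`;
}

// Hypothetical helper: exchange a Speech API key for a short-lived token.
// fetchImpl is injectable so the function can be exercised without a network call.
async function fetchSpeechToken(region, speechKey, fetchImpl = fetch) {
  const res = await fetchImpl(buildIssueTokenUrl(region), {
    method: 'POST',
    headers: { 'Ocp-Apim-Subscription-Key': speechKey }
  });
  if (!res.ok) {
    throw new Error(`Speech token request failed with status ${res.status}`);
  }
  // The endpoint returns the token as plain text.
  return res.text();
}
```

Calling fetchSpeechToken('uksouth', '<your-speech-api-key>') in a browser would resolve to the token string that the web page later hands to Web Chat.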

Bot Channel Registration

The Bot Channel Registration is the endpoint in Azure where we will deploy our bot. Client applications can then connect to endpoints (or Channels) exposed by the Bot Channel Registration. Search for this in Azure and click Create.

Most of the fields in the creation screen for the Bot Channel Registration are standard. There are two fields that we will come back to later:

  • Messaging Endpoint
  • Microsoft App Id and Password

Click Create and the Bot Channel Registration will be added. At this point, we’ve created the Speech API and the Bot Channel Registration, but we don’t yet have a chatbot to test, so let’s implement that.

For quickness I’ve used the EchoBot example. This is a simple bot that greets you, then repeats back whatever you type in.

You can see from the code that it doesn’t do very much:

using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Bot.Builder;
using Microsoft.Bot.Schema;

public class EchoBot : ActivityHandler
{
    // Echo the user's message back. The same string is passed twice:
    // once as the display text and once as the text to be spoken.
    protected override async Task OnMessageActivityAsync(ITurnContext<IMessageActivity> turnContext, CancellationToken cancellationToken)
    {
        var replyText = $"Echo: {turnContext.Activity.Text}";
        await turnContext.SendActivityAsync(MessageFactory.Text(replyText, replyText), cancellationToken);
    }

    // Greet each member who joins the conversation, skipping the bot itself.
    protected override async Task OnMembersAddedAsync(IList<ChannelAccount> membersAdded, ITurnContext<IConversationUpdateActivity> turnContext, CancellationToken cancellationToken)
    {
        var welcomeText = "Hello and welcome!";
        foreach (var member in membersAdded)
        {
            if (member.Id != turnContext.Activity.Recipient.Id)
            {
                await turnContext.SendActivityAsync(MessageFactory.Text(welcomeText, welcomeText), cancellationToken);
            }
        }
    }
}
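One detail in the bot code matters for voice: the second argument to MessageFactory.Text populates the activity's speak field, which a speech-enabled client reads aloud. As a rough illustration only, the sketch below mirrors the echo logic in JavaScript and shows the shape of the resulting message activity. The function name createEchoReply is hypothetical, and the object shows only the fields relevant here.

```javascript
// Mirror of the C# echo logic: build a message activity whose `speak`
// field carries the text a speech-enabled client should read aloud.
function createEchoReply(userText) {
  const replyText = `Echo: ${userText}`;
  return {
    type: 'message',
    text: replyText,  // shown in the chat transcript
    speak: replyText  // used for text-to-speech
  };
}
```

If the speak field were left out, the reply would still appear in the transcript, but there would be no dedicated text for the speech engine to use.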

You can download the EchoBot example here. We can test this bot out locally using the Bot Framework Emulator and see this in action here:

This bot must be published to Azure. After you’ve done that (Right Click->Publish!), take a note of the Site URL:

Our bot is now hosted in Azure, but it’s good to test this out before going forward. We can do this by going back into the Azure Portal and searching for the Bot Channel Registration. Before we test our bot in the Azure Portal, however, we need to set the Messaging Endpoint. Remember that from before? We didn’t initially have it as we hadn’t created and published our bot, so go back into the Bot Channel Registration screen, click on Settings, and enter the Site URL from the previous screenshot:

Notice the suffix “/api/messages” is also added.  We also need to grab the Microsoft App Id and Client Secret Key.

  1. Take a note of the Microsoft App Id.
  2. Click on the blue link that says Manage

You’ll be taken to the Certificates & secrets page. Select New client secret. When you do this, the secret will be displayed only once. Again, take a note of this.

Armed with the Microsoft App Id and Client Secret, we can almost test our bot in Azure. The final thing we need to do is update the EchoBot project in Visual Studio. To help our bot talk to Azure, we need to set these values in the appsettings.json file. We can see this here:

After you’ve set the MicrosoftAppId and MicrosoftAppPassword (the Client Secret), republish the bot.
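As a quick sanity check before republishing, the sketch below shows the two keys as a JavaScript object and flags any that are empty or still placeholders. The key names MicrosoftAppId and MicrosoftAppPassword are the ones the EchoBot sample reads from appsettings.json; the helper findUnsetBotSettings and the placeholder values are hypothetical and only for illustration.

```javascript
// Assumed shape of the two bot credentials stored in appsettings.json.
const exampleSettings = {
  MicrosoftAppId: '<your-microsoft-app-id>',
  MicrosoftAppPassword: '<your-client-secret>'
};

// Hypothetical helper: list the required keys that are missing, empty,
// or still set to an angle-bracket placeholder.
function findUnsetBotSettings(settings) {
  const required = ['MicrosoftAppId', 'MicrosoftAppPassword'];
  return required.filter((key) => {
    const value = settings[key];
    return typeof value !== 'string' || value.trim() === '' || value.startsWith('<');
  });
}
```

An empty result means both values have been filled in; anything else names the keys that still need attention.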

Finally, we can test in the Azure Portal. To do this browse to the Bot Channel Registration and click on Test in Web Chat. The bot will greet you. You can type in something and it’ll get sent back to you:

Direct Line

We will be consuming the Speech API over the Direct Line Channel and using this within a Webchat control. To do this we have to add Direct Line to the Bot Channel Registration. When you do this, take a note of the Secret Key:

Now that we have the Direct Line secret key, we use this to create an instance of the webchat component in an ASP.NET web application.
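This post passes the Direct Line secret straight to Web Chat. As an optional variation that is not part of the original sample, the secret can instead be exchanged for a short-lived token through the Direct Line tokens/generate endpoint, and that token passed to createDirectLine. The sketch below assumes this approach; the helper names buildDirectLineTokenRequest and fetchDirectLineToken are hypothetical.

```javascript
// Describe the HTTP request that swaps a Direct Line secret for a token.
function buildDirectLineTokenRequest(secret) {
  return {
    url: 'https://directline.botframework.com/v3/directline/tokens/generate',
    options: {
      method: 'POST',
      headers: { Authorization: `Bearer ${secret}` }
    }
  };
}

// Hypothetical helper: perform the exchange and return the token string.
// fetchImpl is injectable so the function can be exercised without a network call.
async function fetchDirectLineToken(secret, fetchImpl = fetch) {
  const { url, options } = buildDirectLineTokenRequest(secret);
  const res = await fetchImpl(url, options);
  if (!res.ok) {
    throw new Error(`Direct Line token request failed with status ${res.status}`);
  }
  const body = await res.json();
  return body.token;
}
```

The returned value would be used as createDirectLine({ token }) in place of the secret option. Running the exchange on a server keeps the secret out of the page source.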

ASP.NET Web Application

Remember the Speech API and Direct Line secret keys from earlier? We need to use these here. The following code does a few things:

  1. References the Bot Framework webchat.js from a CDN
  2. Makes a POST request, with our Speech API Key, to the issueToken endpoint to fetch a Speech Services token
  3. Uses the token to create a Web Speech object (webSpeechPonyfillFactory), which provides text-to-speech and speech-to-text capabilities
  4. Creates an instance of the Web Chat control (window.WebChat.renderWebChat) using Direct Line. The call accepts our Direct Line Secret and Web Speech object, and the control is then bound to the DIV with the id “webchat”.

<!DOCTYPE html>
<html lang="en-US">
<head>
<script src="https://cdn.botframework.com/botframework-webchat/latest/webchat.js"></script>
<style>
html, body {
height: 100%
}

body {
margin: 0
}


#webchat {
height: 100%;
width: 100%;
}
</style>
</head>
<body>
<div id="webchat" role="main"></div>

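<script>
(async function () {
  // Exchange the Speech API key for a short-lived authorization token.
  const speechServicesTokenRes = await fetch(
    'https://uksouth.api.cognitive.microsoft.com/sts/v1.0/issueToken',
    { method: 'POST', headers: { 'Ocp-Apim-Subscription-Key': '<your-speech-api-key>' } }
  );

  if (speechServicesTokenRes.status !== 200) {
    throw new Error('Speech token request failed with status ' + speechServicesTokenRes.status);
  }

  const authorizationToken = await speechServicesTokenRes.text();

  // Create the Web Speech object that gives Web Chat speech-to-text and text-to-speech.
  // Note: more recent Web Chat releases may expect these two values inside a
  // `credentials` option instead, so check the version you load from the CDN.
  const webSpeechPonyfillFactory = await window.WebChat.createCognitiveServicesSpeechServicesPonyfillFactory({
    authorizationToken: authorizationToken,
    region: '<your-azure-region-eg-uksouth>'
  });

  // Render Web Chat into the "webchat" DIV, connected to the bot over Direct Line.
  window.WebChat.renderWebChat({
    directLine: window.WebChat.createDirectLine({
      secret: '<your-bot-reg-channel-direct-line-secret>'
    }),
    webSpeechPonyfillFactory: webSpeechPonyfillFactory
  }, document.getElementById('webchat'));
})().catch(err => console.error(err));
</script>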
</body>
</html>

Testing it out

At this point we have everything in place, so we can publish our bot. After our bot is published, we visit the web page that contains our script. The page will contain the web chat and a microphone. We can click on the microphone and speak to our bot. Our bot will listen to us, then repeat back what we have just said.

I’ve uploaded a video of this onto my YouTube Channel which you can see here. Alternatively, you can visit the sample page here.

Further Ideas

We’ve just touched the basics in this bot and it’s not highly intelligent!

You could augment the bot’s intelligence by introducing LUIS to help understand human language and hand off conversations to discrete dialogues.

You could also introduce additional Azure Cognitive Services, such as Text Analytics, to further enhance the chatbot’s capabilities.

Resources

Here are some resources that were useful in building this solution:

Speech Service

https://docs.microsoft.com/en-gb/azure/cognitive-services/speech-service/

Webchat Overview

https://docs.microsoft.com/en-us/azure/bot-service/bot-builder-webchat-overview?view=azure-bot-service-4.0

Webchat Examples

https://github.com/microsoft/BotFramework-WebChat/tree/master/samples

Credit:

Finally, credit to Stack Overflow for help with the JS.

Summary

In this blog post we’ve seen how to build a simple chatbot using the Bot Framework SDK. We’ve surfaced our bot in a web page using the Direct Line Channel.

Using the Direct Line, we’ve configured Speech to Text and Text to Speech services that are delivered using the Azure Speech API.
