webdialog / Get Started

Getting the code

Either run:

git clone https://bitbucket.org/matthen/webdialog.git

or download the code in a zip file.

SSL

First, create a private key and a self-signed certificate for SSL. (Serving the page over SSL means the browser does not have to keep asking for permission to use the microphone.)

mkdir ssl
openssl genrsa -des3 -out ssl/server.key 2048
openssl req -new -key ssl/server.key -out ssl/server.csr
openssl x509 -req -days 365 -in ssl/server.csr -signkey ssl/server.key -out ssl/server.crt
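
Before starting the server, it can help to confirm that the certificate and key load together. The following is a minimal Python 3 sketch, not part of webdialog: the function name check_ssl_files is hypothetical, and it relies only on the standard library ssl module.

```python
import ssl


def check_ssl_files(certfile="ssl/server.crt", keyfile="ssl/server.key", password=None):
    """Return True if the certificate and private key load as a matching pair.

    check_ssl_files is a hypothetical helper, not a webdialog function.
    password is the passphrase chosen in the genrsa step; if it is None and
    the key is encrypted, OpenSSL asks for it on the terminal.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        context.load_cert_chain(certfile, keyfile, password=password)
    except OSError as error:
        # Covers missing files as well as ssl.SSLError (wrong passphrase, mismatched pair).
        print("Could not load certificate/key: %s" % error)
        return False
    return True
```

Calling check_ssl_files(password="your passphrase") from the top directory should return True once the three openssl commands have succeeded.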

Requirements

Make sure you have web.py installed:

sudo pip install web.py

You should now be able to run:

python server.py

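After you enter the passphrase for your private key, the server will listen on port 8080. Visit https://localhost:8080/dialog to see the simple demo that webdialog ships with. Because the certificate is self-signed, your browser will show a security warning that you need to accept. Dialog logs will start being saved in the logs directory.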

Extending the demo

In writing the code, I have tried not to force too much upon developers who decide to use it. Here is a short step-by-step tutorial that shows how to extend the simple demo towards a real system. The aim is to write a system that tracks the word recognised most often by the ASR (automatic speech recognition) and outputs it to the browser.

First make a copy of DialogState.py

cp DialogState.py DialogStateTutorial.py

At the top of DialogStateTutorial.py, add

from collections import defaultdict
import operator

Then, inside the __init__ method of the dialogState class, add:

        # word counts
        self.word_counts = defaultdict(int)

This creates a defaultdict to store the word counts in the dialogState. Note that one of these objects is initialised for each dialog. The update(self, asr_result) method receives the new ASR result at each turn. Its asr_result argument is an object like this:

{
   "confidence":0.8165613412857056,
   "hyps":[
      "can you lend me a 12000 dollars",
      "can you let me a 12000 dollars",
      "can you read me a 12000 dollars",
      "can you lend me a twelve thousand dollars",
      "can you let the twelve thousand dollars",
      "can you lead me a 12000 dollars",
      "can you let me 12000 dollars",
      "can you lend me 12000 dollars",
      "can you buy me a 12000 dollars"
   ]
}
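
As a rough illustration of working with this structure, here is a small Python sketch. The helper name top_hypothesis is hypothetical, and it assumes that the first entry in "hyps" is the recogniser's top choice, which the example above suggests but does not state.

```python
import json

# Example payload in the shape shown above, shortened to three hypotheses.
RAW_RESULT = """
{
   "confidence": 0.8165613412857056,
   "hyps": [
      "can you lend me a 12000 dollars",
      "can you let me a 12000 dollars",
      "can you read me a 12000 dollars"
   ]
}
"""

asr_result = json.loads(RAW_RESULT)


def top_hypothesis(result):
    """Return the first hypothesis, or None when the list is empty or missing.

    Assumption: hypotheses are ordered best-first.
    """
    hyps = result.get("hyps", [])
    return hyps[0] if hyps else None


print(top_hypothesis(asr_result))
print(len(asr_result["hyps"]), asr_result["confidence"])
```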

Here is an update function which will track the top recognised word:

    def update(self, asr_result):
        self.asr_results.append(asr_result)
        # count every word in every hypothesis of this turn
        for hyp in asr_result["hyps"]:
            for word in hyp.split():
                self.word_counts[word] += 1
        # items() works on both Python 2 and 3 (iteritems() is Python 2 only)
        most_frequent_word = max(self.word_counts.items(), key=operator.itemgetter(1))[0]
        response = {
            "tts": "You have said '" + most_frequent_word + "' the most.",
            "ended": False,
            "most_frequent_word": most_frequent_word
        }
        self.responses.append(response)
        return response
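
To see the counting logic in isolation, here is a self-contained sketch that can be run without the server. WordCountState is a hypothetical stand-in for dialogState that keeps only the word counting. The guard for an empty count table is an addition for the sketch and is not in the update function above.

```python
from collections import defaultdict
import operator


class WordCountState(object):
    """Hypothetical stand-in for dialogState, reduced to the word counting."""

    def __init__(self):
        self.word_counts = defaultdict(int)

    def update(self, asr_result):
        # Counts persist on the object, so they accumulate across turns.
        for hyp in asr_result["hyps"]:
            for word in hyp.split():
                self.word_counts[word] += 1
        if not self.word_counts:
            # Nothing recognised yet; max() would raise on an empty sequence.
            return None
        return max(self.word_counts.items(), key=operator.itemgetter(1))[0]


state = WordCountState()
first = state.update({"confidence": 0.9, "hyps": ["lend me money", "let me money", "lend me cash"]})
second = state.update({"confidence": 0.7, "hyps": ["cash cash cash", "cash only"]})
print(first)   # me
print(second)  # cash
```

After the first turn "me" has been counted three times. After the second turn "cash" has been counted five times in total, so it overtakes "me".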

Note the response now includes a new property, most_frequent_word, which we will access in the browser. In order to use this new class, create a file called config.cfg in the top directory containing:

[webdialog]
; your python class for updating dialog state
dialog_state_class = DialogStateTutorial.dialogState
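
The value of dialog_state_class is a dotted path of the form module.ClassName. How server.py resolves it is not shown here. The following Python 3 sketch shows one common way such a setting could be read and turned into a class, and is an assumption rather than webdialog's actual code. The helper load_class is hypothetical, and collections.OrderedDict stands in for DialogStateTutorial.dialogState so that the sketch runs on its own.

```python
import configparser
import importlib

# Same layout as config.cfg, with a standard-library class as a stand-in value.
CONFIG_TEXT = """
[webdialog]
; your python class for updating dialog state
dialog_state_class = collections.OrderedDict
"""


def load_class(dotted_path):
    """Import 'package.module.ClassName' and return the class object (hypothetical helper)."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
dotted_path = parser.get("webdialog", "dialog_state_class")
state_class = load_class(dotted_path)
print(dotted_path, state_class)
```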

Now run the server (python server.py) and visit https://localhost:8080/dialog to try the new system.

Next, I want to demonstrate how to access, in the browser, the extra response data sent from the DialogState object. In a real application this could be a new list of coordinates for a map, a list of search results, a URL to an image, and so on.

In templates/display.html add the following:

<p>Most frequent word: <span id="most_frequent_word"></span></p>

This template is included in a div with id display on the main page, below the control box, which by default shows the live top ASR hypothesis and the system response.

In static/js/views.js, add some JavaScript that listens for events on the window.dialog object and updates the web page. Add the following before the final }); in views.js:

    // Most Frequent Word
    if ($('#most_frequent_word').length != 0) {
        window.dialog.on("response", function(event, response) {
            var $most_frequent_word = $('#most_frequent_word');
            if (response.hasOwnProperty("most_frequent_word")) {
                $most_frequent_word.text(response.most_frequent_word);
            }
        });
        $('#error_text').hide();    
    }

This code (written using jQuery) checks that the most_frequent_word span exists in the document, and then attaches a listener to the "response" event. The listener updates the text of the span with the most_frequent_word property of the response. Note that this name corresponds exactly to the key in the JSON returned by the DialogState object's update function.
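
To make the correspondence concrete, the sketch below serialises a response dictionary the way Python's standard json module would. It assumes the server uses ordinary JSON encoding, which is not shown in this tutorial. The values are made up for illustration.

```python
import json

# A response in the shape returned by update(); the values are illustrative.
response = {
    "tts": "You have said 'me' the most.",
    "ended": False,
    "most_frequent_word": "me",
}

# Python's False becomes JSON false; the dictionary keys become the property
# names that the browser reads, e.g. response.most_frequent_word.
payload = json.dumps(response, sort_keys=True)
print(payload)

decoded = json.loads(payload)
```

Any extra key added to the dictionary in update() would show up under the same name on the response object in the listener.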

Lastly add the following CSS to static/css/display.css:

#display span#most_frequent_word {
    font-weight:bold;
}

Now rerun the server and check it all works.

This has shown how webdialog implements a dialog system as a Python object, and how to get started visualising the responses in the browser.

Note that this tutorial required editing only display.html, display.css, views.js and config.cfg (plus the new file DialogStateTutorial.py). These files are not tracked in the git repository, so if you edit only these files you should be able to run git pull without getting any conflicts.
