py_google_asr -- Use Google Voice Recognition to transcribe audio data

Author: Alok Parlikar <aup@cs.cmu.edu>

Introduction

Google runs a speech recognition web service that Android uses for voice input and that Chrome also uses. At the time of writing this README, the speech recognition API had not been made public. However, the Chromium source code contains details about how to use the API.

See also:

Requirements

  • Python 3
  • sox package (we need /usr/bin/sox to convert WAV files into FLAC)
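
To illustrate the sox requirement, here is a minimal sketch of a WAV-to-FLAC conversion driven from Python. The helper names build_sox_command and convert_wav_to_flac are hypothetical and are not part of this package; how google_asr itself invokes sox may differ. The sketch relies only on the fact that sox infers the output format from the output file's extension.

```python
import subprocess


def build_sox_command(wav_path, flac_path, sox_binary="/usr/bin/sox"):
    """Return the argument list for a basic sox conversion.

    Hypothetical helper: sox picks the output format (FLAC here)
    from the extension of the output file.
    """
    return [sox_binary, wav_path, flac_path]


def convert_wav_to_flac(wav_path, flac_path):
    """Run sox and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_sox_command(wav_path, flac_path), check=True)
```

Calling convert_wav_to_flac("/tmp/short_speech.wav", "/tmp/short_speech.flac") would then write the FLAC file next to the original, provided sox is installed at the assumed path.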

Installation

$ python3 setup.py install

This should install the google_asr module into your Python installation.

Usage

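From your Python 3 code

import google_asr

# Path to the wave file to transcribe
wavefilename = "/tmp/short_speech.wav"
results = google_asr.decode_wavefile(wavefilename, max_results=2, profanity_filter=True)

# Each result is a dict; 'utterance' holds the recognized text
for r in results:
    print(r['utterance'])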

From the command line

If you have wave files in a directory called wav-dir/ then, to transcribe them all, you can run:

$ ls -1 wav-dir/*.wav | google_asr_batch_transcribe.py > transcriptions

The transcriptions file will contain three tab-separated columns: <filename> <transcription> <confidence score>
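
As a rough sketch, the transcriptions file could be read back into Python as shown below. The function name parse_transcriptions is hypothetical, and the code assumes that the confidence score is written as a plain decimal number; lines that do not have exactly three columns, or whose score does not parse, are skipped.

```python
def parse_transcriptions(lines):
    """Parse tab-separated lines into (filename, transcription, confidence) tuples.

    Assumption: the third column is a number that float() can parse.
    """
    rows = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        filename, transcription, confidence = parts
        try:
            score = float(confidence)
        except ValueError:
            continue  # skip lines whose score is not numeric
        rows.append((filename, transcription, score))
    return rows
```

For example, passing an open file works because files iterate line by line:

```python
with open("transcriptions") as f:
    for filename, text, score in parse_transcriptions(f):
        print(filename, score, text)
```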

PocketSphinx-like server mode

PocketSphinx (http://cmusphinx.sourceforge.net/) is a small-footprint speech recognizer developed at CMU. It can run in server mode and perform online recognition. If you have a client that interacts with the PocketSphinx server but would like to use it with Google ASR, you can run Google ASR in server mode after you have installed this package:

$ google_asr_server.py --bind-address 127.0.0.1 --bind-port 9993