If you remember, I was getting started with Audio Processing in Python (thinking of implementing an audio classification system) a couple of weeks back (see my earlier post). I got the PyAudio package setup and was having some success with it. As you know, one of the more interesting areas in audio processing in machine learning is Speech Recognition. So, although it wasn't my original intention of the project, I thought of trying out some speech recognition code as well.
I searched around to see what Python packages are available for the task and found the SpeechRecognition package.
Python Speech Recognition running with Sphinx |
SpeechRecognition is a library for Speech Recognition (as the name suggests), which can work with many Speech Engines and APIs. The current version supports the following engines and APIs,
- CMU Sphinx
- Google Speech Recognition
- Google Cloud Speech API
- Wit.ai
- Microsoft Bing Voice Recognition
- Houndify API
- IBM Speech to Text
I decided to start with the Sphinx engine since it was the only one that worked offline. But keep in mind that Sphinx is not as accurate as something like Google Speech Recognition.
First, let's set up the SpeechRecognition package.