Getting Started
This guide will get you up and running with Spokestack for Python, and you’ll have a voice interface in your application in no time.
Installation
System Dependencies
A few system dependencies need to be installed before you can install spokestack via pip.
macOS
brew install lame portaudio
Debian/Ubuntu
sudo apt-get install portaudio19-dev libmp3lame-dev
Windows
We currently do not support Windows 10 natively, and recommend you install Windows Subsystem for Linux (WSL) with the Debian dependencies. However, if you would like to work on native Windows support, we gladly accept pull requests.
Another potential avenue for using Spokestack on Windows 10 is Anaconda. PortAudio can be installed via conda, but LAME cannot. Hence, microphone input will be supported, but text-to-speech will not.
conda install portaudio
Installation with pip
Once system dependencies have been satisfied, you can install the library with the following.
pip install spokestack
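To verify the installation, try importing the package (a quick sanity check, not an official setup step):
python -c "import spokestack"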
Setup
We use pyenv for virtual environments.
pyenv install 3.8.6
pyenv virtualenv 3.8.6 spokestack
pyenv local spokestack
pip install -r requirements.txt
Install TensorFlow
This library requires a way to run TFLite models. There are two ways to add this capability. The first is installing the full TensorFlow library:
pip install tensorflow
In use cases where you require a small footprint, such as on a Raspberry Pi or similar Internet of Things (IoT) device, you will want to install the TFLite Interpreter instead. You can install it for your platform by following TensorFlow's TFLite Python installation instructions.
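On many Linux platforms the standalone interpreter is published as the tflite-runtime package (an assumption to verify for your platform and Python version):
pip install tflite-runtime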
Integration
In order for your application to use Spokestack’s features, there are a few things you will need:
- A free Spokestack account
- An audio input device
- A SpeechPipeline instance
- An audio output device
1. Spokestack Account
Go to spokestack.io to set up your own account (it’s free!). Once you’ve got that, go grab one of our free NLU models. We’ll use the Highlow one in this example, but you can choose another or create your own.
Once you’ve downloaded your NLU model, extract nlu.tar.gz; it contains three files (metadata.json, nlu.tflite, and vocab.txt). The location of the extracted directory isn’t important, because we will pass its path on initialization.
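On macOS or Linux, the archive can be extracted from the command line:
tar -xvzf nlu.tar.gz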
2. Audio Input Device
The PyAudioInput class will use the system default audio input device. Most personal computers have some form of microphone, but in the case of an embedded device, you may need to purchase a small USB microphone.
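The pipeline profiles below construct the input device for you, but it can also be created directly. A minimal sketch, assuming the default sample rate and frame width:
from spokestack.io.pyaudio import PyAudioInput

# 16 kHz audio in 20 ms frames from the default microphone
mic = PyAudioInput(sample_rate=16000, frame_width=20)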
3. SpeechPipeline Instance
Spokestack’s speech pipeline handles collecting audio from the input device and transcribing speech directed at your app. The SpeechPipeline guide has a detailed explanation of how to set up the pipeline, so we will show the quickest way here: using a profile, which configures the pipeline’s components for a specific use case. The profile we use here includes wake word activation and speech transcription using Spokestack’s cloud ASR.
from spokestack.profile.wakeword_asr import WakewordSpokestackASR

pipeline = WakewordSpokestackASR.create(
    "spokestack_id", "spokestack_secret", model_dir="path_to_tflite_model_dir"
)
pipeline.start()
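Once started, the pipeline reports what it hears through events. As a sketch (the on_activate event name follows the same handler-naming pattern as the on_recognize handler shown below; treat it as an assumption to verify against the pipeline documentation):
@pipeline.event
def on_activate(context):
    # fires when the wake word activates the pipeline
    print("wake word detected")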
From text to meaning
Translating the text into an action is the job of the Natural Language Understanding (NLU) component. A great thing about Spokestack NLU models is that they run entirely on device. The NLU can be initialized like this:
from spokestack.nlu.tflite import TFLiteNLU
nlu = TFLiteNLU("path_to_tflite_model_dir")
Input to the NLU model is the ASR transcript, which can be accessed as a property of SpeechContext. Below is a sample event handler for running inference on the speech transcript.
@pipeline.event
def on_recognize(context):
    results = nlu(context.transcript)
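The returned results can then drive your application’s logic. A short sketch, assuming the result exposes intent and confidence attributes (verify the exact shape against the NLU documentation):
# classify an utterance and act on the prediction
results = nlu("hello world")
if results.confidence > 0.5:
    print(f"recognized intent: {results.intent}")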
Talking back to your users
If you want the full smart speaker experience, you will need to give your application a voice. This can be achieved with text-to-speech (TTS). For more information on TTS, see the TTS concept guide. TTS playback uses the PyAudioOutput class, which plays audio through the device’s default speaker. Like the NLU, TTS can be used in an event handler. Take a look at the example below, which simply speaks back what the ASR heard.
@pipeline.event
def on_recognize(context):
    # repeat back what the ASR transcribed
    tts.synthesize(context.transcript)
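The tts object in this handler needs to be created before the pipeline starts. A minimal setup sketch using the library’s TTS manager and Spokestack cloud client (the credentials are the same ones passed to the pipeline):
from spokestack.io.pyaudio import PyAudioOutput
from spokestack.tts.clients.spokestack import TextToSpeechClient
from spokestack.tts.manager import TextToSpeechManager

# synthesizes text with Spokestack's cloud TTS and plays it
# through the default output device
tts = TextToSpeechManager(
    TextToSpeechClient("spokestack_id", "spokestack_secret"),
    PyAudioOutput(),
)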
Conclusion
That’s all there is to setting up an application with Spokestack. Your Python application can now accept and respond to voice commands.
Thank you for taking the time to read this!
Related Resources
Want to dive deeper into the world of Python voice integration? We’ve got a lot to say on the subject.