Text-to-speech for Python
Text-to-speech is a broad topic, but as far as Spokestack is concerned, there are two things your app has to handle: sending text, SSML, or Speech Markdown to be synthesized; and playing the resulting audio for your users. This guide will cover both.
Generating Audio
The best way to synthesize speech in Spokestack is to use the TextToSpeechManager module. This module combines TextToSpeechClient with an audio output target. Keep in mind that this module operates independently of the SpeechPipeline. If you haven’t already, you will need to create an account or sign in to get your API credentials.
TextToSpeechManager is initialized as follows:
from spokestack.tts.manager import TextToSpeechManager
from spokestack.tts.clients.spokestack import TextToSpeechClient
from spokestack.io.pyaudio import PyAudioOutput
manager = TextToSpeechManager(
    TextToSpeechClient("spokestack_id", "spokestack_secret"), PyAudioOutput()
)
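The credentials in the snippet above are placeholders. One way to keep real values out of source code is to read them from the environment. The sketch below is only an illustration: the variable names SPOKESTACK_ID and SPOKESTACK_SECRET and the helper load_credentials are assumptions made for this example, not something the Spokestack library defines.

```python
import os


def load_credentials(env=None):
    """Return (key_id, key_secret) read from environment variables.

    SPOKESTACK_ID and SPOKESTACK_SECRET are hypothetical names chosen for
    this example; use whatever names fit your deployment.
    """
    env = os.environ if env is None else env
    try:
        return env["SPOKESTACK_ID"], env["SPOKESTACK_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError.
        raise RuntimeError(
            f"missing environment variable: {missing.args[0]}"
        ) from None
```

The returned pair can then be unpacked into the client, for example TextToSpeechClient(*load_credentials()).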
There are three different modes for TTS: text, ssml, and markdown. We will go over each mode briefly here. For a more detailed view, check out the TTS concept guide.
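Because the mode is passed as a plain string, a typo is easy to miss. A thin wrapper can check it before anything is sent for synthesis. This is a minimal sketch: synthesize_checked and VALID_MODES are hypothetical names introduced here, and the wrapper only assumes that the object it receives has a synthesize method taking the utterance, mode, and voice keyword arguments used in this guide.

```python
# The three mode names listed in this guide.
VALID_MODES = ("text", "ssml", "markdown")


def synthesize_checked(manager, utterance, mode="text", voice="demo-male"):
    """Validate the mode locally, then delegate to manager.synthesize.

    Hypothetical helper; it is not part of the Spokestack library.
    """
    if mode not in VALID_MODES:
        raise ValueError(
            f"unknown TTS mode {mode!r}; expected one of {VALID_MODES}"
        )
    return manager.synthesize(utterance=utterance, mode=mode, voice=voice)
```

Calling synthesize_checked(manager, "welcome to spokestack") then behaves like the direct call, but a misspelled mode fails immediately with a clear error.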
Text
The text mode is for plain text without any additional markup. To synthesize plain text, do the following:
manager.synthesize(utterance="welcome to spokestack", mode="text", voice="demo-male")
SSML
SSML is based on XML and gives you enhanced control over pronunciation. Check out the guide for more details. You can synthesize speech from SSML like this:
manager.synthesize(
    utterance="<speak>welcome to spokestack</speak>", mode="ssml", voice="demo-male"
)
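Since SSML is XML, characters such as & and < in user-supplied text need to be escaped before the text is wrapped in a speak element. The following sketch uses the standard library's xml.sax.saxutils.escape for that. The helper name build_ssml is an assumption made for this example and is not part of Spokestack.

```python
from xml.sax.saxutils import escape


def build_ssml(text: str) -> str:
    """Wrap plain text in a <speak> element, escaping XML special characters.

    Hypothetical helper for illustration only.
    """
    # escape() replaces &, < and > with their XML entities.
    return f"<speak>{escape(text)}</speak>"
```

The result can be passed as the utterance together with mode="ssml".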
Speech Markdown
Speech Markdown is a wrapper around SSML syntax that gives some additional features as explained in the guide. An example of Speech Markdown looks like this:
manager.synthesize(
    utterance="See all our products at (www)[characters] dot my company dot com.",
    mode="markdown",
    voice="demo-male",
)
Additional Synthesis Options
If automatic playback is not what you are looking for, we offer another option. An instance of TextToSpeechClient can synthesize separately from the TextToSpeechManager and produce a URL that points to the audio file. This allows you to download the entire audio clip. This is especially useful in a Jupyter notebook where you may not have direct audio output access. Using the TextToSpeechClient to retrieve the audio URL is as simple as this:
from spokestack.tts.clients.spokestack import TextToSpeechClient
tts = TextToSpeechClient("spokestack_id", "spokestack_secret")
audio_location = tts.synthesize_url("welcome to spokestack")
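Once you have the URL, downloading the clip needs nothing beyond the standard library. The sketch below is one possible approach: the helper name save_audio is hypothetical, and the destination filename (including its extension) is up to you, since this guide does not state the audio format.

```python
import shutil
import urllib.request
from pathlib import Path


def save_audio(audio_url: str, destination: str) -> Path:
    """Download the audio at audio_url and write it to destination.

    Hypothetical helper for illustration; it streams the response to disk
    rather than holding the whole clip in memory.
    """
    target = Path(destination)
    with urllib.request.urlopen(audio_url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

For example, save_audio(audio_location, "welcome.audio") writes the clip next to your script or notebook, where it can be opened with any audio player.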
Related Resources
Want to dive deeper into the world of Python voice integration? We've got a lot to say on the subject: