iOS Cookbook
This is a collection of code snippets and brief descriptions designed to help you be as productive as possible as quickly as possible. Check out the Concepts section for more detailed discussions about the techniques mentioned here.
Set up a default SpeechPipeline
import Spokestack
...
// The default configuration uses Apple's ASR as both a
// wake word recognizer and speech recognizer
// `self` adopts the `SpeechEventListener` protocol
lazy public var pipeline: SpeechPipeline = {
    return SpeechPipelineBuilder()
        .setListener(self)
        .useProfile(.appleWakewordAppleSpeech)
        .build()
}()
...
func startListening() {
    // Spokestack will start listening for its wake word
    pipeline.start()
}
Tap to talk
// `pipeline` is a `SpeechPipeline` instance as before
func onTalkButtonPressed() {
    // if the pipeline has been started elsewhere, you
    // don't need this line
    pipeline.start()

    // skips the wake word activation and sends the pipeline
    // straight to ASR
    pipeline.activate()
}
Use a custom wake word
This example uses the same profile as the previous recipe, which is to say that Apple ASR is used as a wake word detector. This may or may not perform well for your specific wake word, but it should be suitable for demo purposes. Contact us for more information about developing a custom wake word for your app.
import Spokestack
...
lazy public var pipeline: SpeechPipeline = {
    return SpeechPipelineBuilder()
        .setListener(self)
        .useProfile(.appleWakewordAppleSpeech)
        .setProperty("wakewords", "custom,phrase")
        .build()
}()
Recognize Wake Words On-Device
To use the demo “Spokestack” wake word, download the TensorFlow Lite models: detect | encode | filter
import Spokestack
// ...
// `self` adopts the `SpeechEventListener` protocol
// `*Path` variables are string paths to the models downloaded above
lazy public var pipeline = SpeechPipelineBuilder()
    .setListener(self)
    .setDelegateDispatchQueue(DispatchQueue.main)
    .useProfile(.tfLiteWakewordAppleSpeech)
    .setProperty("tracing", ".PERF")
    .setProperty("detectModelPath", detectPath)
    .setProperty("encodeModelPath", encodePath)
    .setProperty("filterModelPath", filterPath)
    .build()
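If you bundle the model files with your app, the path properties can simply be read from the main bundle. A minimal sketch, assuming the downloaded files were added to the app target with the (hypothetical) names used below:
// Hypothetical file names; adjust them to match the models you downloaded above
let detectPath = Bundle.main.path(forResource: "detect", ofType: "tflite")!
let encodePath = Bundle.main.path(forResource: "encode", ofType: "tflite")!
let filterPath = Bundle.main.path(forResource: "filter", ofType: "tflite")!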
Cancel ASR (before the timeout is reached)
// `pipeline` is a `SpeechPipeline` instance
func cancelAsr() {
    pipeline.deactivate()
}
When deactivate is called, Spokestack will continue listening for the next wake word activation. To stop listening entirely, call pipeline.stop(). After calling this, you'll need to call pipeline.start() before you'll be able to recognize a wake word again.
If speech is being processed when deactivate is called, it will still be delivered to your SpeechEventListener's didRecognize method when processing is complete.
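Here's a minimal sketch of stopping and restarting the pipeline entirely; the function names are just illustrations, and `pipeline` is the same `SpeechPipeline` instance as before:
func stopListening() {
    // shuts down wake word detection and ASR completely
    pipeline.stop()
}

func resumeListening() {
    // restarts the pipeline so the wake word can be recognized again
    pipeline.start()
}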
Regex-based NLU
Let’s say you’re creating a voice-controlled timer and wish to perform simplistic natural language processing to respond to a handful of commands: start, stop, reset, start over. SpeechEventListener’s didRecognize might look something like this:
class MyViewController: UIViewController, SpeechEventListener {

    // ...other SpeechEventListener functions...

    func didRecognize(_ result: SpeechContext) {
        let userText = result.transcript

        // check "reset"/"start over" before the bare "start" command
        // so that "start over" isn't mistaken for "start"
        if userText.range(of: "(?i)reset|start over",
                          options: .regularExpression) != nil {
            // reset the timer and change the UI accordingly
            return
        }
        if userText.range(of: "(?i)start",
                          options: .regularExpression) != nil {
            // start the timer and change the UI accordingly
            return
        }
        if userText.range(of: "(?i)stop",
                          options: .regularExpression) != nil {
            // stop the timer and change the UI accordingly
            return
        }
    }
}
Extracting an intent slot value from NLUResult
Sticking with the timer app example, here’s how to extract a slot value from an NLUResult, like one delivered to NLUDelegate’s classification event. Note that the intent and slot names are pre-determined by the NLU model metadata.
class MyViewController: UIViewController, SpeechEventListener, NLUDelegate {

    // ...other delegate functions...

    func classification(result: NLUResult) {
        switch result.intent {
        // using the example of a timer
        case "start":
            // the "start" intent can have slots named "duration" and "units"
            let duration = result.slots!["duration"]!.value as! Int
            let units = result.slots!["units"]!.value
            // start a timer for `duration` `units` (eg 60 seconds) and change the UI accordingly
            return
        default:
            // handle the other intents defined by your NLU model
            return
        }
    }
}
Play back synthesis result using your own AVPlayer
class MyViewController: UIViewController, SpeechEventListener, TextToSpeechDelegate {
    // with your properties
    let player = AVPlayer()
    ...

    func success(url: URL) {
        let playerItem = AVPlayerItem(url: url)
        player.replaceCurrentItem(with: playerItem)
        player.play()
    }

    // implement the other functions of the TextToSpeechDelegate protocol...
}
Note that you’ll need a strong reference to the AVPlayer; you can’t just create it inside your success implementation.
At runtime, you’ll send your text to Spokestack:
// with your properties
// assumes `self` adopts `TextToSpeechDelegate`
// uses default SpeechConfiguration values for api access.
let tts = TextToSpeech(self, configuration: SpeechConfiguration())
...
func speak(_ text: String) {
    let input = TextToSpeechInput(text)
    tts.synthesize(input)
}
Related Resources
Want to dive deeper into the world of iOS voice integration? We've got a lot to say on the subject: