Friday, 30 March 2012

SAPI: Output speech using the SAPI TTS engine in .NET

Today I'm posting about something that's possibly already covered elsewhere, but I'm covering it as part of a 'theme' that might appear in some of my postings: making things accessible. If any of you immediately went 'ugh' and pulled a face, I hope you're not intending to work in software. Ever. Accessibility isn't particularly hard. Like in real life, making things accessible in software requires a bit of thought and minimal time, and in the case of software, probably no cost. But I've digressed already; maybe I'll have an accessibility rant at some point.

Intro to SAPI

Ok, so what's this SAPI thingymabob, you might be asking? Good question. SAPI is Microsoft's Speech API. It handles both speech input and output (I intend to cover input later, speech recognition anyone?). In terms of output, SAPI has a TTS (text-to-speech) engine that uses a speech synthesiser to read text out in a synthesised voice. By default, this will come out of the computer's default audio device (though you have some control over this). As an aside, for anyone who's never come across a screen reader before, this is the same way a screen reader presents on-screen elements to its user.
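If you're curious about that "some control" bit, here's a rough sketch of redirecting the output using .NET's System.Speech library. It assumes a console project with a reference to System.Speech, and "hello.wav" is just a placeholder file name I've made up for illustration.

```csharp
using System;
using System.Speech.Synthesis;

class OutputDemo
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            // Send speech to whatever Windows has set as the default playback device.
            synth.SetOutputToDefaultAudioDevice();
            synth.Speak("Hello from the default audio device.");

            // Redirect speech into a WAV file instead ("hello.wav" is a placeholder path).
            synth.SetOutputToWaveFile("hello.wav");
            synth.Speak("This sentence ends up in a file rather than the speakers.");

            // Point the output elsewhere so the synthesiser lets go of the file.
            synth.SetOutputToNull();
        }
    }
}
```

Writing to a file like this is handy if you want to prepare audio ahead of time rather than speak it live.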

A basic code example

Nothing too fancy here. Taken from a WPF project that has a label, a textbox and a button that triggers the only event in this code (readButton_Click). This also requires adding a reference to .NET's System.Speech library.

using System;
using System.Windows;
using System.Speech.Synthesis;

namespace SAPI_Synthe_Reader
{
    /// Interaction logic for MainWindow.xaml
    public partial class MainWindow : Window
    {
        SpeechSynthesizer synth;

        public MainWindow()
        {
            InitializeComponent();
            synth = new SpeechSynthesizer();
        }

        private void readButton_Click(object sender, RoutedEventArgs e)
        {
            try
            {
                // "inputTextBox" is a placeholder; use the name of the textbox in your own XAML.
                synth.SpeakAsync(inputTextBox.Text);
            }
            catch (Exception ex)
            {
                MessageBox.Show("Stuff went wrong!\n\n" + ex.Message, "Oops!", MessageBoxButton.OK, MessageBoxImage.Error);
            }
        }
    }
}

Told you it was nothing too fancy. The method that does the actual work is SpeakAsync. There is also a synchronous version called Speak, but that will take over your program's thread of execution until speech is finished. In a WPF app that means the window stops responding while it talks, though this may sometimes be necessary; it depends on your uses for it. There is interesting stuff you can do, but I won't go into it here, not for now anyway. I got sidetracked trying to make it sing like Kesha, don't ask why. But some of the properties of the SpeechSynthesizer object should help you along.
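To give a flavour of those properties, and of the difference between Speak and SpeakAsync, here's a minimal sketch as a console program. It assumes a reference to System.Speech and at least one installed voice; the rate and volume values are arbitrary picks of mine, not recommendations.

```csharp
using System;
using System.Speech.Synthesis;

class SynthPropertiesDemo
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();

            // List the voices installed on this machine.
            foreach (InstalledVoice voice in synth.GetInstalledVoices())
            {
                Console.WriteLine(voice.VoiceInfo.Name);
            }

            synth.Rate = -2;    // Speaking rate, from -10 (slowest) to 10 (fastest).
            synth.Volume = 80;  // Output volume, from 0 to 100.

            // Speak does not return until the whole sentence has been spoken.
            synth.Speak("This call blocks until I have finished talking.");

            // SpeakAsync queues the text and returns at once;
            // SpeakCompleted is raised when that prompt has finished.
            synth.SpeakCompleted += (sender, e) => Console.WriteLine("Finished speaking.");
            synth.SpeakAsync("This call returns straight away.");
            Console.WriteLine("Still running while the speech plays.");

            // Keep the process alive so the asynchronous speech can finish.
            Console.WriteLine("Press Enter to exit.");
            Console.ReadLine();
        }
    }
}
```

If you want a particular voice, SelectVoice takes one of the names printed by the loop above.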

But what's the point?

Some possible uses of this include:

  • Telephone-based systems that need to read something out to the caller

  • Improving accessibility by having SAPI read out things that happen on screen and wouldn't be apparent without vision

  • Making audio games (if they're your gig). Try seeing to learn more. Please note that there are better ways to do this which I hope to go into later.

Those are just a few off-the-cuff examples. I'm sure there are more.
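For the accessibility idea, one possible shape is a small helper that speaks status messages as they happen. This is only a sketch: the Announcer class and its Announce method are names I've invented for illustration, not something System.Speech provides, and the messages are made up.

```csharp
using System;
using System.Speech.Synthesis;

// Hypothetical helper, not part of System.Speech.
class Announcer : IDisposable
{
    private readonly SpeechSynthesizer synth = new SpeechSynthesizer();

    public Announcer()
    {
        synth.SetOutputToDefaultAudioDevice();
    }

    // Speak a message without blocking the caller, dropping anything
    // that is still queued or being spoken so the newest message wins.
    public void Announce(string message)
    {
        synth.SpeakAsyncCancelAll();
        synth.SpeakAsync(message);
    }

    public void Dispose()
    {
        synth.Dispose();
    }
}

class AnnouncerDemo
{
    static void Main()
    {
        using (var announcer = new Announcer())
        {
            announcer.Announce("Download started.");
            announcer.Announce("Download complete.");

            // Keep the process alive while the speech plays.
            Console.WriteLine("Press Enter to exit.");
            Console.ReadLine();
        }
    }
}
```

Cancelling before each announcement is a design choice on my part: it keeps the speech in step with what's on screen, at the cost of cutting off earlier messages. If every message matters, leave out the SpeakAsyncCancelAll call and let them queue up instead.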

Well, now you can make your computer talk. Don't worry, we're still a long way from programming terminators here. For now...

Share and enjoy.
