Sunday, October 30, 2011

Speech interfaces: UI revolution or intelligent evolution?

Speech interfaces have received a lot of attention recently, especially with the marketing blitz for Siri, the new speech interface for the iPhone.

After watching some of the TV commercials, you might conclude that you can simply talk to your phone as if it were your friend and it will figure out what you want. For example, in one scenario the actor asks the phone, “Do I need a raincoat?”, and the phone responds with weather information.

A colleague commented that if he wanted weather information, he would just ask for it, as in “What is the weather going to be like in Seattle?” or “Is it going to rain in Seattle?”

Without more conversational context, if a friend were to ask me, “Do I need a raincoat?”, I would probably respond, “I don’t know, do you?” — jokingly, of course.

Evo or revo? Are we ready to converse with our phones and cars?

Kidding aside, systems like Siri raise an important question: Are we about to see a paradigm shift in user interfaces?

Possibly. But I think it will be more of a UI evolution than a UI revolution. In other words, speech interfaces will play a bigger role in UI designs, but that doesn't mean you're about to start talking to your phone — or any other device — as if it’s your best friend.

Currently, speech interfaces are underutilized. The reasons aren't yet clear, though they seem to involve both technical and user issues. Speech recognition accuracy has traditionally been less than perfect, and poor user interface design (for instance, clumsy reprompting strategies) has compounded the problem and increased user frustration.

Also, people simply aren't used to speech interfaces. For example, many phones support voice dialing, yet most people don't use it. And user interface designers seem reluctant to leverage speech interfaces, possibly because of the additional cost and complexity, a lack of awareness, or some other reason.

As a further complication, relying heavily on speech as an interface can lead to a suboptimal user experience. Speech interfaces pose some real challenges, including recognition accuracy rates, natural language understanding, error recovery dialogs, UI design, and testing. They aren't the flawless wonders that some marketers would lead you to believe.

Still, I believe there is a happy medium: using speech as part of a multimodal interface, where it makes sense. Some tasks are better suited to a speech interface, while others are not. For example, speech is an ideal way to provide input when you can capitalize on information already stored in the user’s head, such as the name of a contact they want to call. But it’s much less successful when the user has to deal with large lists of unfamiliar items.

Talkin' to your ride
Apple isn't the only force driving the growing role of speech interfaces, particularly in automotive. Speech interfaces can, for example, help address the issue of driver distraction. They allow drivers to keep their “eyes on the road and hands on the wheel,” to quote an oft-used phrase.

So, will we see a paradigm shift towards speech interfaces? It's unlikely. I'm hoping, though, that we'll see a UI evolution that makes better use of them.

Think of it more as a paradigm nudge than a paradigm shift.



  1. One user issue is self-consciousness. I think most people feel silly talking to their phones, especially in public places.

  2. Personally, I'm waiting for the day when our various devices have to negotiate as to who will answer our questions. You ask, "Will it rain today?", and your car answers, "You talking to me, or to your watch?" :-)

    - Paul