Ask The Expert
Topics: VoiceXML and Voice Applications
by Jeff Kunins, Manager of Developer Products and Evangelism, Tellme Networks.
I'm a Web designer, and want to get started building voice applications. What are the issues I need to be thinking about?
Designing voice interfaces is different from, and harder than, designing for the Web or for traditional touch-tone IVR applications.
Dialogue as an interface to information and services presents many unique design challenges that demand deep expertise and
ongoing refinement.
While even the worst visual interfaces are at least somewhat usable, all but the best voice interfaces deeply frustrate and
confuse callers. Traditional touch-tone IVR applications can be maddening, and while speech recognition makes it possible
to produce outstanding interfaces that quickly and efficiently deliver self-service access to information and services,
achieving this is still a complex art that demands specialized expertise and a deep commitment to quality.
The key design constraints of the voice medium are:
- Serial Interface: Conversations take place linearly in time, and people can visually scan a full page of
options much faster than they can listen to it being read aloud. Information and options have to be presented in sequence,
with close attention to how little patience callers have when they cannot find what they want quickly.
- Short Term Memory: People have a limited ability to keep track of new lists of information, such as navigation
options in a voice application, when encountering them for the first time. Unlike Web sites, persistent reminders like menu
bars do not exist on the phone. Voice interfaces must strike a careful balance between providing extensive functionality and
ensuring that callers do not get lost.
- Imperfect Recognition: Speech recognition has made dramatic advances in the past 5 years, allowing accuracy rates
to approach and even exceed 98%. Achieving this requires applications to specify the permissible vocabulary for every interaction
in the conversation. When a caller says something that is not expected, the system only knows that it was "something else", rather than
the exact phrase. This challenge forces voice application designers to very carefully construct the exact wording and sequence
of prompts and grammars, so that callers are seamlessly and naturally guided through the conversation.
Web or traditional PC software designers can definitely apply their skills to working in the audio medium.
The fundamentals of user-centered design, iterative evolution of interfaces through statistically relevant usability testing,
and attention to the balance of simplicity versus functionality all directly apply. The devil, as they say, is in the details.
Gaining proficiency in voice interface design is best done by spending lots of time studying leading voice applications like
1-800-555-TELL, finding other designers with experience in this field to collaborate with, and getting your hands dirty by
building and testing real applications with real customers.
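The Imperfect Recognition constraint above can be made concrete with a small sketch. The JavaScript below is a toy model, not a real recognizer: it assumes the caller's utterance has already been turned into text, and the grammar contents, `recognize`, and `nextPrompt` are hypothetical names invented for illustration. It shows only the behavior described above, namely that an out-of-grammar phrase comes back as a generic "no match" rather than as the words the caller actually said.

```javascript
// Toy model of grammar-constrained recognition. A real recognizer works on
// audio; treating the utterance as text is an assumption made only to keep
// this sketch runnable.

// Hypothetical grammar for a main menu: each accepted phrase maps to a value.
const mainMenuGrammar = {
  "sports": "sports",
  "sports scores": "sports",
  "weather": "weather",
  "weather forecast": "weather",
  "stocks": "stocks",
  "stock quotes": "stocks",
};

// Returns { event: "match", value } for an in-grammar phrase, otherwise
// { event: "nomatch" }. The unexpected phrase itself is never returned.
function recognize(utterance, grammar) {
  const phrase = utterance.trim().toLowerCase();
  if (Object.prototype.hasOwnProperty.call(grammar, phrase)) {
    return { event: "match", value: grammar[phrase] };
  }
  return { event: "nomatch" };
}

// Chooses the next prompt: after a nomatch, restate the accepted options.
function nextPrompt(result) {
  if (result.event === "match") {
    return "Okay, " + result.value + ".";
  }
  return "Sorry, I didn't get that. You can say sports, weather, or stocks.";
}
```

Because the application never sees the unexpected phrase, the re-prompt in this sketch restates the options it does accept, which is one way of guiding the caller back into the grammar.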
Is it possible to build robust, fully-functional voice applications using just VoiceXML?
Absolutely. Arguably the most sophisticated voice application ever built, 1-800-555-TELL is completely implemented using VoiceXML.
VoiceXML is an emerging open standard that brings the Web development paradigm to the phone, which means that existing HTTP gateways
to enterprise services and data built using technologies like SSL and cookies can be seamlessly extended with voice. Your voice
application is literally a new set of "pages" on your Web site that happen to describe a conversation rather than a visual
interface. This means that you can choose to completely outsource the complexity and expense of building and managing your
own speech recognition and telephony infrastructure, while 100% of your application logic and data remain under your control
at your facility, all without any specialized connectivity to your outsourcing provider.
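As a rough illustration of a voice "page" generated by ordinary Web server code, the sketch below builds a VoiceXML document as a string and returns it from a request handler. It is only a sketch under stated assumptions: the function names, the `lookupBalance` callback, and the balance prompt are hypothetical, the handler follows the shape used by Node's `http` module, and the markup uses the VoiceXML 2.0 root element, which the platform you target may or may not expect.

```javascript
// Escapes the characters that are special in XML text content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Builds a one-prompt VoiceXML document, the voice counterpart of an HTML page.
function renderVoicePage(promptText) {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">',
    "  <form>",
    "    <block>",
    "      <prompt>" + escapeXml(promptText) + "</prompt>",
    "    </block>",
    "  </form>",
    "</vxml>",
  ].join("\n");
}

// Hypothetical request handler. lookupBalance stands in for whatever
// enterprise data access the Web site already has. With Node it could be
// wired up as:
//   http.createServer((req, res) => handleRequest(req, res, lookupBalance))
function handleRequest(req, res, lookupBalance) {
  const balance = lookupBalance(req.url);
  res.writeHead(200, { "Content-Type": "application/voicexml+xml" });
  res.end(renderVoicePage("Your balance is " + balance + " dollars."));
}
```

The application logic and data access stay on the Web server; only the generated markup travels over HTTP to the voice platform, just as HTML would travel to a browser.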
The global impact and ubiquitous penetration of the Web was predominantly driven by the simplicity of the open HTML standard.
The Web development paradigm brought vendor and network independence to distributed applications, and drastically reduced the
cost and skills required to quickly deliver powerful solutions. The essence of this model is that powerful Web servers execute
application logic (written on any platform from any vendor in any language) while connecting locally to enterprise data,
ultimately delivering simple HTML markup over HTTP to a thin browser client, which renders an experience to the end-user.
Early critics of the Web complained about HTML's lack of sophistication when compared to traditional PC applications, and
browser vendors made enormous investments in proprietary architectures to embed C++ or Java-based "objects" within HTML pages.
These efforts failed, losing to the simplicity of the pure Web development paradigm. Any shortcomings in HTML were either
quickly addressed as the standard evolved, or were abandoned as unimportant relative to the benefits of the Web. This history
will repeat itself with VoiceXML, and a combination of server-based component libraries and rampant source code reuse will
again prove to be the most effective drivers of innovation and best practices.
When should I break my VoiceXML application into multiple documents, versus keeping it all in one?
If your VoiceXML application exceeds 1000 lines and is still growing, it may be time for you to break it down into modules.
The question of whether you should split a VoiceXML application into multiple files hinges on the size of the application.
There are advantages to splitting up an application, but these can be outweighed by the costs. For a small application,
it is best to keep the number of modules to a minimum. For mid- to large-scale applications, the extra costs can be mitigated.
Clearly, you should have some idea about how the application might grow over time in order to make this call. What appears
to be a simple application at first might later turn out to be much more complicated. It's easier for you to make the decision
to go with multiple modules at the outset of a project than to break up an application in mid life cycle. Unless you are
certain that the application will remain relatively small, it is best to consider modularization as part of your development
process. There's no hard-and-fast metric for deciding what is a small application, but if your VoiceXML code goes beyond 1000
lines, it's fair to say you no longer have a small application.
To modularize an existing application, begin by breaking out any embedded JavaScript into separate .js modules and
modularizing the code using JavaScript functions. When several tags repeat the same JavaScript, every copy has to be kept in sync;
you can eliminate that problem by moving the code into a JavaScript function and having the tags call the function instead.
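A minimal sketch of the idea, assuming two hypothetical helpers that several tags might otherwise each reimplement inline. The function names and the ZIP code scenario are illustrative, not taken from any particular application. The code sticks to older JavaScript syntax (var, function declarations), on the assumption that a VoiceXML platform's script engine may not support newer language features.

```javascript
// Hypothetical helper: turns "94043" into "9 4 0 4 3" so a prompt reads the
// digits one at a time instead of as a single large number.
function spellOutDigits(value) {
  var digits = String(value).replace(/[^0-9]/g, "");
  return digits.split("").join(" ");
}

// Hypothetical helper: true when the caller's input is a five-digit ZIP code.
function isZipCode(value) {
  return /^[0-9]{5}$/.test(String(value));
}
```

A tag could then use an expression such as `spellOutDigits(zip)` in an `expr` attribute, or `isZipCode(zip)` in a `cond` attribute, instead of repeating the underlying logic; the exact attributes available depend on your platform.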
You can either declare the function inside a <script> element in the VoiceXML module itself or you can create an
external JavaScript module and declare the function there. The former approach loads more quickly, since the platform
need only load the VoiceXML module. The latter approach requires the platform to make two HTTP requests, one for the
VoiceXML module and one for the JavaScript module. If you never intend to have more than one VoiceXML module, you might
consider embedding the function in the VoiceXML module.
If you have more than one VoiceXML module, you may want to break out your JavaScript into a separate module. A function
declared inside one VoiceXML module is not visible to JavaScript in another VoiceXML module. Chances are, once you have
VoiceXML living in multiple files, those files will call common JavaScript. The only way to make this work is to put your JavaScript
in an external module.
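One possible layout for such an external module is sketched below. The file name common.js, the `AppHelpers` object, and both helper functions are assumptions made for illustration; grouping the helpers under a single object is a design choice that keeps shared names from colliding with variables declared in individual VoiceXML documents. Amounts are assumed to be non-negative whole numbers of cents.

```javascript
// common.js (hypothetical file name): helpers shared by every VoiceXML
// document in the application.
var AppHelpers = {
  // Reads a non-negative whole number of cents as a spoken phrase.
  dollarsForSpeech: function (cents) {
    var dollars = Math.floor(cents / 100);
    var rest = cents % 100;
    var text = dollars + (dollars === 1 ? " dollar" : " dollars");
    if (rest > 0) {
      text += " and " + rest + (rest === 1 ? " cent" : " cents");
    }
    return text;
  },

  // Builds the confirmation wording reused by every document that moves money.
  confirmationText: function (action, cents) {
    return "You asked to " + action + " " +
      AppHelpers.dollarsForSpeech(cents) + ". Is that right?";
  }
};
```

Each VoiceXML document would then load the same file, typically through the `src` attribute of its `<script>` element where the platform supports it, and call `AppHelpers.confirmationText(...)` from its own expressions.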
About Our Expert: Jeff Kunins is the Manager for Developer Products and Evangelism at Tellme Networks, Inc.,
where among other things he's responsible for managing Tellme Studio, the premier toolset and open
community resource for VoiceXML developers.