Ask The Expert

Topics: VoiceXML and Voice Applications

by Jeff Kunins, Manager of Developer Products and Evangelism, Tellme Networks.

I'm a Web designer, and want to get started building voice applications. What are the issues I need to be thinking about?

Designing voice interfaces is different from, and harder than, designing for the Web or for traditional touch-tone IVR applications. Dialogue as an interface to information and services presents unique design challenges that demand deep expertise and ongoing refinement. While even the worst visual interfaces are at least somewhat usable, all but the best voice interfaces deeply frustrate and confuse callers. Traditional touch-tone IVR applications can be maddening, and although speech recognition makes it possible to produce outstanding interfaces that quickly and efficiently deliver self-service access to information and services, achieving this is still a complex art that demands specialized expertise and a deep commitment to quality.

The key design constraints of the voice medium are:

  • Serial Interface: Conversations take place linearly in time, and people can visually scan a full page of options much faster than they can listen to the same options read aloud. Information and options have to be presented in sequence, with close attention to callers' very limited patience when they cannot quickly find what they want.
  • Short Term Memory: People have a limited ability to keep track of new lists of information, such as navigation options in a voice application, when encountering them for the first time. Unlike a Web site, the phone offers no persistent reminders such as menu bars. Voice interfaces must strike a careful balance between providing extensive functionality and ensuring that callers do not get lost.
  • Imperfect Recognition: Speech recognition has made dramatic advances in the past five years, allowing accuracy rates to reach 98% and higher. Achieving this requires applications to specify the permissible vocabulary for every interaction in the conversation. When a caller says something that is not expected, the system only knows that it was "something else", rather than the exact phrase. This challenge forces voice application designers to construct the exact wording and sequence of prompts and grammars very carefully, so that callers are seamlessly and naturally guided through the conversation, as illustrated in the sketch below.
Web and traditional PC software designers can certainly apply their skills toward becoming facile in the audio medium. The fundamentals of user-centered design, iterative evolution of interfaces through statistically relevant usability testing, and attention to the balance of simplicity versus functionality all directly apply. The devil, as they say, is in the details. Gaining proficiency in voice interface design is best done by spending lots of time studying leading voice applications like 1-800-555-TELL, finding other designers with experience in this field to collaborate with, and getting your hands dirty by building and testing real applications with real customers.
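
To make the recognition constraint concrete, here is a minimal sketch of a single VoiceXML field (VoiceXML 2.0 syntax; the prompt wording and grammar items are purely illustrative, not taken from any real application). Only the three phrases in the grammar are in the active vocabulary, so anything else triggers the <nomatch> handler, which guides the caller back by restating the valid choices rather than guessing what was said:

<form id="drinks">
  <field name="drink">
    <prompt>Would you like coffee, tea, or juice?</prompt>
    <!-- The permissible vocabulary for this one interaction. -->
    <grammar mode="voice" root="drink" type="application/srgs+xml">
      <rule id="drink">
        <one-of>
          <item>coffee</item>
          <item>tea</item>
          <item>juice</item>
        </one-of>
      </rule>
    </grammar>
    <!-- The recognizer only knows the caller said "something else",
         so restate the choices instead of guessing. -->
    <nomatch>
      Sorry, I didn't catch that.
      <reprompt/>
    </nomatch>
    <noinput>
      Please say coffee, tea, or juice.
      <reprompt/>
    </noinput>
    <filled>
      <prompt>Okay, <value expr="drink"/>.</prompt>
    </filled>
  </field>
</form>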

Is it possible to build robust, fully-functional voice applications using just VoiceXML?

Absolutely. Arguably the most sophisticated voice application ever built, 1-800-555-TELL is completely implemented using VoiceXML. VoiceXML is an emerging open standard that brings the Web development paradigm to the phone, which means that existing HTTP gateways to enterprise services and data built using technologies like SSL and cookies can be seamlessly extended with voice. Your voice application is literally a new set of "pages" on your Web site that happen to describe a conversation rather than a visual interface. This means that you can choose to completely outsource the complexity and expense of building and managing your own speech recognition and telephony infrastructure, while 100% of your application logic and data remain under your control at your facility, all without any specialized connectivity to your outsourcing provider.
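
As a rough illustration of that idea, here is a minimal sketch of one complete VoiceXML "page" (VoiceXML 2.0 syntax, with invented prompt text). A voice browser fetching this document over HTTP speaks the prompt and exits, just as an HTML browser renders a page it retrieves from the same server:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- A single "page" of conversation: speak a prompt, then exit. -->
  <form id="welcome">
    <block>
      <prompt>
        Welcome. This document was fetched over H T T P,
        just like any page on your Web site.
      </prompt>
    </block>
  </form>
</vxml>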

The global impact and ubiquitous penetration of the Web was predominantly driven by the simplicity of the open HTML standard. The Web development paradigm brought vendor and network independence to distributed applications, and drastically reduced the cost and skills required to quickly deliver powerful solutions. The essence of this model is that powerful Web servers execute application logic (written on any platform from any vendor in any language) while connecting locally to enterprise data, ultimately delivering simple HTML markup over HTTP to a thin browser client, which renders an experience to the end-user.

Early critics of the Web complained about HTML's lack of sophistication when compared to traditional PC applications, and browser vendors made enormous investments in proprietary architectures to embed C++ or Java-based "objects" within HTML pages. These efforts failed, losing to the simplicity of the pure Web development paradigm. Any shortcomings in HTML were either quickly addressed as the standard evolved, or were abandoned as unimportant relative to the benefits of the Web. This history will repeat itself with VoiceXML, and a combination of server-based component libraries and rampant source code reuse will again prove to be the most effective drivers of innovation and best practices.

When should I break my VoiceXML application into multiple documents, versus keeping it all in one?

If your VoiceXML application exceeds 1000 lines and is still growing, it may be time for you to break it down into modules.

The question of whether you should split a VoiceXML application into multiple files hinges on the size of the application. There are advantages to splitting up an application, but these can be outweighed by the costs. For a small application, it is best to keep the number of modules to a minimum. For mid to large-scale applications, the extra costs can be mitigated.

Clearly, you need some idea of how the application might grow over time in order to make this call. What appears to be a simple application at first might later turn out to be much more complicated. It is easier to decide on multiple modules at the outset of a project than to break up an application midway through its life cycle. Unless you are certain that the application will remain relatively small, it is best to consider modularization as part of your development process. There is no hard and fast metric for deciding what counts as a small application, but if your VoiceXML code goes beyond 1000 lines, it's fair to say you no longer have a small application.
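
As a sketch of what a multiple-document layout can look like (the file names here are hypothetical), one common pattern is to keep a small top-level menu document and hand control to a separate document per task through the next attribute; <subdialog> works similarly when the called document needs to return a result:

<?xml version="1.0" encoding="UTF-8"?>
<!-- main.vxml: top-level menu; each choice transfers to its own document. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <prompt>Main menu. Say weather or sports.</prompt>
    <choice next="weather.vxml">weather</choice>
    <choice next="sports.vxml">sports</choice>
  </menu>
</vxml>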

To modularize an existing application, begin by breaking out any embedded JavaScript into separate .js modules and organizing the code into JavaScript functions. You can eliminate the duplication created when different tags embed the same JavaScript by moving that code into a single function and having each tag call the function instead. The following example demonstrates how to do this.

<noinput>
  <!-- noinput means move to next item in the list -->
  <script type="text/javascript">
    <![CDATA[
      onListNextItem();
    ]]>
  </script>
</noinput>

<filled>
  <result name="nextitem">
    <script type="text/javascript">
      <![CDATA[
        onListNextItem();
      ]]>
    </script>
  </result>
</filled>

You can either declare the function inside a <script> element in the VoiceXML module itself, or you can create an external JavaScript module and declare the function there. The former approach loads more quickly, since the platform need only fetch the VoiceXML module; the latter requires the platform to make two HTTP requests, one for the VoiceXML module and one for the JavaScript module. If you never intend to have more than one VoiceXML module, you might consider embedding the function in the VoiceXML module.

If you have more than one VoiceXML module, you may want to break out your JavaScript into a separate module. A function declared inside one VoiceXML module is not visible to JavaScript in another VoiceXML module, and chances are that once you have VoiceXML living in multiple files, those files will call common JavaScript. The only way to make this work is to keep your JavaScript in an external module.
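
As a sketch of that arrangement (the file and variable names are just illustrative), the shared function lives in one .js file, and every VoiceXML document that needs it loads the file with the src attribute of <script>:

// list.js -- declared once, shared by every VoiceXML document that loads it
var listIndex = 0;
function onListNextItem() {
  // Advance the caller's position in the list.
  listIndex = listIndex + 1;
}

<!-- Any document that calls the helper loads the module at document scope. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <script src="list.js"/>
  <form id="list">
    <block>
      <script> onListNextItem(); </script>
      <prompt>You are on item <value expr="listIndex + 1"/>.</prompt>
    </block>
  </form>
</vxml>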


About Our Expert: Jeff Kunins is the Manager for Developer Products and Evangelism at Tellme Networks, Inc., where among other things he's responsible for managing Tellme Studio, the premier toolset and open community resource for VoiceXML developers.
