By Adam Z. Lein | May 9, 2012 7:51 AM
For many years, most mobile phones came with some sort of hands-free voice dialing interface. Today, smartphones offer many popular options, including Apple's heavily advertised and fun-to-use Siri software. Android has several alternatives, including Google's Voice Actions, and Microsoft has a "TellMe"-based speech interface as well. However, those are all just programs that process your voice and then carry out some action. They have only a handful of tie-ins to certain built-in apps and a networked server. You can do web searches or launch third-party apps, but that's where the integration ends.
Ever since Microsoft first released its Voice Command software for its smartphone operating system (Pocket PC Phone Edition) in 2003 — software that was way ahead of its time, with many functions still not found on smartphones today — users and developers have been wishing and hoping that Microsoft would build a true speech application platform: something that not only lets you interact with specific built-in functions but is also extensible to third-party applications, just like a real operating system's graphical user interface.
Ideally, a speech UI would give you one single method of activating it, after which every supported function would be accessible through the user's speech and the device's voice feedback. There shouldn't be any need to look at a screen, press a button multiple times, or navigate to a different part of the visual interface just to activate voice dictation. Furthermore, a real speech UI should allow third-party applications to extend their functions to other programs through a set of APIs. So, for example, a Twitter app could add voice commands for Twitter functions, a GPS navigation app could add commands for finding nearby destinations and asking about traffic updates, and an eBook reader could add support for finding different books and reading them aloud. The possibilities are endless.
It's been 10 years, and nothing like this has appeared in any kind of consumer computing experience — so why should we think Windows Phone 8 might finally bring this kind of long-awaited innovation? Well, Microsoft actually did bring a sort of extensible speech interface to something else recently. The newest Xbox 360 dashboard, released in the winter of 2011, has a consistent speech interface shared by all applications and games that have been programmed to make use of it. Whether I'm in the dashboard, the YouTube app, Netflix, Crackle, MSNBC, or Star Wars Kinect, if I say "Xbox" it will start listening to me and highlight the relevant commands that I can say. I can tell the TV what I want it to do instead of picking up a plastic controller and pressing buttons. And since the speech UI is shared throughout the system, apps like YouTube or Netflix don't have to reinvent the wheel in order to get proper voice control support. Unfortunately, the Xbox's speech interface still requires your eyes on the screen in many situations, since it does not provide voice feedback for any commands (though it does play a little sound effect that acknowledges them).
Hopefully we will finally see some real forward movement in the area of speech UIs in Windows Phone 8 and Windows 8, since that seems like the next step for human-computer interaction (considering we already have 3D gesture recognition with Kinect).
Is an easy, extensible speech interface important to you for the future of smartphones, or will the regular "touch a button" interface always be the primary one?