We heard recently that Microsoft’s new speech interface, codenamed “Cortana”, might use the same voice as the Cortana character in the Halo video game series. The character in the games is voiced by Jen Taylor, and she’s basically an artificial intelligence system that helps the player at different points in the games’ story. Cortana is considered a “smart” artificial intelligence since her creative matrix is allowed to expand, whereas “dumb” AI systems retain a limited creative matrix. The Cortana interface for Windows Phone is expected to be available in beta come April this year, and it should come to the Xbox One and Windows 8 in 2015, along with iOS via the Bing app.
One of the reasons I used Windows Mobile as my preferred smartphone operating system back around the turn of the century was that its speech interface was so much more advanced than anything else. I could be on my motorcycle, press the Bluetooth button and ask my phone about my schedule, the time, or my battery life. I could make phone calls and reply to text messages too, of course.
Today there are many more advanced smartphone speech interfaces. Apple’s Siri has seen the most mainstream mindshare due to its commercials and natural language recognition capabilities along with its often-humorous pre-programmed responses. Microsoft has said that they won’t release a Siri competitor until they can leapfrog it in functionality. So that’s what we’re expecting Cortana to be.
Third party app extensions
One of the best parts of the current Windows Phone 8 speech interface is the ability for 3rd party developers to extend its capabilities. After you install particular apps, the central speech interface (accessible by holding down the start button or pressing the voice command button on a Bluetooth connected device) will “learn” a new series of commands as specified by the developer. For example, the MetroTalk app will add commands that you can use to make calls or send text messages through your Google Voice account. NBC News will add commands that allow you to tell the app to read the latest news headlines or search for news about certain topics. MyFitnessPal adds commands that will allow you to speak your latest weigh-in numbers for the day or tell it what you just ate. This type of extensibility is hugely important for Cortana. None of the other smartphone speech interfaces has this kind of extensibility for 3rd party apps. Yes, some are programmed to interface with certain 3rd party services and apps, but they’re not nearly as limitless as what Cortana could be capable of.
A natural understanding of which app or service should be accessed
One problem with Windows Phone’s current speech UI is that you have to learn the specific commands for interacting with all of these third party apps. If you don’t say a command exactly right, with the right timing between its parts, the phone doesn’t understand what you’re trying to do. Siri has a much more natural interface that picks keywords out of your speech and then guesstimates what you might be trying to do. That type of natural and flexible understanding needs to come to Cortana, and she needs to be able to access and output relevant data from the 3rd party apps and services installed by the user.
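To make the difference concrete, here is a minimal sketch of the keyword-guessing approach, not how Siri or Cortana actually work. The app names come from the examples above, but the keyword lists and the `guess_app` function are made up purely for illustration.

```python
import re

# Hypothetical keyword table: words that hint at each installed app.
# The keyword sets are assumptions, not taken from any real app.
APP_KEYWORDS = {
    "MetroTalk": {"call", "text", "message", "voicemail"},
    "NBC News": {"news", "headlines", "story"},
    "MyFitnessPal": {"weigh", "weight", "ate", "calories", "food"},
}


def guess_app(utterance, app_keywords=APP_KEYWORDS):
    """Return the app whose keywords overlap most with the utterance.

    Returns None when no keyword matches at all.
    """
    # Lowercase the speech and split it into individual words.
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best_app, best_score = None, 0
    for app, keywords in app_keywords.items():
        score = len(words & keywords)  # number of shared words
        if score > best_score:
            best_app, best_score = app, score
    return best_app
```

With this kind of matching, “What are the latest news headlines?” and “Read me the headlines” both land on NBC News without the user memorizing an exact phrase. A real assistant would need far more than word overlap, but the idea of scoring candidates instead of demanding a fixed command is the same.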
There were some mobile phones in the ’90s that were always listening for a specific keyword that would activate their full voice command mode. A few smartphones these days have brought that feature back, but not Windows Phone. Oddly, the Xbox One implemented this extremely useful feature even when the Xbox is powered off. You can actually tell the Xbox to turn itself on with your voice. I’m not sure a “Cortana, turn my phone on” command is necessary, but an always-available keyword that can be used to signify the beginning of a command would be extremely useful. A customizable “name” for the speech interface would be most logical. The default should obviously be “Cortana”, but I imagine it would be fun to specify a different name for your digital assistant.
A holographic avatar
Okay, okay, maybe holographic projectors or Holotanks won’t be available in time for Cortana’s initial beta release this year, but I kind of hope that she’ll feature a visual representation of her persona. Ideally it would be a customizable avatar. I’m not sure the cartoonish avatars we see in Xbox LIVE are appropriate though. An animated version of the actual Cortana character from the Halo games would be pretty awesome as an option, but people have different tastes in their artificial intelligence assistants and should be allowed to customize accordingly.
Granted, the main point of a speech interface is that you can interact with the computing system without having to look at it, but there’s still a lot of value in having a face-to-face interface once in a while.
A basic understanding of relationships
On Windows Phone, we often group people by relationships. Cortana should be able to understand who I mean when I say, “Cortana, Skype my mom” or “Cortana, send a text to the guys.” Relationships are kind of important for reminders too. First of all, Cortana needs to be able to speak reminders and notifications (a feature sorely missing from Windows Phone 8 and lost from Windows Mobile 2003-6.5). Cortana should be able to understand that I don’t need to be notified of every single birthday that my gabillion friends on Facebook have, but I do want to be reminded of birthdays for the special relationships in my life.
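As a rough sketch of what that could look like, assume each contact carries an optional relationship label and a set of groups. The contact records, field names, and both functions below are invented for illustration and do not reflect any real Windows Phone API.

```python
# Hypothetical contact records; names, relations and groups are made up.
CONTACTS = [
    {"name": "Linda", "relation": "mom", "groups": {"family"}},
    {"name": "Dave", "relation": None, "groups": {"the guys"}},
    {"name": "Raj", "relation": None, "groups": {"the guys"}},
    {"name": "Sam", "relation": None, "groups": set()},  # Facebook-only friend
]


def resolve_recipients(phrase, contacts=CONTACTS):
    """Map a spoken phrase like 'my mom' or 'the guys' to contact names."""
    phrase = phrase.lower().strip()
    if phrase.startswith("my "):
        phrase = phrase[3:]  # "my mom" -> "mom"
    return [
        c["name"]
        for c in contacts
        if c["relation"] == phrase or phrase in c["groups"]
    ]


def wants_birthday_reminder(contact):
    """Only remind about people with a relationship label or a group."""
    return contact["relation"] is not None or bool(contact["groups"])
```

Here “Skype my mom” would resolve to Linda, “send a text to the guys” would resolve to Dave and Raj, and Sam’s birthday would stay quiet because he is in no group and has no relationship label.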
Location awareness and navigation
One of the features that I’ve been missing in Windows Phone for a very long time is a speech UI for GPS navigation. Nokia’s HERE Drive app is great and all, but it doesn’t integrate with the speech UI at all. There are some 3rd party apps that try to add a speech UI to HERE Drive, but that’s kind of a work-around. Cortana desperately needs to tap into Nokia’s HERE services. I should be able to say, “Cortana, where’s my car?” and she’ll tell me, “It’s 400 yards north of you, do you want directions?” Or “Cortana, navigate to the nearest gas station along my current route.”
Another thing that Cortana needs to understand is the relationship between my appointments and my location. I can already pin a Live Tile in Windows Phone that shows me live traffic updates and delays for my route to work and my route home. That should be expanded automatically to the appointments in my calendar. Cortana should be able to tell where I am, where my appointment is, and what traffic might delay me along the way, and then she should speak up to let me know if there might be a problem and suggest that I leave earlier. I should also be able to say things like, “Cortana, how long will it take me to get to my next appointment by subway?” or “Cortana, hail a taxi.”
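The arithmetic behind that kind of warning is simple. The following is only a sketch under assumed inputs: the travel time and traffic delay would have to come from a mapping service, and the function names, the five-minute buffer, and the ten-minute warning window are all arbitrary choices for illustration.

```python
from datetime import datetime, timedelta


def leave_by(appointment_start, travel_minutes,
             traffic_delay_minutes=0, buffer_minutes=5):
    """Latest departure time that still reaches the appointment on time."""
    total = travel_minutes + traffic_delay_minutes + buffer_minutes
    return appointment_start - timedelta(minutes=total)


def should_speak_up(now, appointment_start, travel_minutes,
                    traffic_delay_minutes=0, warn_window_minutes=10):
    """True once we are within the warning window of the departure time."""
    deadline = leave_by(appointment_start, travel_minutes,
                        traffic_delay_minutes)
    return now >= deadline - timedelta(minutes=warn_window_minutes)
```

For a 3:00 PM appointment with a 30-minute drive and a 15-minute traffic delay, `leave_by` gives 2:10 PM once the buffer is included, so the assistant would start speaking up at 2:00 PM. The same calculation works for a subway trip if the travel time comes from a transit schedule instead.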
Integration with a larger ecosystem
As mentioned in the beginning, Microsoft is expected to bring Cortana to the Xbox One and Windows 8 as well. Hopefully your specific preferences and relationship with Cortana will follow you across platforms. If I’m in the living room and I say, “Cortana, Skype my mom,” then Cortana should be able to see that I’m sitting in front of the Xbox One and she should carry out that command on the big screen connected to the Xbox. In fact, ideally, all commands and functions that are accessible by Cortana on my phone should also be accessible from the Xbox One and any Windows PC that I might be sitting in front of.
That’s only Microsoft’s ecosystem though. Cortana should also be capable of controlling other connected devices, such as digital thermostats, car computers, and remote light controls, via 3rd party apps and service plug-ins.
Is that enough to leapfrog Apple’s Siri or are you thirsty for more?