In some ways Microsoft’s new speech UI, Cortana, has surpassed the speech interfaces of its competitors on iOS and Android. Most people are going to be comparing Cortana to Siri and Google Now, since those are the two other popular smartphone speech interfaces at the moment. At Pocketnow, we know a little better and will be comparing Cortana to how she should work.
Google and Apple are really quite new to the whole speech UI for smartphones thing. Microsoft’s old Windows Mobile smartphone operating system had a speech interface called Voice Command back in 2002. Yeah, that was 12 years ago. Siri has only been around since 2010, and even the iPhone has only been around since 2007. Google Now was launched in 2012, and the original version of Android was only released in late 2008.
Anyway, let’s get back to Cortana. She’s Microsoft’s newly upgraded speech interface for Windows Phone (and perhaps Windows 8 and Xbox later on). Microsoft has had 12 years of experience with this kind of thing, so our expectations were pretty high when we unboxed our review unit six days ago.
What Cortana’s doing right
3rd party application programming interface
Many of the commands that were supported by the speech interface in Windows Phone 8 are still available in Cortana. One of the big differentiating features of Microsoft’s speech UI is that it can interact with any 3rd party app, as long as the developer adds the functions and commands to the app’s code. Cortana is going to make controlling 3rd party apps even easier by allowing more variation in the commands. Previously, you really had to learn the exact phrase that a 3rd party app was listening for, but it sounds like Cortana will aim to understand a wider range of phrasing. This is a huge deal. None of the other speech interfaces on smartphones out there has as much extensibility as this. Basically, anyone can make an app and add all sorts of functions that can be accessed through Cortana. You can ask the NBC News app to read today’s headlines out loud, and the speech UI will do that. Or, if Google hadn’t shut it down, you could tell MetroVoice to send a text message to somebody using your Google Voice account instead of your normal phone number.
Doing stuff I ask her to do
With the old Bing on Windows Phone, if I said, “Track flight UA 676,” it would show some tracking information for that flight on the screen. Now, Cortana will tell me about the flight (“United Airlines flight 676 is scheduled on time to depart Boston at 8:18 AM.”), but she’ll also keep tracking the flight while it’s in the air and display updates on her live tile. A pop-up notification will appear when the flight has landed (even if it’s early). Very cool!
I can also tell her to do things like “Play some country music,” and she’ll reply with “Queuing up your country music” before the music starts playing. If I ask her “What song is this?”, she’ll switch to listening mode and then display the name of the song along with the artist, album, and a link to download it from Xbox Music. That was a very cool feature, though I wish she would speak the answer out loud so I could hear the information. I tried asking her to “look at this,” thinking that she might load Bing Vision and try to recognize items, barcodes, or QR codes I was pointing the camera at, but unfortunately that didn’t work. I can also tell her to turn call forwarding on or off, redial the last call, send a text message, create a new OneNote note, wake me up at a specific time, or remind me about something on a specific day and time. One other very clever feature is the ability to tell Cortana, “Remind me to ask my Dad about Texas the next time I talk to him.” The next time I started a new email to my Dad, a pop-up appeared on the screen with the reminder I had requested (but she didn’t speak the reminder out loud, and she wouldn’t remind me when I replied to an email that my Dad had sent me).
Another great feature I discovered was that Cortana knows about public transportation. If I say, “Show me how to get to Times Square on the subway”, she’ll produce directions that make use of public transportation.
Helping me navigate
It would seem that if Cortana is able to recognize the location of an appointment, she can automatically calculate the time it will take to get there from your current location and notify you when you need to leave (accounting for traffic delays). It’s best to enter an address in the location field of your calendar appointments, but in my test, Cortana recognized “Fedex Elmsford” without any more precise location information. When the traffic warning and appointment reminder appear on the screen, you’ll also see a “Start voice nav” button at the bottom, which loads the destination into your default GPS navigation program (you can choose your default in the settings). Unfortunately, this is all screen-based interaction. Ideally, Cortana would tell me out loud that I should leave for such-and-such appointment soon because it will take me 16 minutes to get there in current traffic… at which point I should be able to say “Start voice navigation,” “Snooze,” “Dismiss,” or “Reschedule.”
GPS navigation integration with calendar appointments and traffic updates is something I’ve been wishing for since 2008, when I wrote “How to design a killer GPS navigation program” (whose text was unfortunately lost in a server issue). One program by Proxpro on BlackBerry could sort of do it at the time, but nothing since has come close to the kind of functionality I’ve been looking for.
Understanding the context of our conversation & environment
There’s a setting that tells Cortana to be quiet whenever my calendar has an appointment marked as “busy”. Just like a real assistant, she knows that I shouldn’t be interrupted during that time. I can also specify a whole schedule of quiet hours when I don’t want to be interrupted, and Cortana can be set to automatically respond to text messages and phone calls that come in during those times. Awesome! It’s too bad she can’t tell people I’m underground on the subway and have no reception.
As for understanding the context of a conversation: when you ask her something about the weather, you can press the listening button again and ask her to change the temperature units. In other words, she remembers what you just asked about and correctly assumes that you’re still referring to the information she just gave you.
Keeping a notebook with all the information she’s allowed to access
This is a hugely important feature for people who want to know exactly what these high-tech devices have access to. Cortana’s notebook is where she stores all the information you give her, as well as the options for what you want her to have access to. This gives users a very easy-to-understand list of what the speech UI and its accompanying cloud service are allowed to do.
What Cortana’s doing wrong
Not always listening
One thing that’s still missing (though we hear Microsoft will be adding the option soon) is an always-listening mode. Mobile phones in the ’90s were often configurable to listen for a specific word and then turn on voice command mode. Microsoft’s Xbox works the same way, even when it’s turned off: you can say, “Xbox on,” and it will turn on. Presumably, the keyword for Cortana to start listening would be her name, which should work pretty nicely, since that’s how we activate the listening feature in other humans we know.
Beep Beep Beep
When Michael Fisher and I were first playing with the new Windows Phone 8.1 operating system, before he sent me home with the device, he said to Cortana, “Wake me up at 7am.” I was expecting Cortana’s voice to say, “Good morning Adam, it’s time to wake up,” and then repeat that kind of phrase every few seconds until I told her, “Okay, I’m up!” That’s the way it should have worked. Unfortunately, all I heard at 7am were some lame beeping sound effects that were easy to ignore.
Actually, this is how all of Cortana’s reminders work: just vague, uninformative, useless sound effects. She never actually speaks to me and tells me the information I need to hear when I need to hear it. The one exception is incoming text messages, which are the only notifications that can be read aloud when they come in (although incoming caller ID names can optionally be announced as well). On previous versions of Windows Phone, you could also hear incoming Windows Live Messenger and Facebook Messenger messages read aloud as they arrived. Unfortunately, that feature has been removed, and its absence is painfully disappointing.
In the new Action Center, every type of notification has settings for a custom sound effect, pop-up toast, and vibration. That area should also have a checkbox that lets Cortana read the notification aloud when it arrives (and offer any supported follow-up actions) outside of quiet hours. THAT would make Cortana extremely useful!
The biggest missing feature is that Cortana doesn’t read appointment reminders out loud when they come due. Back around the turn of the century, my smartphone of choice ran Windows Mobile with Microsoft Voice Command. So, 10 years ago, I could be on my motorcycle with a Bluetooth headset inside the helmet and hear “In 15 minutes drive to Brooklyn” because that was something I had on my calendar. Right away, I knew what I had to do next! Unfortunately, with today’s smartphones, I would either miss that reminder or have to pull over somewhere and look at the screen like a sucker.
Why don’t you look at the screen, instead of asking me?
That brings us to another frustrating behavior: Cortana often fails to answer commands and questions audibly, forcing the user to look at the screen. More often than not, she’ll say something like, “I found this for you,” which effectively means: stop what you’re doing, pull the car over, take the phone out of your pocket, and look at the screen. Sorry, Cortana, I don’t have time for that! Why don’t you just tell me the answer to “15 times 82”?
No speech interface should ever require looking at a computer screen or touching buttons. The whole point of a speech interface is to free the user from that and make multi-tasking easier. If you’re going to make me look at the screen anyway, why do I even bother talking to you? I can press buttons on my own if you’re going to make me stop what I’m doing.
Help me use this phone
One awesome potential use for a virtual personal assistant is getting tutorials on how to use your new smartphone. Unfortunately, Cortana’s help system is extremely limited. Asking her, “What can I say?” is about the extent of it, and all she does is show you a list of categories ON THE SCREEN. Not helpful! A decade ago, Windows Mobile had a much more thorough spoken help system: you could ask “What can I say?” and it would read aloud all of the categories you could learn about, then you could reply with the category you wanted to hear more about, and it would tell you more about the commands it could understand.
But even that is not enough. Sure, that was cool 10 years ago, but a speech UI should be way more helpful by now! I should be able to say, “Cortana, can you show me how to rearrange these tiles on my start screen?” and then she would speak instructions to me while I interact with the phone’s start screen tiles.
She should also help me install apps. If I say, “Is there an Instagram app for this phone?”, Cortana should respond with, “Yes, there are several. Would you like to install one?” at which point I could say, “Please install the top rated Instagram app.” and she would say, “I’ll install 6Tag the next time you’re connected to Wi-Fi.” Instead, guess what Cortana does when you say, “Is there an Instagram app for this phone?” That’s right, she displays a link to the iTunes App Store page for the iPhone version of Instagram… effectively saying, “Why don’t you use an iPhone instead?” That’s kind of embarrassing.
Missing features from 10 years ago
As mentioned, Microsoft used to have a great smartphone speech UI. Cortana is pretty awesome, too, but she’s missing a lot of what the old version could do many years ago. Besides reading appointment reminders aloud, which was indispensable on the go, the old Windows Mobile Voice Command did a lot of other things that still don’t have an equal. For example, I could ask my smartphone, “What is my battery level?” and Voice Command would speak the percentage to me through my Bluetooth headset. If only Cortana had that capability, perhaps I could ask her, “When am I going to have to recharge my phone again?” and she could say, “You have about 2 hours of battery life remaining.” Voice Command also supported asking about signal strength, listing missed calls, changing the ringer volume, and turning on vibrate mode. How strange is it that we still can’t say, “Cortana, mute all sounds” or “Set my phone to vibrate mode”?
Longer dictation recognition
Cortana’s speech-to-text feature is still pretty limited, too. She’ll often stop listening in the middle of a long sentence and fail to transcribe it correctly. My colleague Michael Fisher wishes you could speak punctuation marks by name and have them inserted into the text. Personally, I wish it were smarter than that and had some grammar autocorrect features for full sentences. It should also be able to detect pauses and tone inflections in my voice. A pause could add some type of punctuation, and so could a spoken list. Maybe that’s asking too much right now, though.
Cortana certainly deserves some forgiveness since she’s currently in beta (and even Windows Phone 8.1 is in beta), but then again, Microsoft has been doing the speech UI thing for many years. It should be far more advanced by now, or at least as capable as what we had in the old days.
Cortana was named after a key fictional artificial intelligence character in the Halo video games, which are set about 500 years from now. By then she’ll be able to hack and interface with alien computer systems and even finish your sentences. She can speak to you while you’re deep in battle and provide valuable information without distracting you from your current objective. Unfortunately, Windows Phone’s current version of Cortana frequently requires you to stop what you’re doing and interact with the display. If Master Chief had to do that in Halo, he wouldn’t last very long at all.