It happened so long ago, the memories come to me in vague shadows.
It was an unremarkable weeknight in 2002. I was watching TV in my college dorm room, probably waiting for the then-new Star Trek: Enterprise to return from a commercial break. A 30-second spot for Nokia flashed onto the screen, the last frame dwelling on the image of a mobile phone amid a field of futuristic blue plasma. The model number escapes me, as does the sub-brand, but that doesn’t matter. What matters is that closing product image, which made the phone look, on my 13-inch CRT TV screen, like it had no buttons.
As it turns out, that was an optical illusion caused by some mixture of a sub-par television and the device’s paint job. When I visited a store asking to see “that new Nokia,” I was led to a fairly typical candybar dumbphone whose keypad, if oddly painted, was still fully present. It was kind of a letdown.
But the disappointment did nothing to blunt the force of the suggestion the fictitious Nokia phone had implanted in my mind: the thought that maybe mobile technology had grown beyond the need for keypads. Voice command, after all, was still huge in those fledgling years of the new century, with new developments in the technology keeping it in the headlines. Was it possible, I wondered, that Nokia had found a way to render keys on mobile phones obsolete?
Obviously, that’s not how it panned out. It’s ten years later, and we’re still beholden to keypads. Sure, most of them are virtual these days, and that’s cool (for most of us), but they’re still there. With the rise in popularity of text over talking, to say nothing of email, keyboards have become even more critical to most of the mobile population.
But manufacturers and software designers are trying to change that, or at least to bring voice input back into relevance. Apple’s doing it with Siri, Google’s doing it with its Majel Assistant nameless speech-to-text engine, and even Samsung and LG are hopping into the fray. Voice command is hot again.
We’ve weighed in with plenty of our own opinions on this matter, from my discussions of Samsung’s S Voice and voice control in general, to Jaime Rivera’s thoughts on what’s next for the input method. Then recently, Brandon Miniman hit us up with an abstract query: “what would happen if you couldn’t use your keyboard at all? What if you had to use voice input for everything?” The related questions were compelling: could a truly keyboardless device be useful, or even usable? What was it like for people who, for whatever reason, couldn’t use keyed inputs? What would life be like if all you had to manipulate mobile software was your voice?
Those sounded like questions worth answering, not to mention a fun way to embarrass myself in public. So I decided to give it a go.
Right away, we agreed on some guidelines. Because I still needed to be able to write (for, you know, my job and all), laptop and desktop keyboards were exempt. This would be a mobile-only experiment. But any text input on any of my mobile devices, tablets and smartphones both, would have to be done using the stock voice-recognition software. That means no typing, and also no third-party apps.
While this was initially envisioned as a cross-platform experiment, it quickly became obvious that the vast majority of my usage would be on Android equipment. We therefore decided to save the iOS and Windows Phone experiments for future projects, and focused on testing the stock voice-input capabilities of both Ice Cream Sandwich and Jelly Bean. The devices used were a Galaxy S III, a Galaxy Nexus LTE, and a Nexus 7. (Curiously, while there’s been much discussion of Jelly Bean’s superiority in the voice-command department, I didn’t notice much difference in my testing. At any rate, this wasn’t meant as a side-by-side comparison.)
So I could still communicate somewhat effectively during the six-day test period, I decided I’d be allowed to correct typos by hand after I’d tried correcting the problem at least once via re-dictation. As for the duration of the test, yes: the headline says “week” … and that was the plan. But by day six I’d learned everything I thought I could via the experiment, and I wanted nothing more than to bring it to a close. To see how I got to that point, read on!
The first thing I noticed, within just a few minutes of dipping my toe into the voice-only lifestyle, was how profoundly it impacted my text-messaging. Using voice input in lieu of a keyboard completely eliminates any semblance of privacy. Just like a phone call, anyone within earshot hears every word of your side of the conversation. The next time someone asks me what’s so great about text messaging, “privacy” is going to be the first word out of my mouth.
Once I got comfortable with everyone on the sidewalk hearing what I was saying, my next hurdle was my own brain. It turns out that different portions of the brain govern written and verbal communication. Composing a text message or email entirely through speech turned out to be much more difficult, error-prone, and time-consuming than doing the same thing via keyboard.
I maintained an Evernote log throughout the experience, allowing me to dictate observations as they came to me. Here’s the direct transcription of the above thought, as I read it into my phone. In this and following examples, words I’ve since corrected are italicized.
It’s interesting to see how differently the brain treats verbal communication. Compared to typing it things, I mean. It’s very difficult to compose a message properly by speaking. As opposed to typing it. Just on a fundamental brain power level. See what I mean? This sentence sucks. There’s also some expected confusion with your and you’re, and there and their. Except when you are saying both together in a sentence. New line carriage return
Did you catch that “new line carriage return” bit above? Yeah; the transcription software really, really doesn’t give a damn if you want to start a new line; it’s just going to keep plugging along on the same paragraph until you throw a fingertip in there and hit enter a few times. That leads to some massive blocks of text, if you’re not careful.
Speaking of which, here are my thoughts on punctuation in this kind of situation:
Obviously, punctuation is also an issue. I can’t seem to find a way to start a new line, so I will continue with my observations. Its amazing how much you input text or rather how often you input text in a given day. I have caught myself several times, these first few hours, entering text in feel fields like delivery take out forms, without thinking about it. Incidentally, another unfortunate side effect is that you cannot listen to music while composing any sort of message. That is, you cannot listen to music on the device you are using, without it costing pausing to accommodate the voice recognition software. Fortunately, android appears to be smart enough, in jelly bean at least, to intelligently pos pause and then resume the music.
Despite the unsightly strikethroughs there, it needs to be said that Android’s dictation software was quite good at discerning what I was trying to say throughout most of the experience. Granted, I was speaking fairly slowly and clearly, but even when I got into a groove and started talking at a pretty good clip, the software kept up as long as I had a good data connection. It even picked up profanity pretty well, properly censoring it so I didn’t offend any thin-skinned friends.
Still, I wasn’t having an experience I’d call enjoyable, or even tolerable.
Incidental notes: well, I see that the: is not a completely up I know I said I’m off no you just upset umm let’s move on. What I was trying to say was, a punctuation does not appear to be as anemic on jelly bean, or google has recently implemented support for semi: ha ha ha ha and it reproduces laughter well, it’s not a;. Oh wait there is the semi: what about.
In fact, it didn’t even take a full day for me to completely give up on something and wait until I was at a real keyboard. It was the end of day one, and I was trying to voice-tweet, to congratulate Joe Levi on an excellent live-tweeting of the most recent episode of the Pocketnow Weekly. The problem: I was on a bus whose roaring engine filled the cabin with quite a racket. The dictation software struggled to pick my voice out of the noise haze, but its valiant efforts were in vain. I gave up on the tweet entirely, putting it off until I got to Starbucks and a real keyboard. My first failure was complete.
The second one came shortly thereafter, when I tried verbally entering a confirmation number for a coming flight into TripCase. (That turned out to be the flight where they wouldn’t let me use Airplane Mode.) Voice dictation on Android doesn’t seem to offer a way to dictate letters and numbers, so this attempt didn’t even get off the ground. That’s actually a huge impediment to usefulness on the whole, because even if the app does a good job interpreting your intentions most of the time, you still need to break out the keyboard to correct the errors that inevitably do crop up. And you can forget about entering things like URLs, passwords, and phone numbers.
Even when I kept my dictation confined to text communications, the process was untenable. By the end of day one, I’d started a pattern I was to repeat throughout the entire experiment: I started altering what I actually wanted to say, changing my messages into a format the voice recognition software would understand. Instead of composing a text that said, “meeting Robyn at Sacco’s for drinks,” which I knew might be interpreted as “beating robin and tacos for kicks,” I said simply “heading out to Davis Square.” The result was a message that got out the door just fine, but one which was less precise, and less useful.
Ultimately, that lack of precision, that forced generality, conspired with the service’s other limitations to make for quite an unexpected reversal. Where before I was a data junkie and a messaging hound, when confined to voice input I dreaded any situation that required anything more than a tap or a scroll. Using voice-only entry made me despise texting. Suddenly I felt like one of those people who fear smartphones; every time I picked one up, I dreaded the inevitable two to three minutes of tedious work I’d have to do just to send a simple email.
But that’s not a condemnation of Android’s voice package; it’s what happens when you use a product in a manner inconsistent with its creator’s intent. The voice interface is an accessory, an alternate input avenue. It’s neither designed for nor capable of constant use throughout all parts of the OS. No matter how good the software is (and in ICS and Jelly Bean, it’s really outstanding), it will always fail if pushed too far.
In a way, experiencing that failure firsthand has given me some small insight into the true limitations of the mobile devices that make up today’s world. They’re wonders of technology, constantly inspiring our amazement and awe … but like all computers, they constantly fool us into thinking they’re smarter than they are. At the end of the day, these are really just boxes of limited logic, vibrating piles of silicon that make yes-or-no computations. And while they’ve gotten a lot more intelligent in the voice interface category, they’ve still got a long way to go before they can be considered a true keyboard alternative, let alone a keyboard replacement.
We’ll have more thoughts on the future role of voice interface soon. Until then, tell us what kind of voice recognition software is your favorite (or your least favorite), and what you think the future holds for those who talk to mobile computers. The comment section is below; extra points to those who dictate their comments.
Messenger pigeon image source: Joseph Wilkins