By Joe Levi | November 25, 2013 7:22 AM
We’ve been talking to our phones since the very beginning — or so it would seem. As we’ve said before, that’s not entirely true. Although we use our phones to talk to other people, we haven’t really been talking “to” our phones, but rather “through” them. That started to change years ago as manufacturers began to include features that let us control our phones with our voices. At first these were very basic and required you to record the names of the people you wanted to dial — sometimes multiple times. Others made you go through lengthy “training” processes during which your phone would learn how you talked. While these sounded cool, the gimmick quickly wore off. Later, Microsoft introduced an app called Voice Command, which brought interacting with our smartphones much closer to “convenient” — but it was still just commands, and didn’t yet approach true voice transcription. After the fall of Windows Mobile, voice interaction took a backseat to the new stylus-less, finger-friendly user interfaces.
Over the last few iterations of smartphones, we’ve started to see the relevance of voice commands rise again. This new generation of devices has also brought basic voice transcription. Siri and Google Now both let us talk to our phones. Sometimes these exchanges are somewhat conversational — other times they’re nothing short of comical — but asking questions and getting answers still isn’t “transcription”.
What is voice transcription, question mark
Transcription is the written or printed representation of something. Voice transcription is the written or printed representation of something that has been dictated or otherwise spoken aloud. That’s where things get confusing.
I’m sitting here behind a keyboard, my fingers dancing across keys. Letters become words. Words become sentences, which evolve into paragraphs. Before long an entire article has been created. It’s not just words being written; it’s punctuation, too — hints not only at what the words say, but at how they’re meant to be read.
Transcribed words generally come out as one long string stitched together into a mess. If you were to dictate this article, you’d need to speak the punctuation aloud — commas, periods, em dashes, paragraph breaks, and so on. Doing so em dash as one might expect em dash can get confusing comma and break one’s train of thought period it doesn’t come naturally to most of us comma and we don’t speak that way in natural conversation period
See what I just did there? I dictated the last two sentences — rather, that’s what I’d have to say if I wanted them to be transcribed. Luckily for us, most voice transcription tools today can pick up on some of the nuances of our natural speech, and automatically turn our ramblings into properly structured content — to one degree or another.
For now, true voice transcription is still in its infancy — at least if its “mature” form is a solution that lets us be completely conversational with our devices while they translate our spoken words into a written document. That leaves us with one of two options: either we need better voice transcription, or we need to become better at dictation. In the meantime, the two will probably continue their awkward tango, with hilarious texts and misinterpreted emails being the result.
Interested in learning more about voice commands and transcription? Check out Michael Fisher’s experiment where he went an entire week without touching his keyboard. Learn about how Google started trying to learn your speech patterns years ago. Take a look at where Google Now and Siri were just one year ago in our Voice Dictation Showdown video. And, of course, don’t just sit there! Add your thoughts to the conversation in the comments section below — especially if you have a tale of hilarity or embarrassment to share!