Sometimes I ask my assistant to remind me about something, and when she’s sitting in the same room with me and the reminder comes due, she claps her hands twice to get my attention. If I don’t look at her right away, she won’t tell me what the message is; she’ll wait a few more seconds, then close her eyes and go to sleep. If this were a real person, that scenario would sound pretty ridiculous, but that’s exactly how our smartphones behave these days.

I’m sorry, I don’t understand what two hand claps mean. We’re in the same room, though. I can hear you. Why don’t you just speak to me in English? I’m honestly surprised and disgusted that smartphones still haven’t evolved far enough to understand this basic mode of communication that humans have been using for thousands and thousands of years.

Let’s take a look at the history of the vague unintelligible sound effect.

Long ago, electronic communications were very limited. You could send a beep or no beep. Morse code was built on sequences of long and short beeps that spelled out letters and words. An operator on each end of the wire would piece the codes together into a message. Of course, you had to understand what the codes meant in order to decode the message, but at least decoding it was possible.

Then the telephone was invented so that actual audible speech could be transferred over the wires. Unfortunately, it wasn’t possible for the hardware on the receiver’s end to automatically announce the name of the person calling. Instead, you got a ringing bell, and you had to go over and pick up the receiver to open the connection before you could speak to the person on the other end in real time.

When phones became mobile, that bell was replaced by digital monophonic synthesized sound effects. The hardware was still very primitive, and many phones could only play very basic tones and songs in MIDI format. Speech synthesis was nowhere to be seen, but caller ID had started working, so to know who was calling without looking, you could assign a different ring tone to specific callers. The first commercial mobile phone with customizable ring tones was the NTT DoCoMo Digital Mova N103 Hyper by NEC, released in May 1996. All you had to do was memorize which tone was assigned to which person.

Phones evolved some more around the turn of the century, and we started getting ring tones that could play full-fidelity music. That’s cool and all, but you still had to manually set different ring tones for specific callers to have some idea of who was calling. It turns out that hardly anybody actually does this. I was in the grocery store the other day and a phone rang… it was the default iPhone ring tone. Three people took out their phones and looked at the screen, because everyone uses the default ring tone. Even on television shows, everyone uses the default ring tone.

While other phone manufacturers were still dealing with polyphonic ringtones, Microsoft almost got it right with their Voice Command software on Windows Mobile 2003 smartphones and Pocket PC phones. Voice Command was so useful at communicating smartphone-related information that I actually turned off the normal ringtones and sound effects completely. That’s how it should be!

Normally, Voice Command would play the usual sound effect and then speak the notification aloud, whether it was “Incoming call from so and so”, “New message from so and so”, or “Drive to Connecticut in 15 minutes”. With the sound effects turned off, Voice Command would speak these notifications right away. That meant I could listen to the actual information my smartphone was trying to communicate and react accordingly without having to look at the screen. It was great on the motorcycle, inside a helmet with headphones and a microphone.

There was a time when I could understand what my phone wanted to tell me right away.

While none of the modern smartphones include speech announcements for all notifications by default, there are some hacks and apps that you can install to get sort-of informative notifications. “SpeakMe” on Android can use the text-to-speech engine to automatically read notifications out loud as they happen. It works well for some apps, but not so well for others. It doesn’t read calendar appointments or Google Now notifications aloud, but it will read things like Facebook and Instagram notifications. Whatever you do, don’t turn on notifications for OneNote, though; it will repeat “OneNote” over and over. For iPhone users, there’s a hack to do something similar if you’re willing to jailbreak. The problem is that neither platform allows a global speech interface to interact with third-party programs.
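
For the curious, here’s a rough sketch of how this kind of app works on Android (in Java; this is my own illustration, not SpeakMe’s actual code): a NotificationListenerService receives each notification as it’s posted and hands the title and text to the system text-to-speech engine. The user still has to grant the app notification access in Settings, and the service has to be declared in the manifest with the BIND_NOTIFICATION_LISTENER_SERVICE permission.

```java
// Sketch of a notification reader: not SpeakMe's actual source, just an
// illustration of the mechanism.
import android.app.Notification;
import android.service.notification.NotificationListenerService;
import android.service.notification.StatusBarNotification;
import android.speech.tts.TextToSpeech;

public class SpeakingListenerService extends NotificationListenerService
        implements TextToSpeech.OnInitListener {

    private TextToSpeech tts;
    private boolean ttsReady = false;

    @Override
    public void onCreate() {
        super.onCreate();
        tts = new TextToSpeech(this, this); // initialize the system TTS engine
    }

    @Override
    public void onInit(int status) {
        ttsReady = (status == TextToSpeech.SUCCESS);
    }

    @Override
    public void onNotificationPosted(StatusBarNotification sbn) {
        if (!ttsReady) return;
        Notification n = sbn.getNotification();
        CharSequence title = n.extras.getCharSequence(Notification.EXTRA_TITLE);
        CharSequence text = n.extras.getCharSequence(Notification.EXTRA_TEXT);
        String announcement = (title != null ? title : sbn.getPackageName())
                + ". " + (text != null ? text : "");
        // Queue announcements so rapid-fire notifications don't cut each other off.
        // (The repeating "OneNote" problem is the app re-posting the same
        // notification; a real reader would want to de-duplicate those.)
        tts.speak(announcement, TextToSpeech.QUEUE_ADD, null);
    }

    @Override
    public void onDestroy() {
        if (tts != null) tts.shutdown();
        super.onDestroy();
    }
}
```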

Speech interfaces really should be smarter by now.

Yes, all third-party app notifications should be capable of being announced with speech, but the speech interface should also listen for follow-up commands to send back to the third-party application. Windows Phone’s Cortana speech interface can already interact with third-party applications, but those applications can’t interact with Cortana as much as they should (in terms of triggering speech notifications and listening for subsequent commands). Currently, only incoming text messages can be read aloud and responded to on Windows Phone, although the speech interface used to support hands-free MSN Messenger and Facebook Messenger interaction as well. (Removing those was a big step backwards.)

Then there’s the problem of how the smartphone knows whether it’s appropriate to speak to you at any given time. That should be pretty simple. I expected Windows Phone’s “Quiet Hours” feature to determine when Cortana would not be allowed to speak notifications to me. It turns out Cortana doesn’t speak any notifications besides SMS and caller ID, but the mechanism for defining “do not disturb” times based on my calendar appointment schedule is certainly already there. iOS also has a “Do Not Disturb” mode, but it’s not smart enough to integrate with the user’s calendar. Android Lollipop will be adding a “do not disturb” mode as well, which seems to function exactly like iOS’s version.
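
A calendar-aware “do not disturb” wouldn’t even require anything exotic. Here’s a rough sketch, again in Java for Android, of a check the speech UI could run before opening its mouth: look for any calendar event marked busy that covers the current moment. The shouldSpeakNow method is my own invention, not part of any platform API, and it assumes the app holds the READ_CALENDAR permission.

```java
// Sketch of a calendar-aware quiet-hours check. shouldSpeakNow() is a made-up
// helper, not a platform API; it requires the READ_CALENDAR permission.
import android.content.Context;
import android.database.Cursor;
import android.provider.CalendarContract;

public final class QuietHours {

    public static boolean shouldSpeakNow(Context context) {
        long now = System.currentTimeMillis();
        String[] projection = { CalendarContract.Instances.AVAILABILITY };
        // Find every calendar event instance that overlaps the current moment.
        Cursor cursor = CalendarContract.Instances.query(
                context.getContentResolver(), projection, now, now);
        if (cursor == null) return true;
        try {
            while (cursor.moveToNext()) {
                if (cursor.getInt(0) == CalendarContract.Events.AVAILABILITY_BUSY) {
                    return false; // the user is in a meeting: hold the announcement
                }
            }
        } finally {
            cursor.close();
        }
        return true; // no busy event right now, so it's OK to speak
    }
}
```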

Ideally, even when “Quiet Hours” is turned off and the speech UI is allowed to interrupt, it should do so politely. Instead of just saying out loud, “You have a new message from John Smith. Would you like to read it or ignore it?”, a “request interruption” option could make the speech UI say something like “Excuse me, Adam” first. Then I could respond by saying “Go ahead” or “Come back in 10 minutes”, or by pressing a snooze button on the screen in case talking to my phone isn’t appropriate at the time. Of course, it should be listening for my response even before it’s finished speaking, so I can skip straight to the next command and speed up the conversation.
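
To make that flow concrete, here’s a purely hypothetical sketch of the polite-interruption conversation. None of these types exist in any real speech API; SpeechOutput and SpeechInput are stand-ins for whatever the platform would actually provide.

```java
// Purely hypothetical sketch of the polite-interruption flow. SpeechOutput,
// SpeechInput and the recognized phrases are stand-ins, not real platform APIs.
public class PoliteInterrupter {

    interface SpeechOutput { void say(String text); }
    interface SpeechInput  { String listen(long timeoutMillis); } // recognized phrase, or null

    private final SpeechOutput voice;
    private final SpeechInput ears;

    public PoliteInterrupter(SpeechOutput voice, SpeechInput ears) {
        this.voice = voice;
        this.ears = ears;
    }

    /** Asks permission before reading a notification aloud; returns true if it was read. */
    public boolean requestInterruption(String userName, String notificationText) {
        voice.say("Excuse me, " + userName + ".");
        String reply = ears.listen(5000); // keep listening so the user can barge in
        if (reply == null) {
            return false; // no answer: stay quiet and try again later
        }
        if (reply.contains("go ahead")) {
            voice.say(notificationText); // permission granted: read the message
            return true;
        }
        if (reply.contains("come back")) {
            // e.g. "come back in 10 minutes"; scheduling the retry is left out of this sketch
            return false;
        }
        return false; // anything else (including the snooze button) is treated as "not now"
    }
}
```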

Do you find yourself reaching for your phone and looking at it every time it plays a vague, unintelligible sound effect, just to find out what it means? I see people doing this all the time, even while driving. There has to be a better way!