What is machine learning, and how is Google using it to keep spam out of your inbox?
The movie Terminator Genisys is in theaters now, and it’s warning us away from computers that think for themselves (remember, Genisys is Skynet!). No, Pocketnow didn’t turn into a movie review website while you weren’t looking, but the movie does draw some parallels with today’s topic, so we’ll run with it.
Just in case you didn’t know (SPOILER ALERT!), in the Terminator stories, Skynet becomes self-aware and all but destroys humanity. The way it does this is through a computer technology called “machine learning”, and Google has just deployed it to fight spam in your inbox.
Machine learning is a type of AI (artificial intelligence) that provides computers with the ability to learn – without being explicitly programmed. It’s that “explicitly programmed” bit that’s important. A computer program can do anything you tell it to do, but you’ve got to tell it exactly what you want it to do. Accounting for every situation, condition, and variable is a very tedious undertaking!
So is Google building Skynet? Thankfully, that’s probably not the case. Machine learning is typically associated with computational statistics, data modeling, and prediction-making. Put another way, machine learning is great at figuring out whether an email is spam, but it won’t gain sentience and launch every nuclear weapon to eradicate humans (at least I hope not).
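To make the “explicitly programmed” distinction a little more concrete, here’s a minimal Python sketch. The first function is the old-school approach: a human types in every suspicious phrase by hand. The class below it learns word statistics from a handful of labeled examples instead, using a textbook naive Bayes classifier. It’s an illustration under simple assumptions (a tiny made-up training set, naive word splitting), not how Gmail actually works; Google says it uses a neural network, which is a different technique.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def rule_based_is_spam(text: str) -> bool:
    # "Explicitly programmed": a human must anticipate every spammy phrase.
    banned_phrases = ("free money", "act now", "winner")
    return any(phrase in text.lower() for phrase in banned_phrases)


class NaiveBayesFilter:
    """Learns which words signal spam from labeled examples (illustrative only)."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        # label is "spam" or "ham"
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text: str, label: str) -> float:
        # log P(label) + sum of log P(word | label), with add-one smoothing
        # so a word never seen in this class doesn't zero out the product.
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        total_words = sum(self.word_counts[label].values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for word in tokenize(text):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def is_spam(self, text: str) -> bool:
        return self.log_score(text, "spam") > self.log_score(text, "ham")


nb = NaiveBayesFilter()
nb.train("Claim your prize now, you are a winner", "spam")
nb.train("Cheap pills, prize offer, click now", "spam")
nb.train("Meeting notes for Tuesday attached", "ham")
nb.train("Lunch on Tuesday with the team", "ham")

print(rule_based_is_spam("Claim your prize today"))  # False: no hand-written rule matches
print(nb.is_spam("Claim your prize today"))          # True: learned from the examples
```

Nobody told the classifier that “claim” or “prize” are suspicious; it worked that out from the examples. Feed it different examples and it learns different rules, which is the “without being explicitly programmed” part in a nutshell.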
In addition to spam filtering, machine learning is also used in optical character recognition (OCR), facial recognition, search engines, and data mining. It even helps decide which ads you see on the web pages you visit and hear on the streaming radio stations you listen to.
Spam is the junk email that marketers flood our inboxes with. Email is much cheaper to send than postal mail, and it’s much easier to target people who are likely to act on the message. The worst offenders don’t bother targeting anyone, though; they flood everyone, relying on quantity over quality to make money.
Sri Harsha Somanchi, Gmail product manager, says that less than one-tenth of one percent of the email in the average Gmail inbox is spam. Meanwhile, “ham” (wanted email) is mislabeled as spam only 0.05% of the time. Google attributes both of those impressive numbers to its “artificial neural network”, which evaluates and categorizes billions of incoming emails so that unwanted messages and phishing attacks can be weeded out automatically.
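Google doesn’t publish the details of its network, but the basic building block of any artificial neural network is simple enough to sketch. Below is a single artificial neuron (mathematically, logistic regression): it gives every word a weight, adds the weights up, squashes the total into a 0-to-1 “spam probability” with a sigmoid, and nudges the weights with gradient descent whenever it guesses wrong. The training messages, learning rate, and epoch count are made-up assumptions, and a production system would be vastly larger and look at much more than individual words.

```python
import math
import re


def tokenize(text):
    # Lowercase and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def sigmoid(z):
    # Numerically stable logistic function: squashes any number into (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class Neuron:
    """One artificial neuron: a weight per word, a bias, and a sigmoid output."""

    def __init__(self):
        self.weights = {}  # word -> learned weight (treated as 0.0 if never seen)
        self.bias = 0.0

    def spam_probability(self, text):
        # Each distinct word is a 0/1 input feature.
        z = self.bias + sum(self.weights.get(w, 0.0) for w in set(tokenize(text)))
        return sigmoid(z)

    def fit(self, examples, epochs=200, learning_rate=0.5):
        # Stochastic gradient descent on log loss; label 1 = spam, 0 = ham.
        for _ in range(epochs):
            for text, label in examples:
                # For a sigmoid with log loss, d(loss)/dz is simply (prediction - label).
                error = self.spam_probability(text) - label
                self.bias -= learning_rate * error
                for w in set(tokenize(text)):
                    self.weights[w] = self.weights.get(w, 0.0) - learning_rate * error


training = [
    ("Claim your prize now, you are a winner", 1),
    ("Cheap pills, prize offer, click now", 1),
    ("Meeting notes for Tuesday attached", 0),
    ("Lunch on Tuesday with the team", 0),
]
neuron = Neuron()
neuron.fit(training)
print(neuron.spam_probability("Claim your prize today"))      # should come out above 0.5
print(neuron.spam_probability("Notes for the team meeting"))  # should come out below 0.5
```

Stack many of these neurons into layers and you have a neural network, the (much bigger) family of models Google says it’s using.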
How much of our email is actually spam? According to security firm Kaspersky, spam makes up a staggering 59.2% of all email.
From its earliest days, Gmail has asked users to report unwanted messages as “spam”, and to rescue wrongly categorized “ham” from the spam folder by marking those messages as “not spam”. Besides keeping their own mailboxes tidy, this also helped Gmail learn what people consider “spam”, as well as why some spammy-looking messages aren’t spam at all.
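That feedback loop is the heart of the whole thing: every click on “Report spam” or “Not spam” is a fresh labeled example. Here’s a rough Python sketch of the idea, assuming a simple naive Bayes-style word counter (not Google’s actual model); `report_spam()` and `report_not_spam()` are hypothetical method names standing in for those buttons. A confirmation code that initially looks like spam gets rescued once the user marks a similar message as “not spam”.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class FeedbackFilter:
    """Word-count spam filter that learns from every button click.

    report_spam / report_not_spam are hypothetical names for the
    "Report spam" and "Not spam" buttons, not a real Gmail API.
    """

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}

    def report_spam(self, text):
        self.counts["spam"].update(tokenize(text))

    def report_not_spam(self, text):
        self.counts["ham"].update(tokenize(text))

    def spam_score(self, text):
        # Sum of per-word log-odds (add-one smoothed); above 0 leans spam.
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        score = 0.0
        for word in tokenize(text):
            p_spam = (self.counts["spam"][word] + 1) / (spam_total + vocab_size)
            p_ham = (self.counts["ham"][word] + 1) / (ham_total + vocab_size)
            score += math.log(p_spam / p_ham)
        return score

    def is_spam(self, text):
        return self.spam_score(text) > 0


inbox_filter = FeedbackFilter()
inbox_filter.report_spam("Win a prize, code now")
inbox_filter.report_spam("Prize code inside, claim now")
inbox_filter.report_not_spam("Dinner at eight?")

print(inbox_filter.is_spam("Confirmation code"))  # True: "code" has only shown up in spam
inbox_filter.report_not_spam("Confirmation code 4821")
print(inbox_filter.is_spam("Confirmation code"))  # False after the user's correction
```

Nothing had to be reprogrammed to fix the mistake; one click changed the numbers, and the numbers changed the verdict.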
Not everyone is the same, and neither is everyone’s definition of “spam”. That’s the newsworthy part of Google’s recent announcement.
“We also recognise that not all inboxes are alike. So while your neighbor may love weekly email newsletters, you may loathe them. With advances in machine learning, the spam filter can now reflect these individual preferences.”
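One simple way to get that kind of personalization, sketched below under loudly stated assumptions, is to keep a shared pool of everyone’s reports plus a separate tally for each user, and let a user’s own clicks count extra when scoring their mail. The `personal_weight` value of 5 and the whole scoring scheme are hypothetical choices for illustration; Google hasn’t said how Gmail blends individual and global signals.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on anything that isn't a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def empty_tallies():
    return {"spam": Counter(), "ham": Counter()}


class PersonalizedFilter:
    """Shared word counts from everyone, plus extra weight for one user's own reports."""

    def __init__(self, personal_weight=5):
        # Hypothetical knob: how many extra times a user's own clicks count for them.
        self.personal_weight = personal_weight
        self.global_counts = empty_tallies()
        self.user_counts = defaultdict(empty_tallies)

    def report(self, user, text, label):
        # label is "spam" or "ham"; every report feeds the shared pool and the user's tally.
        words = tokenize(text)
        self.global_counts[label].update(words)
        self.user_counts[user][label].update(words)

    def _counts_for(self, user, label):
        # The user's own clicks end up counting (1 + personal_weight) times:
        # once in the shared pool and personal_weight more times here.
        merged = Counter(self.global_counts[label])
        for word, count in self.user_counts[user][label].items():
            merged[word] += self.personal_weight * count
        return merged

    def is_spam(self, user, text):
        spam, ham = self._counts_for(user, "spam"), self._counts_for(user, "ham")
        spam_total, ham_total = sum(spam.values()), sum(ham.values())
        vocab_size = len(set(spam) | set(ham)) or 1
        score = 0.0
        for word in tokenize(text):
            # Add-one smoothed per-word log-odds; a positive total leans spam.
            score += math.log((spam[word] + 1) / (spam_total + vocab_size))
            score -= math.log((ham[word] + 1) / (ham_total + vocab_size))
        return score > 0


mail_filter = PersonalizedFilter()
mail_filter.report("alice", "Cheap pills, prize now", "spam")
mail_filter.report("bob", "Meeting notes for Tuesday", "ham")
mail_filter.report("you", "Weekly newsletter digest", "spam")
mail_filter.report("neighbor", "Weekly newsletter digest", "ham")

print(mail_filter.is_spam("you", "Weekly newsletter"))       # True
print(mail_filter.is_spam("neighbor", "Weekly newsletter"))  # False
```

Same message, same shared data, opposite verdicts, because each user’s own clicks tip the balance for their own inbox.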
One might think that all this is bad news for legitimate bulk email senders (the ones you actually want to hear from). Thankfully (or regrettably, depending on your position), Google has released a new utility called Postmaster Tools (similar to Webmaster Tools for websites) to help them.
“The Gmail Postmaster Tools help qualified high-volume senders analyze their email, including data on delivery errors, spam reports, and reputation. This way they can diagnose any hiccups, study best practices, and help Gmail route their messages to the right place.”
Whether or not you think Postmaster Tools is a good idea, for the regular Gmail user it means “no more dumpster diving for that confirmation code”, which is a good thing, right? Additionally, you won’t need to download anything, update an app, or even tick a settings box to get the new features. It’s all built into the Gmail service itself, and it’s already running.
What about you? How do you feel about Gmail’s improved machine learning capabilities? Will this help eradicate phishing and spam? I don’t know about you, but I welcome our new computer overlords.