Gmail is one of the largest email services. It has a user base of over 1 billion people and is one of Google’s oldest products. Launched in April 2004, the service has improved a lot over the years. It may be hard to remember when you signed up for Gmail or who referred you to it, but one of the features that attracted a lot of early users was its amazing spam filtering capability. In the days of Yahoo and Rediff, this was one thing that made the new product stand out. It’s also worth noting that Gmail started out as a 20 percent project by a Googler.
Gmail’s spam filtering has only gotten better over the years. And since it is one of those things that make your life a little easier, we’ll take a detailed look at how Gmail’s spam filter actually works.
For starters, on an abstract level, you can think of the filtration as a staged process, with some sophisticated technology behind it. To decide whether an email is spam, several hundred rules are applied to each email that passes through Google’s data centers. The rules catch obvious spam outright, while borderline messages are quarantined for later. Each rule describes some attribute of spam and has a numerical value associated with it, based on how likely a message with that attribute is to be spam. An equation is then formed from the weighted significance of each attribute, and the resulting value is the spam score for the message. This score is then tested against the sensitivity threshold set in the individual user’s spam filter, and the message is categorized as either spam or valid email.
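To make the scoring idea concrete, here is a minimal sketch in Python. It is not Gmail’s actual implementation: the three rules, their weights, and the threshold values are made up for illustration, and a real system would use hundreds of rules. It only shows the shape of the process described above, where each matching rule contributes its weight and the total is compared against a per-user threshold.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str                       # hypothetical rule name
    weight: float                   # made-up weight, not a real Gmail value
    matches: Callable[[str], bool]  # returns True if the message has the attribute


# Illustrative rules only; the first line of the text is treated as the subject.
RULES = [
    Rule("mentions_free_money", 2.5, lambda text: "free money" in text.lower()),
    Rule("many_exclamations", 1.0, lambda text: text.count("!") >= 3),
    Rule("all_caps_subject", 1.5, lambda text: text.split("\n", 1)[0].isupper()),
]


def spam_score(text: str, rules=RULES) -> float:
    """Sum the weights of every rule that matches the message."""
    return sum(rule.weight for rule in rules if rule.matches(text))


def classify(text: str, threshold: float) -> str:
    """Compare the score against one user's sensitivity threshold."""
    return "spam" if spam_score(text) >= threshold else "valid"
```

With these made-up numbers, a borderline message such as "Meeting moved!!!" scores 1.0, so a user with an aggressive threshold of 1.0 would see it flagged, while a user with a lenient threshold of 3.0 would still receive it.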
What makes the process unique is the way it handles each user. Consider two cases: a person who knows how to tune the spam filters and therefore has an aggressive level of filtration, and another person who has no idea what spam means. When the first person receives a borderline message and recognizes it as spam, he marks it as spam.
What happens internally is the interesting part. Even though only this one user marked the message as spam, the flag trains the system to flag all similar messages, so every user on the Gmail network benefits as the system learns how to categorize such messages in the future. That’s the power of machine learning!
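The feedback loop can be pictured with a toy example. This is only a sketch under simplifying assumptions: Gmail’s real learning system is far more sophisticated and is not described in detail here. The shared word-weight table, the `report_spam` function, and the step size are all hypothetical. The point is simply that one user’s report changes a shared model that every other user’s mail is scored against.

```python
# Toy shared model: one weight per word, used for every user's mail.
# The table, function names, and step size are illustrative assumptions.
shared_weights: dict[str, float] = {}


def tokens(text: str) -> set[str]:
    """Split a message into a set of lowercase words."""
    return set(text.lower().split())


def report_spam(text: str, step: float = 0.5) -> None:
    """One user marks a message as spam; raise the weight of each of its words."""
    for tok in tokens(text):
        shared_weights[tok] = shared_weights.get(tok, 0.0) + step


def score(text: str) -> float:
    """Score any user's message against the shared weights."""
    return sum(shared_weights.get(tok, 0.0) for tok in tokens(text))
```

After a single call such as `report_spam("win a prize now")`, a different message like "claim your prize" sent to any other user scores higher than it did before, because the word "prize" now carries weight in the shared table.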
Now that we know how the service keeps getting better, let’s look at the common types of spam filters and when Gmail applies them.
Common Types of Spam Filters
- If Blatant Blocking is enabled for a user, the most obvious spam is bounced or deleted before it even reaches the inbox.
- Each user also has a Bulk Email Filter that sets a base level of aggressiveness for filtering the remaining spam. (This spam is typically quarantined.)
- Each user can optionally adjust four other Category Filters, each of which targets spam containing a specific kind of content, depending on the level of aggressiveness desired. (These messages are generally get-rich-quick schemes or sexually explicit content.)
- Null Sender Disposition lets you choose how to dispose of all messages without an SMTP envelope sender address. These are generally the Non-Delivery Reports.
- Null Sender Header Tag Validation is a process by which the system examines each inbound message for the presence of an SMTP envelope sender address and for the message’s digital security signature.
When Do These Filters Apply?
These filters constantly examine each message that lands in your inbox. Spam Category filters are generally applied at the end, once all other filtering is done. Blatant Spam Filtration occurs before all other filters, but it does not block messages from approved senders. The following are the key scenarios in which Blatant Spam Filtration fails and other mechanisms take over:
- A message from an approved sender bypasses the spam filter, even though it contains spam-like content.
- A message with approved content bypasses the category filter.
- Virus blocking overrides spam filtration. Virus Blocking scans all messages passing through the filters, and if a message contains a malicious file or link, it overrides the spam filtration process. That is, if a message is quarantined as junk but is also determined to be infected, it is processed according to the virus filter disposition.
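The ordering and overrides described above can be summarized in a short sketch. The `Message` fields, the `route` function, and the disposition labels are hypothetical names chosen for illustration, and the exact order of checks is an assumption pieced together from the description rather than a documented Gmail pipeline.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Each flag stands in for the result of a real check; names are hypothetical.
    infected: bool = False          # virus blocking found a malicious file or link
    sender_approved: bool = False   # sender is on the approved list
    content_approved: bool = False  # content is on the approved list
    blatant_spam: bool = False      # caught by blatant blocking
    bulk_spam: bool = False         # caught by the bulk email filter
    category_spam: bool = False     # caught by one of the category filters


def route(msg: Message) -> str:
    """Return an illustrative disposition for a message."""
    if msg.infected:
        # Virus blocking overrides whatever the spam filters decided.
        return "virus_disposition"
    if msg.sender_approved:
        # Approved senders bypass spam filtering, even with spam-like content.
        return "deliver"
    if msg.blatant_spam:
        # Blatant blocking runs before the other spam filters.
        return "blocked"
    if msg.bulk_spam:
        return "quarantine"
    if msg.category_spam and not msg.content_approved:
        # Category filters run last; approved content bypasses them.
        return "quarantine"
    return "deliver"
```

For example, `route(Message(infected=True, bulk_spam=True))` returns `"virus_disposition"` rather than `"quarantine"`, which mirrors the rule that an infected message is handled by the virus filter even if it was also classified as junk.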
In case all this still seems a little too complicated for you to comprehend, here’s a video that the Gmail team at Google created to help understand spam filtration better.