Spam Filters Used by Networks
Antispam programs use a variety of techniques to estimate the probability that a given piece of email is spam. Each technique is implemented as a filter, and every incoming email is examined by one or more of these filters.
Here are some of the most commonly used filter types:
Keyword checking: The most obvious way to identify spam is to look for certain words that appear either in the email’s subject line or in the email body. For example, a keyword checking filter might look for profanity, sexual terms, and other words or phrases such as “Get rich quick!”
Although this is the most obvious way to identify spam, it’s also the least reliable. Spammers learned long ago to keep commonly blocked words out of their messages to avoid these types of filters. Often they intentionally misspell words or substitute numbers or symbols for letters, such as the numeral 0 for the letter o, or the symbol ! for the letter l.
The biggest problem with keyword checking is that it often leads to false positives. Friends and relatives might intentionally or inadvertently use any of the banned words in their emails. Sometimes, the banned words appear in the middle of otherwise completely innocent words. For example, if you list Cialis as a keyword that you want blocked, you’ll also block any email that contains the word specialist or socialist.
For these reasons, keyword filters are typically used only for the most obvious and offensive words and phrases, if they’re used at all.
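The false-positive problem described above can be illustrated with a minimal sketch. The keyword list and function names here are hypothetical and are not taken from any particular antispam product; the sketch simply contrasts a plain substring search with a whole-word match.

```python
import re

# Hypothetical keyword list, for illustration only.
BANNED = ["cialis"]

def naive_match(text, banned=BANNED):
    """Substring search: flags a banned word even inside an innocent word."""
    lowered = text.lower()
    return any(word in lowered for word in banned)

def whole_word_match(text, banned=BANNED):
    """Flag a banned word only when it appears as a whole word."""
    return any(
        re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE)
        for word in banned
    )
```

Whole-word matching avoids flagging specialist, but neither function catches a deliberately misspelled variant such as C1alis, which is why keyword filters remain easy to evade.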
Bayesian analysis: One of the most trusted forms of spam filtering is Bayesian analysis, which works by assuming that certain words occur more often in spam email than in other email. This sounds a lot like keyword checking, but it is much more sophisticated. The Bayesian filter maintains an index of words along with how likely each word is to appear in spam email. Each word in the email being analyzed is looked up in this index, and the probabilities of the individual words are combined to determine the overall probability of the email being spam. If that overall probability exceeds a certain threshold, the email is marked as spam.
Here’s where the magic of Bayesian analysis comes in: The index is self-learning, based on the user’s actual email. Whenever the filter misidentifies an email, the user trains the filter by telling the filter that it was incorrect. The user typically does this by clicking a button labeled “This is spam” or “This is not spam.” When the user clicks either of these buttons, the filter adjusts the probability associated with the words that led it to make the wrong conclusion. So, when the filter encounters a similar email in the future, it’s more likely to make the correct determination.
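The word index and the training step can be sketched as a small naive Bayes classifier. This is a simplified illustration under stated assumptions, not how any specific product works: the class name, the tokenizer, the add-one smoothing, and the default threshold of 0.9 are all choices made for this example.

```python
import math
import re
from collections import Counter

class BayesFilter:
    """Toy naive Bayes spam filter (illustrative; names are hypothetical)."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed cutoff for marking spam
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, label):
        """Update the index; label is 'spam' or 'ham' (not spam).

        This is what a 'This is spam' / 'This is not spam' button would call.
        """
        self.messages[label] += 1
        self.words[label].update(self.tokenize(text))

    def spam_probability(self, text):
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        # Start from the prior odds, smoothed so an empty index still works.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for word in self.tokenize(text):
            p_spam = (self.words["spam"][word] + 1) / (n_spam + 2)
            p_ham = (self.words["ham"][word] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp to keep math.exp from overflowing on extreme scores.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))

    def is_spam(self, text):
        return self.spam_probability(text) > self.threshold
```

Calling train on a message the filter got wrong shifts the per-word probabilities, so a similar message scores differently the next time it is seen.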
Sender Policy Framework (SPF): Surprisingly, SMTP (the Internet email protocol) has very little built-in security. In particular, any email server can easily send email that claims to be from any domain. This makes it easy to forge the sender address of an email. SPF lets you designate via DNS which specific email servers are allowed to send email from your domain. An antispam SPF filter works by checking the IP address of the sending email server against the SPF record published in the DNS of the sender’s domain. Strictly speaking, the domain that SPF checks is the one in the envelope sender address supplied during the SMTP session, which is usually, but not always, the same domain that appears in the email’s From address.
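As a rough illustration of the check an SPF filter performs, the following sketch evaluates an SPF record that has already been retrieved from DNS against the IP address of the sending server. It is deliberately incomplete: it understands only the ip4, ip6, and all mechanisms, silently skips everything else (include, a, mx, redirect, and so on), and performs no DNS lookups. The function name and the sample addresses used with it are assumptions made for this example.

```python
import ipaddress

def check_spf(record, sender_ip):
    """Evaluate a simplified SPF record against the sending server's IP.

    Returns 'pass', 'fail', 'softfail', 'neutral', or 'none'.
    Only ip4/ip6/all are handled; this is a sketch, not a full evaluator.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    ip = ipaddress.ip_address(sender_ip)
    results = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier is pass
        if term[0] in results:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return results[qualifier]
        elif name == "all":
            return results[qualifier]  # matches every sender
        # Other mechanisms are ignored in this sketch.
    return "neutral"
```

With a record such as "v=spf1 ip4:192.0.2.0/24 -all", a server inside 192.0.2.0/24 passes and any other server fails.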
Blacklisting: Another trusted form of spam filtering is a blacklist (also known as a blocklist), which uses a list of known spammers to block email from sources that aren’t trustworthy. There are two types of blacklists: private and public. A private blacklist is a list that you set up yourself to designate sources you don’t want to accept email from. A public blacklist is a list that is maintained by a company or organization and is available for others to use.
Note that simply blacklisting a sender email address isn’t much help. That’s because the sender email address is easy to forge. Instead, blacklists track individual email servers that are known to be sources of spam.
Unfortunately, spammers don’t usually set up their own servers to send out their spam. Instead, they hijack other servers to do their dirty work. Legitimate email servers can be hijacked by spammers and, thus, become spam sources, often without the knowledge of their owners. This raises the unfortunate possibility that your own email server might be taken over by a spammer, and you might find your email server listed on a public blacklist. If that happens, you won’t be able to send email to anyone who uses that blacklist until you have corrected the problem that allowed your server to be hijacked and petitioned the blacklist owners to have your server removed.
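Because blacklists track servers rather than sender addresses, a lookup is keyed on the IP address of the connecting server. The sketch below shows two pieces under stated assumptions: a private blacklist held as a list of IP addresses or networks, and the query name used by public blacklists that are published through DNS, where the IPv4 octets are reversed and prepended to the list’s zone. The function names and the zone dnsbl.example.org are placeholders, and no actual DNS query is made.

```python
import ipaddress

def is_privately_blacklisted(server_ip, private_blacklist):
    """Check a server IP against your own list of IPs or CIDR networks."""
    ip = ipaddress.ip_address(server_ip)
    return any(
        ip in ipaddress.ip_network(entry, strict=False)
        for entry in private_blacklist
    )

def dnsbl_query_name(server_ip, zone):
    """Build the DNS name to look up in a DNS-based public blacklist.

    IPv4 only. The octets are reversed and the list's zone is appended;
    if the resulting name resolves, the server is on the list.
    """
    ip = ipaddress.IPv4Address(server_ip)
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{zone}"
```

For example, checking the server 192.0.2.99 against the placeholder zone would mean resolving 99.2.0.192.dnsbl.example.org.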
Whitelisting: One of the most important elements of any antispam solution is a whitelist, which ensures that email from known senders will never be blocked. Typically, the whitelist consists of a list of email addresses that you trust. When the antispam tool has confirmed that the From address in the email has not been forged (perhaps by use of an SPF filter), the whitelist filter looks up the address in the whitelist database. If the address is found, the email is immediately marked as legitimate email, and no other filters are applied.
Most whitelist filters will let you whitelist entire domains, as well as individual email addresses. You most certainly do not want to whitelist domains of large email providers such as gmail.com or comcast.net. But you should whitelist the domains of all your business partners and clients to ensure that emails from new employees at these key companies are never marked as spam.
Some antispam programs automatically add the recipient addresses of all outgoing emails to the whitelist. In other words, anyone to whom you send an email is automatically added to the whitelist. Over time, this feature can drastically reduce the occurrence of false positives.
Use the whitelist to preemptively allow important email that you’re expecting from new customers, vendors, or service providers. For example, if you switch payroll providers, find out in advance what email addresses the new provider will be using so that your payroll staff doesn’t miss important emails.
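The whitelist behavior described above, including domain entries and automatic additions from outgoing mail, might look roughly like the following. The class and method names are hypothetical, and the sender_verified flag stands in for whatever forgery check (such as SPF) the antispam tool has already performed.

```python
class Whitelist:
    """Toy whitelist of trusted addresses and domains (illustrative only)."""

    def __init__(self, addresses=(), domains=()):
        self.addresses = {a.lower() for a in addresses}
        self.domains = {d.lower() for d in domains}

    def allows(self, from_address, sender_verified):
        """Return True if the email should skip all other filters."""
        if not sender_verified:
            # A forged address must never be trusted, even if listed.
            return False
        address = from_address.lower()
        domain = address.rpartition("@")[2]
        return address in self.addresses or domain in self.domains

    def record_outgoing(self, recipients):
        """Automatically trust everyone you send email to."""
        self.addresses.update(r.lower() for r in recipients)
```

Listing a partner’s domain covers new employees at that company automatically, whereas an individual at a large provider such as gmail.com is trusted only once that exact address has been added.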
Graylisting: Graylisting is an effective antispam technique that exploits the fact that if a legitimate email server can’t successfully deliver an email on its first attempt, the server will try again later, typically in 30 minutes. A graylist filter automatically rejects the first attempt to deliver a message but keeps track of the details of the message it rejected. Then, when the same message is received a second time, the graylist filter accepts the message and makes note of the sender so that future messages from the sender are accepted on the first attempt.
Graylisting works because spammers usually configure their servers to not bother with the second attempt. Thus, the graylist filter knows that if a second copy of the email arrives after the initial rejection, the mail is probably legitimate.
The drawback of graylisting is that the first time you receive an email from a new sender, the email will be delayed. Many users find that the benefit of graylisting is not worth the cost of the delayed emails, so they simply disable the graylist filter.
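The reject-then-accept logic of a graylist filter can be sketched as follows. Several details are assumptions made for this example rather than statements about any particular product: a message is identified by the combination of sending server, sender address, and recipient address; a retry counts only after a minimum delay (60 seconds here); and the clock is injectable so the behavior can be exercised without waiting.

```python
import time

class Graylist:
    """Toy graylist filter (illustrative; names and defaults are assumed)."""

    def __init__(self, min_delay=60, clock=time.time):
        self.min_delay = min_delay  # seconds a retry must wait
        self.clock = clock
        self.first_seen = {}        # (server, sender, recipient) -> timestamp
        self.known_senders = set()

    def accept(self, server_ip, sender, recipient):
        """Return True to accept, False to reject temporarily."""
        if sender in self.known_senders:
            return True  # previously retried: accept on the first attempt
        key = (server_ip, sender, recipient)
        now = self.clock()
        if key not in self.first_seen:
            self.first_seen[key] = now
            return False  # first attempt: reject and remember the details
        if now - self.first_seen[key] < self.min_delay:
            return False  # retried too quickly to count
        # A genuine retry arrived: accept and remember the sender.
        self.known_senders.add(sender)
        del self.first_seen[key]
        return True
```

The delay that users notice corresponds to the time between the first False and the sending server’s next delivery attempt, which the receiving side does not control.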