The Basics Of Bayesian Spam Filtering

Author Arvind Singh
Published January 17, 2008
Word count 468

Bayesian spam filtering is a way to distinguish between legitimate emails and illegitimate spam emails, through a process that uses Bayesian statistical methods.

Bayesian spam filtering has become a popular technique. It treats filtering as a document classification problem: based on the contents of a message, the filter calculates the probability that the message is spam and places it in the spam or the legitimate category accordingly. Bayesian filters are generally more robust than ordinary content-based filters, and once trained on a user's own mail they tend to produce few false positives.
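As a rough illustration of the probability calculation, the sketch below applies Bayes' theorem to a single word. The function name, the equal prior of 0.5, and the example figures are assumptions made up for this illustration; they do not come from any particular filter.

```python
def spam_probability_given_word(p_word_given_spam, p_word_given_good, p_spam=0.5):
    """Bayes' theorem: P(spam | word) from the word's rate in each kind of mail."""
    p_good = 1.0 - p_spam
    numerator = p_word_given_spam * p_spam
    evidence = numerator + p_word_given_good * p_good
    if evidence == 0:
        # The word was never seen in either kind of mail: fall back to the prior.
        return p_spam
    return numerator / evidence


# Made-up figures: the word appears in 40% of spam and in 1% of good mail.
p = spam_probability_given_word(0.40, 0.01)
print(round(p, 3))  # 0.976
```

With these invented figures, a message containing the word is very likely spam. A real filter combines the evidence from many words rather than relying on one.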

Normally, when you receive an email, one look tells you whether it is spam or not. To your eyes, there is almost no chance of spam passing for a good email. Ideally, spam filters would work the same way.

Bayesian Spam Filters

Bayesian spam filters are scoring content-based spam filters. They try to work the way your eye does, looking for words and other characteristics that typify spam. Every characteristic typical of spam is assigned a score, and a total spam score is computed for the whole message. Depending on the type of Bayesian spam filter you are using, it may also look for characteristics of legitimate email, which lower the total score.
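The scoring idea can be sketched with a small hand-built list. Everything here is hypothetical: the words, their scores, and the threshold are invented for illustration, with positive scores marking spam-like words and negative scores marking words typical of legitimate mail.

```python
import re

# Hypothetical hand-picked scores: positive suggests spam, negative suggests good mail.
MANUAL_SCORES = {
    "viagra": 3.0,
    "winner": 2.0,
    "free": 1.0,
    "meeting": -1.5,
    "invoice": -1.0,
}
SPAM_THRESHOLD = 2.5  # assumed cut-off for this sketch


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def spam_score(text):
    """Add up the scores of every listed word found in the message."""
    return sum(MANUAL_SCORES.get(token, 0.0) for token in tokenize(text))


def is_spam(text):
    return spam_score(text) >= SPAM_THRESHOLD


print(spam_score("You are a WINNER! Claim your free prize"))  # 3.0
print(spam_score("Free slot for a meeting tomorrow"))         # -0.5
```

The second message contains "free" but is pulled back below the threshold by "meeting", which is how legitimate characteristics lower the total score.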

The basic difference between Bayesian spam filters and simpler scoring content-based spam filters is that Bayesian filters build the list of characteristics themselves, whereas the other filters depend on a manually built list.

You start with a sizable collection of emails you have identified as spam and another collection of good emails. The filter examines both the legitimate and the spam emails and calculates the probability with which various characteristics appear in each. Bayesian spam filters may look at:

The words in the message body
The headers (message paths and senders)
The word pairs and phrases
HTML code, such as colors
Where a particular phrase appears (meta information)
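The training step described above can be sketched as follows. This is a minimal naive Bayes illustration under stated assumptions, not the method of any specific product: it looks only at body text (words and adjacent word pairs, ignoring headers and HTML), uses add-one smoothing, considers only the features present in a message, and needs at least one example of each kind of mail. All function names are made up for the example.

```python
import math
import re
from collections import Counter


def features(text):
    """Words plus adjacent word pairs found in the message text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return set(words + pairs)


def train(spam_messages, good_messages):
    """Count, for each feature, how many messages of each kind contain it."""
    return {
        "spam_counts": Counter(f for m in spam_messages for f in features(m)),
        "good_counts": Counter(f for m in good_messages for f in features(m)),
        "n_spam": len(spam_messages),
        "n_good": len(good_messages),
    }


def feature_probability(count, n_messages):
    # Add-one smoothing, so an unseen feature never yields exactly 0 or 1.
    return (count + 1) / (n_messages + 2)


def spam_probability(model, text):
    """Combine per-feature evidence in log space, treating features as independent."""
    log_odds = math.log(model["n_spam"]) - math.log(model["n_good"])
    for f in features(text):
        p_spam = feature_probability(model["spam_counts"][f], model["n_spam"])
        p_good = feature_probability(model["good_counts"][f], model["n_good"])
        log_odds += math.log(p_spam) - math.log(p_good)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Tiny invented training sets, far smaller than a real filter would need.
spam = ["win free money now", "free prize claim now"]
good = ["meeting at noon tomorrow", "project report attached"]
model = train(spam, good)

print(spam_probability(model, "claim your free money") > 0.5)   # True
print(spam_probability(model, "report for the meeting") < 0.5)  # True
```

Nobody wrote the list of spam words by hand here; the counts in the model come entirely from the two sets of example messages.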

The Problems With Scoring Content Based Filters

Although scoring-based spam filters work well, they do run into certain problems, the ordinary ones more so than Bayesian spam filters. These are some of the problems:

Scoring content-based spam filters build their list of characteristics from the spam emails and the good emails they are given. To build a good list of spam characteristics, mail needs to be collected from hundreds of sources (email addresses). This can make the filter less effective, because the characteristics of good email differ from person to person.
If spammers make an effort to make their mail look genuine, the filtering characteristics may have to be corrected manually, which is a very big effort.
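A filter that learns from each user's own labelled mail sidesteps both problems to some extent. The toy class below is a hypothetical sketch (the class, its methods, and the example messages are invented): it keeps per-mailbox word counts, assumes equal priors, and updates whenever the user marks a message as spam or good.

```python
from collections import Counter


class PerUserFilter:
    """Toy per-mailbox word counts, updated each time the user labels a message."""

    def __init__(self):
        self.counts = {"spam": Counter(), "good": Counter()}
        self.totals = {"spam": 0, "good": 0}

    def learn(self, text, label):
        # label is "spam" or "good"; each distinct word counts once per message.
        self.counts[label].update(set(text.lower().split()))
        self.totals[label] += 1

    def word_spam_probability(self, word):
        # Smoothed per-class rates, combined under an assumed equal prior.
        p_spam = (self.counts["spam"][word] + 1) / (self.totals["spam"] + 2)
        p_good = (self.counts["good"][word] + 1) / (self.totals["good"] + 2)
        return p_spam / (p_spam + p_good)


# Two invented users: one receives mortgage spam, the other works with mortgages.
alice = PerUserFilter()
alice.learn("mortgage refinance offer", "spam")

bob = PerUserFilter()
bob.learn("mortgage refinance documents for client", "good")

print(alice.word_spam_probability("mortgage") > 0.5)  # True
print(bob.word_spam_probability("mortgage") < 0.5)    # True
```

The same word counts as evidence of spam for one user and of good mail for the other, and when spammers change their wording, labelling a few new messages updates the counts without anyone editing a list by hand.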

The author is an administrator and technical expert associated with the development of security and performance-enhancing software such as Registry Cleaner, Anti Spyware, and Window Cleaner.

Article source: https://articlebiz.com
