Spam: An Analysis of Spam Filters
Joe Chiarella, Jason O’Brien
Advisors: Professor Wills and Professor Claypool

Project Goals
  • To analyze the effectiveness of different kinds of spam filters.
  • Focused on SpamAssassin and Bogofilter.

SpamAssassin
  • Rule-based filter with over 400 rules.
  • Each rule has an associated weight.
  • The score of an email is the sum of the weights across all matching rules.
  • User-adjustable threshold.
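The scoring scheme described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not SpamAssassin's implementation: the rule names, patterns, and weights below are hypothetical, and the threshold of 5.0 is only an assumed example value that a user could change.

```python
import re

# Hypothetical rules: (name, pattern, weight). Illustrative only; these are
# not taken from SpamAssassin's actual rule set.
RULES = [
    ("FREE_MONEY", re.compile(r"free money", re.IGNORECASE), 2.5),
    ("ALL_CAPS_SUBJECT", re.compile(r"^Subject: [A-Z !]+$", re.MULTILINE), 1.5),
    ("CLICK_HERE", re.compile(r"click here", re.IGNORECASE), 1.0),
    ("MENTIONS_MEETING", re.compile(r"\bmeeting\b", re.IGNORECASE), -1.0),
]


def score_email(text, rules=RULES):
    """Return the sum of the weights of every rule whose pattern matches."""
    return sum(weight for _, pattern, weight in rules if pattern.search(text))


def is_spam(text, threshold=5.0, rules=RULES):
    """Flag the email as spam when its score reaches the chosen threshold."""
    return score_email(text, rules) >= threshold
```

Lowering `threshold` catches more spam at the cost of flagging more ham, which is the trade-off a user-adjustable threshold exposes.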

Bogofilter
  • Bayesian filter.
  • Calculates the probability that an email is spam using past email.
  • Looks at the frequency of words (not the order of words).
  • Accuracy should improve over time.
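To make the idea of a word-frequency Bayesian filter concrete, here is a toy naive Bayes classifier in Python. It is a swapped-in textbook technique, not Bogofilter's actual algorithm: the class name `WordFrequencyFilter`, the tokenizer, and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


class WordFrequencyFilter:
    """Toy naive Bayes filter over word counts (word order is ignored)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record the words of a past email known to be 'spam' or 'ham'."""
        self.word_counts[label].update(self.tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | words) with add-one smoothing, summing in log space."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior, so an untrained class never produces log(0).
            prior = (self.message_counts[label] + 1) / (total_msgs + 2)
            total_words = sum(self.word_counts[label].values())
            log_score = math.log(prior)
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                log_score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = log_score
        # Normalize the two log scores into a probability; clamp to avoid overflow.
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

Because every call to `train` updates the counts, the estimates sharpen as more labeled email is seen, which is the sense in which accuracy should improve over time.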

Data Collection
  • Email collected from students, professors, small business employees, and free email accounts.
  • 4626 ham emails and 5010 spam emails, separated into ham and spam mailboxes for each user.

Methodology
  • Compared the accuracy of SpamAssassin and Bogofilter on each user’s email.
  • Tested the same number of ham emails and spam emails from each user.
  • Ignored results from the first 50 emails to allow Bogofilter to learn.
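One way to picture this methodology is the Python sketch below. The slides do not say how the emails were ordered or when Bogofilter was trained, so the interleaving in `balanced_stream` and the classify-then-train loop in `evaluate` are assumptions, and both function names are hypothetical.

```python
from itertools import chain


def balanced_stream(ham, spam):
    """Interleave equal numbers of ham and spam emails as (text, is_spam) pairs."""
    n = min(len(ham), len(spam))
    pairs = zip(((t, False) for t in ham[:n]), ((t, True) for t in spam[:n]))
    return list(chain.from_iterable(pairs))


def evaluate(stream, classify, train, warmup=50):
    """Classify each email, then train on it; count only results after the warm-up."""
    counts = {"ham_total": 0, "ham_correct": 0, "spam_total": 0, "spam_correct": 0}
    for index, (text, is_spam) in enumerate(stream):
        predicted = classify(text)
        train(text, is_spam)
        if index < warmup:
            continue  # results from the first `warmup` emails are ignored
        kind = "spam" if is_spam else "ham"
        counts[kind + "_total"] += 1
        counts[kind + "_correct"] += int(predicted == is_spam)
    return counts
```

For a filter that does not learn, `train` can be a function that does nothing, so the same loop can score both kinds of filter on the same emails.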

Comparison of Bogofilter and SpamAssassin on Ham
CP = Company Person, PR = Professor, ST = Student, FE = Free Email

Comparison of Bogofilter and SpamAssassin on Spam
CP = Company Person, PR = Professor, ST = Student, FE = Free Email

SpamAssassin Score Analysis

Conclusion
  • The effectiveness of Bogofilter and SpamAssassin depends greatly on the user.
  • Neither filter outperformed the other in all cases.
  • Filtering spam is hard.

Questions?