Filtering Spam With Justin Mason, SpamAssassin Project
- Slides: 20
Filtering Spam With Justin Mason, SpamAssassin Project & Deersoft http://SpamAssassin.org/
What Is Spam? • Best description: "Unsolicited Bulk Email" • In human terms: bulk e-mail you didn't want, and didn't ask for • Mailing lists, newsletters, "latest offers": not spam, if you asked for them in the first place • Name courtesy of Monty Python: “spam, spam and spam”
Why Bother Filtering Spam? • Seems to be about 30% to 60% of mail traffic, and increasing • Users are forced to waste time wading through their inbox – costs their employers money • Impossible to unsubscribe – “unsubscribe” addresses work only 37% of the time, according to the FTC • Legal retaliation not possible, yet • Just plain irritating!
Spam Volume Is Increasing (data from Brightmail.com)
Filtering: Homebrew Blacklists • First round of "spam filters": internal blacklists, maintained by in-house admin staff • Match addresses, and delete those from known spammers • Later, match "bad words" (Viagra, porn) • Quite hard to configure; centralised; lots of work to keep up to date
Filtering: DNS Blacklists • Identify spam source computers by IP address • Allow mail system to look up a public database on the internet as mail arrives • Block the message, if its sender's address is blacklisted • Now at least 20 DNS blacklists, with varying reliability • Many false positives – eircom.net's main mail server!
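A minimal sketch of the lookup this slide describes, assuming the common DNS-blacklist convention: reverse the IPv4 octets, append the blacklist's zone, and treat any A record as "listed". The zone `dnsbl.example.org` and both function names are hypothetical. The resolver is passed in as a parameter so the logic can be tried without network access.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a blacklist zone."""
    addr = ipaddress.IPv4Address(ip)          # raises ValueError on bad input
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the blacklist zone answers with an A record for this address."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # A failed lookup (typically NXDOMAIN) is treated as "not listed".
        return False

# Example: the name queried for 192.0.2.99 in a hypothetical zone.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Treating every lookup failure as "not listed" is a deliberately conservative choice: a blacklist outage then lets mail through rather than blocking it.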
SpamAssassin Concepts • Zero-configuration where possible • Lots of rules to determine if a mail is spam or not – "Fuzzy logic": rules are assigned scores, based on our confidence in their accuracy – These are combined to produce an overall score for each message – If over a user-defined threshold, the mail is judged as spam • No one rule, alone, can mark a mail as spam
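The scoring idea can be sketched in a few lines. Everything below is illustrative: the rule names, scores, and message fields are hypothetical and are not SpamAssassin's real rule set. The sketch also assumes that reaching the threshold exactly counts as spam.

```python
import re

# Hypothetical rules: (name, score, test). Every score is below the threshold,
# so no single rule can mark a mail as spam on its own.
RULES = [
    ("SUBJECT_ALL_CAPS", 1.5, lambda m: m["subject"].isupper()),
    ("MENTIONS_VIAGRA", 2.5, lambda m: re.search(r"viagra", m["body"], re.I) is not None),
    ("CLICK_HERE", 1.0, lambda m: "click here" in m["body"].lower()),
    ("MISSING_DATE", 2.0, lambda m: m.get("date") is None),
]

THRESHOLD = 5.0  # user-defined

def score_message(msg):
    """Return the total score and the list of (rule name, score) that fired."""
    hits = [(name, score) for name, score, test in RULES if test(msg)]
    return sum(score for _, score in hits), hits

def is_spam(msg, threshold=THRESHOLD):
    total, _ = score_message(msg)
    return total >= threshold

spammy = {"subject": "BUY NOW", "body": "Cheap Viagra, click here", "date": None}
normal = {"subject": "Lunch?", "body": "click here for the menu", "date": "Mon"}
print(score_message(spammy), is_spam(spammy))
print(score_message(normal), is_spam(normal))
```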
SpamAssassin Concepts, pt. 2 • Combines many systems for a "broad-spectrum" approach: – Detect forged headers – Spam-tool signatures in headers – Text keyword scanner in the message body – DNS blacklists – Razor, DCC (Distributed Checksum Clearinghouse), Pyzor • Spammers cannot aim to defeat just one system; the others will catch them out
Integration Into Mail Systems • Wrote SpamAssassin with flexibility of integration in mind • Many integrations have been written: – Integration into Mail Transfer Agents (sendmail, qmail, Exim, Postfix, Microsoft Exchange) – Integration into virus-scanner MTA plug-ins (MIMEDefang, amavisd-new) – IMAP/POP proxies and clients – Commercial plug-ins for Windows clients (Eudora, MS Outlook) • And many more I don't know about!
Accuracy and False Positives • The big issue with filtering to date: – not just “how much spam does it catch?” – but “how many legitimate mails get caught, too?” • Many systems do not pay attention to this problem – Some blacklists even use "false positives" as a weapon against service providers selling to spammers • FPs are much worse than spam getting through – much more inconvenient to user
Evolving a Better Filter • SpamAssassin assigns scores using a genetic algorithm – Given a big collection of human-classified mail, determine what tests each mail triggers – Use this to "evolve" an efficient score set – Exactly the kind of problem a genetic algorithm is good at – Allows "shotgun" rules to be scored low, where they cannot do damage
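A toy sketch of the idea, not SpamAssassin's actual optimiser: a small genetic algorithm that evolves one score per rule against a hand-classified corpus. The corpus, the 0 to 4 score range, the cost weights, and all names are assumptions made for illustration.

```python
import random

# Hypothetical corpus: (indices of the rules each mail triggered, is_spam).
CORPUS = [
    ({0, 1}, True), ({0, 2}, True), ({1, 2}, True), ({0, 1, 2}, True),
    ({3}, False), ({2}, False), (set(), False), ({2, 3}, False),
]
N_RULES = 4
THRESHOLD = 5.0
FP_WEIGHT = 5.0  # assumption: a false positive costs five times a missed spam

def cost(scores):
    """Weighted count of misclassified mails for a given score set (lower is better)."""
    total = 0.0
    for hits, is_spam in CORPUS:
        flagged = sum(scores[i] for i in hits) >= THRESHOLD
        if flagged and not is_spam:
            total += FP_WEIGHT
        elif is_spam and not flagged:
            total += 1.0
    return total

def evolve(generations=200, pop_size=30, seed=1):
    rng = random.Random(seed)
    pop = [[rng.uniform(0.0, 4.0) for _ in range(N_RULES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: pop_size // 2]              # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]   # uniform crossover
            i = rng.randrange(N_RULES)                          # mutate one score
            child[i] = min(4.0, max(0.0, child[i] + rng.gauss(0.0, 0.5)))
            children.append(child)
        pop = survivors + children
    return min(pop, key=cost)

best = evolve()
print([round(s, 2) for s in best], cost(best))
```

Weighting false positives more heavily than misses is what pushes broad "shotgun" rules, here the ones that also fire on legitimate mail, toward low scores.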
False Positive Rate • SpamAssassin is 98.5% accurate on our test corpora, with default settings – 0.6% false positives – 91% of all spam caught correctly – with network tests on, spam hit-rate probably increases to about 93-95% • Highest rate available among present tools • Tunable by the user -- raising the threshold reduces FPs, lowering it catches more spam
Effect of the Threshold Setting
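The trade-off behind this slide can be illustrated by sweeping the threshold over a scored corpus. The scores below are made up for the example and are not the data from the talk.

```python
# Hypothetical (score, is_spam) pairs standing in for a scored test corpus.
SCORED = [
    (0.3, False), (1.2, False), (2.8, False), (4.1, False), (5.6, False),
    (3.9, True), (6.2, True), (7.5, True), (9.8, True), (14.0, True),
]

def rates(threshold):
    """Return (false-positive rate, spam hit rate) at a given threshold."""
    ham = [score for score, is_spam in SCORED if not is_spam]
    spam = [score for score, is_spam in SCORED if is_spam]
    fp_rate = sum(score >= threshold for score in ham) / len(ham)
    hit_rate = sum(score >= threshold for score in spam) / len(spam)
    return fp_rate, hit_rate

for threshold in (3.0, 5.0, 8.0):
    fp_rate, hit_rate = rates(threshold)
    print(f"threshold {threshold}: FP rate {fp_rate:.0%}, spam caught {hit_rate:.0%}")
```

On this toy data, raising the threshold from 3 to 8 removes every false positive but also lets more than half of the spam through.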
What To Do When You've Caught It • Since classifiers are imperfect, blind deletion is bad • Better to mark the mails, and allow user to check over them infrequently • Also good to mark for legal reasons – In the UK, it may be illegal to hold mail (even spam) for more than 3 days
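A sketch of "mark, don't delete" using Python's standard `email` package. The header names, their value format, and the `[SPAM]` subject tag are illustrative assumptions; use whatever markup your mail clients can filter on.

```python
from email import message_from_string

def mark_as_spam(raw: str, score: float, threshold: float) -> str:
    """Tag a message as spam instead of deleting it, and return the new text."""
    msg = message_from_string(raw)
    msg["X-Spam-Flag"] = "YES"
    msg["X-Spam-Status"] = f"Yes, score={score:.1f} required={threshold:.1f}"
    subject = msg.get("Subject", "")
    del msg["Subject"]                      # no error if the header is absent
    msg["Subject"] = f"[SPAM] {subject}"
    return msg.as_string()

raw = "From: someone@example.com\nSubject: Hello\n\nBody text\n"
print(mark_as_spam(raw, 7.0, 5.0))
```

The body is left untouched, so a user checking the spam folder can still read a wrongly flagged mail in full.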
Features For Large-Scale Use: "spamd" • Client-server interface to SpamAssassin • Pre-loads, so much faster for high volumes • Can load user preferences from an SQL database • Can load-balance -- uses TCP/IP • Deployed at several large organisations and ISPs: The Well, Salon.com, Panix, Transmeta, SourceForge, Stanford
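A rough sketch of what a client for such a daemon might look like. It assumes the plain-text framing of spamd's SPAMC protocol (a `CHECK` request with a `Content-length` header, and a `Spam: True ; score / threshold` line in the reply) and the conventional port 783. Treat the exact version strings and header spellings as assumptions to check against the protocol description shipped with your spamd; the function names are hypothetical.

```python
import re
import socket

def build_check_request(message: bytes) -> bytes:
    """Frame a message as a CHECK request (assumed SPAMC-style framing)."""
    head = f"CHECK SPAMC/1.2\r\nContent-length: {len(message)}\r\n\r\n"
    return head.encode("ascii") + message

def parse_check_response(response: bytes):
    """Extract (is_spam, score, threshold) from the reply's Spam: header."""
    text = response.decode("ascii", errors="replace")
    m = re.search(r"^Spam:\s*(True|False)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)",
                  text, re.M | re.I)
    if m is None:
        raise ValueError("no Spam: header in response")
    return m.group(1).lower() == "true", float(m.group(2)), float(m.group(3))

def check(message: bytes, host: str = "localhost", port: int = 783):
    """Send one message to a running daemon over TCP and return its verdict."""
    with socket.create_connection((host, port), timeout=30) as sock:
        sock.sendall(build_check_request(message))
        sock.shutdown(socket.SHUT_WR)       # signal that the request is complete
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_check_response(b"".join(chunks))
```

Because the exchange is a single TCP request and reply, any ordinary TCP load balancer can spread clients across several daemons.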
Large-Scale Filtering For Your Network • Different from filtering for yourself • Many users get little spam • Should use conservative settings • Better to use “opt-out by default” – notify that spam filtering is available, and ask them if they want it
How Can Network Administrators Fight Spam? • Scan for Open Relays & Proxies on your network • Block proxy ports at the firewall • Audit web servers for “FormMail” or other insecure web-to-mail scripts • Spam traps reporting to network blacklists: Razor, DCC, Pyzor • Run SpamAssassin, or SpamAssassin Pro!
How Do The Spammers Feel? • Already hurting, according to CBS: – “[I’ve gone through] unbelievable hardships [to keep spamming]... My operating costs have gone up 1,000% this year, just so I can figure out how to get around all these filters” • Spam relies on low overheads and extremely cheap delivery • Disrupt the equation and they will give up!
Future Directions • Learning filters (Bayesian probability etc.) – Learn automatically, to detect what "good" mail to your network looks like • "Hash-cash" – Sending mail currently more-or-less free – With hash-cash, each recipient requires CPU time for the sender – SpamAssassin can provide "bonus points" for hash-cash users
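The hash-cash idea can be sketched as a proof-of-work: the sender searches for a counter whose hash has a required number of leading zero bits, while the receiver verifies with a single hash. This is a simplified illustration, not the real hashcash stamp format; the `recipient:counter` layout and the 12-bit difficulty are assumptions.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the start of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def mint(recipient: str, bits: int = 12) -> str:
    """Sender's side: brute-force a counter until the hash is 'rare' enough."""
    for counter in count():
        stamp = f"{recipient}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp

def verify(stamp: str, recipient: str, bits: int = 12) -> bool:
    """Receiver's side: one hash confirms the work was done for this recipient."""
    if not stamp.startswith(recipient + ":"):
        return False
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits

stamp = mint("alice@example.com")
print(stamp, verify(stamp, "alice@example.com"))
```

Each extra bit doubles the sender's expected work, while verification stays at one hash. A stamp minted for one recipient does not verify for another, so the cost is paid per recipient.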
Fin • http://spamassassin.org/ – SpamAssassin for UNIX – (free software) • http://www.deersoft.com/ – SpamAssassin Pro: MS Outlook, Exchange – (commercial version) – (my employers!)