Providing Campus Mail Services with a Linux Cluster

Giles Malet
University of Waterloo
gdmalet@uwaterloo.ca
University of Waterloo OUCC 2004

Overview
• What our department does
• Our mail problems
• Our proposed solutions
• What we have done so far
• Problems we’re aware of
• What to do next
• 25 slides… but feel free to interrupt!

Introductions
• Information Systems & Technology (IST)
• Provide services & expertise to campus
• Project members
  – Dawn Keenan - sendmail, MRTG
  – Giles Malet - project lead, software
  – Rob Schmidt - ClamAV, J-chkmail
  – Jeff Voskamp - LDAP, systems stuff
  – plus assistance from others…

Why do this?
• More and more spam and viruses
• More demand on IST for solution
• Want to centralize the problem (xhier)
• Want something everyone can use
• Old system overwhelmed
• See project charter

Services Desired
• Robust user@uwaterloo.ca mail server
• Virus scanning with immediate rejection
• Refuse executables etc. by file extension
• Spam identification so people can filter
  – Also DNS blacklists
  – People can opt out of (only) spam processing
• LDAP needed internally
  – perhaps allow user lookups?

What we considered
• Using Solaris and RedHat Linux already…
• “Cluster” problem – needed scalability
  – OpenMosix
  – OpenKnoppix
  – Linux Cluster Project
  – Linux HA
  – RedHat Enterprise
• …and so on.
  – But what does “cluster” mean? Vague.

Decisions
• Start simple, try more involved setup if load is too high
• Keep detailed statistics so we know what’s changing (more later)
• Ask for (some) input from campus
• Do our own load balancing (else Cisco)
• 4 cheap systems or 3 “good” systems?
  – Spread load, reduce impact of failure

Hardware Purchased
• 4 Dell servers
  – 1 with mirrored SCSI disks, 3.2 GHz CPU
  – 3 with single IDE disk, 3.0 GHz CPU
• 1 gig memory
• 1 x 100 + 1 x 1000 Mbps ethernet
• Rack-mounted, serial consoles (Annex, Cyclades)

Hardware Configuration [diagram]

Hardware – ‘head’ server
• Most powerful, most robust
• Runs LDAP, MySQL, web servers, incoming mail
• Mirrored disk, NFS shared to slaves
  – all cluster data in one place (mail queues)
• Only machine that is backed up
• Firewall / load balancing

Hardware - slaves
• 3 identical machines, run all services
• Only software difference is IP configuration (fix with DHCP)
• Increasing the number provides more CPU, less exposure to software failure
• Local disk only stores O/S
  – logs copied up to cerberus overnight
• Firewall: only incoming connection is ssh from maintenance server
• No user accounts: ssh as root from head server

Software Details
• Will try OpenSource first, spend money second.
• Looked at AFS etc, went to NFS (simple)
• Use things we know, plus some experimentation
• Emphasis was to get this going quickly, will fine-tune it later.

Sendmail
• We know sendmail, and it works
• Wanted “stock” system – no more phlookup, thus LDAP
• Something flexible: “milter” interface allows add-ons; can direct TCP connections from campus sendmails back to cluster

Clam Anti-Virus
• Open source
• Auto-updating from remote server
  – allows submission of ‘fingerprints’
• 3 components (freshclam, milter, clamd)
  – Some stability problems with latter two
  – Too many threads: 375 * 8 megs = 3 gigs
  – Deeply nested messages are problematic
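
The thread arithmetic on this slide can be checked with a short calculation. This is only a sketch: the 375 threads and 8 megs of stack per thread come from the slide, while reading "megs" as MiB and the function name `thread_stack_bytes` are assumptions made for illustration.

```python
MIB = 1024 * 1024

def thread_stack_bytes(threads: int, stack_mib: int) -> int:
    """Virtual memory reserved for thread stacks alone (no heap, no code)."""
    return threads * stack_mib * MIB

total = thread_stack_bytes(375, 8)
# Report the total in MiB and GiB to compare with the "3 gigs" on the slide.
print(f"{total / MIB:.0f} MiB = {total / (1024 * MIB):.2f} GiB")
# prints: 3000 MiB = 2.93 GiB
```

The stacks alone come to roughly 3 gigs of address space, before any other memory the scanner needs is counted.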

J-Checkmail
• Disallow incoming mail based on contents (regex) and extension
• Also a milter interface
• Lots of ongoing development (integrate virus scanning etc.)
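
The idea of rejecting mail by attachment extension and by a regex on the contents can be sketched as follows. This is not how J-chkmail is implemented or configured; the extension list, the pattern, and the function name `reject_reason` are all hypothetical and chosen only to illustrate the check.

```python
import os
import re
from typing import Iterable, Optional

# Hypothetical policy: neither list comes from the slides.
BLOCKED_EXTENSIONS = {".exe", ".scr", ".pif", ".bat"}
BLOCKED_PATTERNS = [re.compile(r"your account has been suspended", re.IGNORECASE)]

def reject_reason(attachment_names: Iterable[str], body: str) -> Optional[str]:
    """Return a reason to reject the message, or None to accept it."""
    for name in attachment_names:
        # Only the final extension matters, so "photo.jpg.exe" is caught.
        extension = os.path.splitext(name)[1].lower()
        if extension in BLOCKED_EXTENSIONS:
            return f"attachment type {extension} not accepted"
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(body):
            return "message content matched a blocked pattern"
    return None

print(reject_reason(["photo.jpg.EXE"], "hello"))
# prints: attachment type .exe not accepted
```

Returning a reason string rather than a boolean mirrors the "immediate rejection" goal: the text can be handed back to the sending server at SMTP time.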

SpamAssassin
• Only marks spam – up to you to filter
• Configurable preferences
  – must be on host initiating scan, thus problems with MX’d machines
• Use MySQL internally
• Not foolproof: lose mail on false positive

OpenLDAP
• Sendmail understands LDAP
• It is fast! (2 hours versus 5 mins)
• Used only for mail address lookups, thus rebuild every few hours
• “Hidden” users have minimal details
• Starting to need LDAP for other systems, and ADS is tricky (Oracle Calendar)

IPTables firewall & routing
• Route incoming connections to available hosts
  – DNAT (load balancing)
• SNAT outgoing mail connections
• Firewall the rest – reduce patching
• nodewatch does auto-updating
  – runs on head server, talks multicast
  – simple polling of available servers
  – written in-house (C program + shell scripts)
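
The "poll the servers, route only to the live ones" idea can be sketched in a few lines. This is not nodewatch (which is a C program plus shell scripts and talks multicast), and the real routing is done by iptables DNAT; the TCP probe, the host names, and the round-robin choice are assumptions made for illustration.

```python
import itertools
import socket
from typing import Callable, Iterable, Iterator, List

def smtp_port_open(host: str, port: int = 25, timeout: float = 2.0) -> bool:
    """Probe a host by trying to open a TCP connection to its SMTP port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def available_hosts(hosts: Iterable[str],
                    probe: Callable[[str], bool] = smtp_port_open) -> List[str]:
    """Poll every host once and keep those that answered."""
    return [host for host in hosts if probe(host)]

def round_robin(hosts: List[str]) -> Iterator[str]:
    """Hand out live hosts in turn; the list must not be empty."""
    return itertools.cycle(hosts)

# Demo with a stubbed probe so no network access is needed.
fake_status = {"slave1": True, "slave2": False, "slave3": True}
live = available_hosts(fake_status, probe=fake_status.get)
targets = round_robin(live)
print([next(targets) for _ in range(4)])
# prints: ['slave1', 'slave3', 'slave1', 'slave3']
```

Passing the probe in as a parameter keeps the polling logic testable without touching the network; a real deployment would re-poll periodically and rewrite the DNAT rules when the live set changes.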

Statistics
• Important to know what “normal” is
• Heavy use of MRTG and friends
• See graphs: http://mailservices.uwaterloo.ca
• Who’s using it? connections.txt

Gotchas
• 3 gigs is not enough virtual memory
  – 8 megs stack / thread
• Set ulimits: memory, number of processes
• Logging is main load on disk
  – separate from mail spools
• System will get a lot of unwanted attention
  – Dictionary attacks on sendmail – rate limit
  – LDAP scans – limit to campus, limit number of results, CPU per request
  – Firewall heavily, and hide the slaves
• How to test without losing mail?
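
Rate limiting against dictionary attacks can be illustrated with a sliding-window counter per client address. This is a generic sketch, not sendmail's own rate-control mechanism; the class name, the limit of 3 connections per 60 seconds, and the example address are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

class ConnectionRateLimiter:
    """Allow at most max_connections per client within window_seconds."""

    def __init__(self, max_connections: int, window_seconds: float) -> None:
        self.max_connections = max_connections
        self.window_seconds = window_seconds
        self._recent: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        if now is None:
            now = time.monotonic()
        recent = self._recent[client_ip]
        # Forget connections that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_connections:
            return False  # over the limit: refuse or delay this connection
        recent.append(now)
        return True

limiter = ConnectionRateLimiter(max_connections=3, window_seconds=60.0)
print([limiter.allow("192.0.2.1", now=t) for t in (0, 1, 2, 3, 61)])
# prints: [True, True, True, False, True]
```

Refused attempts are not recorded, so a client that backs off regains access as soon as its earlier connections leave the window.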

More gotchas…
• What if a slave machine dies?
  – others can handle the load
• What if the head machine dies?
  – Lose NFS, MySQL, LDAP
  – Could rebuild in a few hours from backups
  – Backup MX gets to do the work
  – Need a similar system somewhere, to share
• Need better way to distribute configs
• It helps if a single netmask covers all hosts
• Duplicate scanning (next slide)

Duplicate scanning
• Machines tux and ist MX’d to cluster
  – mail to user@ist goes: cluster -> ist -> cluster -> tux when a .forward on ist points to tux.
• Also, mail to user@cluster gets forwarded to destination, which also scans.
• Currently it’s not worth the effort to prevent this
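
The delivery path on this slide can be traced to count how often a message is scanned. A minimal sketch, assuming only the cluster scans along this path; the function name `scans_per_host` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Set

def scans_per_host(path: Iterable[str], scanning_hosts: Set[str]) -> Counter:
    """Count one scan each time the message passes through a scanning host."""
    return Counter(hop for hop in path if hop in scanning_hosts)

# Path from the slide: mail to user@ist with a .forward to tux.
path = ["cluster", "ist", "cluster", "tux"]
print(scans_per_host(path, {"cluster"}))
# prints: Counter({'cluster': 2})
```

If the destination host also runs its own scanner, adding it to `scanning_hosts` shows the extra pass described in the second bullet.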

Undeliverable postmaster mail
• Sendmail aborts when postmaster mail is undeliverable, queues grow and grow
• Mail containing virus from off-campus goes to user@machine-1
  – tries to forward to user@machine-2 but is blocked, so
  – tries to bounce, but bogus From: header
  – tries to send to postmaster@machine-1, which is forwarded to machine-2…

Where we’re going
• 3 system cluster for development
  – try new ideas: RH cluster, others
  – new sendmail, scanners etc.
• Disaster recovery – head or slave dies
• Centralised LDAP server, but need to deal with MS Active Directory
• Document all this, and how to use it
  – hand it over to Production Support

Winding down
• Spam / virus problems are getting worse, so we’ll be busy for a while.
• Contact us if you want more info, exchange ideas, give advice
• Slides will be made available