Scalable Apache for Beginners
Aaron Bannert
aaron@apache.org / aaron@codemass.com

Measuring Performance
What is Performance?

How do we measure performance?
- Benchmarks
  - Requests per Second
  - Bandwidth
  - Latency
  - Concurrency (Scalability)

Real-world Scenarios
- Can benchmarks tell us how it will perform in the real world?

What makes a good Web Server?
- Correctness
- Reliability
- Scalability
- Stability
- Speed

Correctness
- Does it conform to the HTTP specification?
- Does it work with every browser?
- Does it handle erroneous input gracefully?

Reliability
- Can you sleep at night?
- Are you being paged during dinner?
- Is it an appliance?

Scalability
- Does it handle nominal load?
- Have you been Slashdotted?
  - And did you survive?
- What is your peak load?

Speed (Latency)
- Does it feel fast?
- Do pages snap in quickly?
- Do users often reload pages?

Apache the General Purpose Webserver
- Apache developers strive for correctness first, and speed second.

Apache 1.3
- Fast enough for most sites
- Particularly on 1- and 2-CPU systems.

Apache 2.0
- Adds more features
  - filters
  - threads
  - portability (has excellent Windows support)
- Scales to much higher loads.

Apache HTTP Server Architecture Overview

Classic “Prefork” Model
- Apache 1.3, and Apache 2.0 Prefork
- Many Children
- Each child handles one connection at a time.
[Diagram: Parent with 100s of Child processes]

Multithreaded “Worker” Model
- Apache 2.0 Worker
- Few Children
- Each child handles many concurrent connections.
[Diagram: Parent with 10s of Child processes, each with 10s of threads]

Dynamic Content: Modules
- Extensive API
- Pluggable Interface
- Dynamic or Static Linkage

In-process Modules
- Run from inside the httpd process
  - CGI (mod_cgi)
  - mod_perl
  - mod_php
  - mod_python
  - mod_tcl

Out-of-process Modules
- Processing happens outside of httpd (e.g. Application Server)
- Tomcat
  - mod_jk/jk2, mod_jserv
- mod_proxy
- mod_jrun
[Diagram: Parent, Child, Tomcat]

Architecture: The Big Picture
[Diagram labels: Parent; Child … (10s), 10s of threads; mod_jk, mod_rewrite, mod_php, mod_perl; Tomcat, 100s of threads; DB]

Terms and Definitions
Terms from the Documentation and the Configuration

“HTTP”
- HyperText Transfer Protocol
- A network protocol used to communicate between web servers and web clients (e.g. a Web Browser).

“Request” and “Response”
- Web browsers request pages and web servers respond with the result.
[Diagram: Web Browser (Mosaic) sends a Request to Web Server (Apache), which returns a Response]

“MPM”
- Multi-Processing Module
- An MPM defines how the server will receive and manage incoming requests.
- Allows OS-specific optimizations.
- Allows vastly different server models (e.g. threaded vs. multiprocess).

“Child Process” aka “Server”
- Called a “Server” in httpd.conf
- A single httpd process.
- May handle one or more concurrent requests (depending on the MPM).
[Diagram: Parent with 100s of Child processes (“Servers”)]

“Parent Process”
- The main httpd process.
- Does not handle connections itself.
- Only creates and destroys children.
[Diagram: only one Parent, with 100s of Child processes]

“Client”
- Single HTTP connection (e.g. web browser).
  - Note that many web browsers open up multiple connections.
  - Apache considers each connection uniquely.
[Diagram: Web Browser (Mosaic) connected to Web Server (Apache)]

“Thread”
- In multi-threaded MPMs (e.g. Worker).
- Each thread handles a single connection.
- Allows Children to handle many connections at once.

Apache Configuration
httpd.conf walkthrough

Prefork MPM
- Apache 1.3 and Apache 2.0 Prefork
- Each child handles one connection at a time
- Many children
- High memory requirements
- “You’ll run out of memory before CPU”

Prefork Directives (Apache 2.0)
- StartServers
- MinSpareServers
- MaxClients
- MaxRequestsPerChild
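
As a rough illustration of how the prefork directives fit together in httpd.conf, here is a minimal sketch. The numbers are placeholder assumptions, not recommendations from the talk. MaxSpareServers does not appear on the slide; it is included only because it is the usual companion to MinSpareServers. The `<IfModule prefork.c>` wrapper is the form used in Apache 2.0 configuration files.

```apache
# Sketch only: every value below is an illustrative assumption.
<IfModule prefork.c>
    # Children created at startup
    StartServers          5
    # Keep at least this many idle children ready for new connections
    MinSpareServers       5
    # Not on the slide: upper bound on idle children
    MaxSpareServers      10
    # Hard cap on children, and so on simultaneous connections
    MaxClients          150
    # 0 = children are never recycled after a fixed number of requests
    MaxRequestsPerChild   0
</IfModule>
```

Because each prefork child handles one connection at a time, MaxClients is effectively the ceiling on concurrent connections, which is why it is usually sized against available memory.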

Worker MPM
- Apache 2.0 and later
- Multithreaded within each child
- Dramatically reduced memory footprint
- Only a few children (fewer than prefork)

Worker Directives
- MinSpareThreads
- MaxSpareThreads
- ThreadsPerChild
- MaxClients
- MaxRequestsPerChild
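
A comparable sketch for the worker MPM might look like the following. Again, the values are assumptions chosen for illustration, and StartServers is added here although it is not listed on the slide. The `<IfModule worker.c>` wrapper is the Apache 2.0 form.

```apache
# Sketch only: every value below is an illustrative assumption.
<IfModule worker.c>
    # Not on the slide: child processes created at startup
    StartServers          2
    # Total simultaneous connections across all children
    MaxClients          150
    # Idle-thread pool is kept between these two bounds
    MinSpareThreads      25
    MaxSpareThreads      75
    # Threads in each child process
    ThreadsPerChild      25
    # 0 = children are never recycled after a fixed number of requests
    MaxRequestsPerChild   0
</IfModule>
```

With these numbers, 150 clients at 25 threads per child works out to at most 6 child processes, which illustrates the "only a few children" point. MaxClients is normally kept a multiple of ThreadsPerChild.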

KeepAlive Requests
- Persistent connections
- Multiple requests over one TCP socket
- Directives:
  - KeepAlive
  - MaxKeepAliveRequests
  - KeepAliveTimeout
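
The three KeepAlive directives could be set together as in this minimal sketch. The values are illustrative assumptions and would need tuning against real traffic.

```apache
# Sketch only: values are illustrative assumptions.
# Allow several requests over one TCP connection
KeepAlive On
# Close the connection after this many requests
MaxKeepAliveRequests 100
# Seconds to wait for the next request before closing an idle connection
KeepAliveTimeout 15
```

A long KeepAliveTimeout ties up a child (prefork) or a thread (worker) while the connection sits idle, so the timeout trades client latency against server capacity.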

Apache 1.3 and 2.0 Performance Characteristics
Multi-process, Multi-threaded, or Both?

Prefork
- High memory usage
- Highly tolerant of faulty modules
- Highly tolerant of crashing children
- Fast
- Well-suited for 1- and 2-CPU systems
- Tried-and-tested model from Apache 1.3
- “You’ll run out of memory before CPU.”

Worker
- Low to moderate memory usage
- Moderately tolerant to faulty modules
- Faulty threads can affect all threads in child
- Highly-scalable
- Well-suited for multiple processors
- Requires a mature threading library (Solaris, AIX, Linux 2.6 and others work well)
- Memory is no longer the bottleneck.

Important Performance Considerations
- sendfile() support
- DNS considerations
- stat() calls
- Unnecessary modules

sendfile() Support
- No more double-copy
- Zero-copy*
- Dramatic improvement for static files
- Available on
  - Linux 2.4.x
  - Solaris 8+
  - FreeBSD/NetBSD/OpenBSD
  - ...
* Zero-copy requires both OS support and NIC driver support.

DNS Considerations
- HostnameLookups
  - DNS query for each incoming request
  - Use logresolve instead.
- Name-based Allow/Deny clauses
  - Two DNS queries per request for each allow/deny clause.
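
One way to avoid both kinds of lookup is sketched below, using Apache 2.0 access-control syntax. The directory path and the address range are hypothetical placeholders (192.0.2.0/24 is a documentation-only range), not values from the talk.

```apache
# Sketch only: path and network are hypothetical.
# Log IP addresses; resolve names later, offline, with logresolve
HostnameLookups Off

<Directory "/var/www/private">
    Order deny,allow
    Deny from all
    # An address-based rule needs no DNS query; a rule such as
    # "Allow from example.com" would trigger lookups on each request
    Allow from 192.0.2.0/24
</Directory>
```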

stat() for Symlinks
- Options
  - FollowSymLinks
    - Symlinks are trusted.
  - SymLinksIfOwnerMatch
    - Must stat() and lstat() each symlink, yuck!
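
A minimal sketch of the cheaper setting follows. The directory path is a hypothetical placeholder.

```apache
# Sketch only: the path is hypothetical.
<Directory "/var/www/html">
    # Trust symlinks, so no per-symlink ownership checks are needed.
    # Without a leading +/-, this line replaces any inherited Options.
    Options FollowSymLinks
</Directory>
```

If ownership checking is genuinely required somewhere, SymLinksIfOwnerMatch can be limited to that one directory so the extra system calls are not paid on every request.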

stat() for .htaccess files
- AllowOverride
  - stat() for .htaccess in each path component of a request
  - Happens for any AllowOverride
  - Try to disable or limit to specific sub-dirs
  - Avoid use at the DocumentRoot
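
The "disable, or limit to specific sub-dirs" advice might be expressed roughly like this. Both the sub-directory path and the choice of AuthConfig are assumptions made for the example.

```apache
# Sketch only: paths and the override class are hypothetical.
# No .htaccess processing anywhere by default
<Directory />
    AllowOverride None
</Directory>

# Re-enable it only where it is actually needed
<Directory "/var/www/html/members">
    AllowOverride AuthConfig
</Directory>
```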

stat() for Content Negotiation
- DirectoryIndex
  - Don’t use wildcards like “index”
  - Use something like this instead:
    DirectoryIndex index.html index.php index.shtml
- mod_negotiation
  - Use a type-map instead of MultiViews if possible

Remove Unused Modules
- Saves Memory
  - Reduces code and data footprint
- Reduces some processing (e.g. filters)
- Makes calls to fork() faster
- Static modules are faster than dynamic
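
For a server built with dynamically loaded modules, removing a module can be as simple as commenting out its LoadModule line, as sketched below. Which modules count as unused depends entirely on the site; the two commented out here are arbitrary picks for illustration.

```apache
# Sketch only: which modules are unused is an assumption.
LoadModule dir_module modules/mod_dir.so
# Disabled: no automatic directory listings on this site
#LoadModule autoindex_module modules/mod_autoindex.so
# Disabled: no per-user ~user directories on this site
#LoadModule userdir_module modules/mod_userdir.so
```

Statically linked modules cannot be dropped this way; they have to be left out when the server is built.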

Testing Performance
Benchmarking Tools

Some Popular (Free) Tools
- ab
- flood
- httperf
- JMeter
- ...and many others

ab
- Simple Load on a Single URL
- Comes with Apache
- Good for sanity check
- Scales poorly

flood
- Profile-driven load tester
- Useful for generating real-world scenarios
- I co-authored it
- Part of the httpd-test project at the ASF
- Built to be highly-scalable
- Designed to be extremely flexible

JMeter
- Has a graphical interface
- Built on Java
- Part of Apache Jakarta project
- Depends heavily on JVM performance

Benchmarking Metrics
- What are we interested in testing?
  - Recall that we want our web server to be
    - Correct
    - Reliable
    - Scalable
    - Stable
    - Fast

Benchmarking Metrics: Correctness
- No errors
- No data corruption
- Protocol compliant
- Should not be an everyday concern for admins

Benchmarking Metrics: Reliability
- MTBF - Mean Time Between Failures
- Difficult to measure programmatically
- Easy to judge subjectively

Benchmarking Metrics: Scalability
- Predicted concurrency
- Maximum concurrent connections
- Requests per Second (rps)
- Concurrent Users

Benchmarking Metrics: Stability
- Consistency, Predictability
- Errors per Thousand
- Correctness under Stress
- Never returns invalid information
- Common problem with custom web-apps
  - Works well with 10 users, but chokes on 1000.

Benchmarking Metrics: Speed
- Requests per Second (rps)
- Latency
  - time until connected
  - time to first byte
  - time to last byte
  - time to close
- Easy to test with current tools
- Highly related to Scalability/Concurrency

Method
1. Define the problem
   e.g. Test Max Concurrency, Correctness, etc.
2. Narrow the scope of the problem
   Simplify the problem
3. Use tools to collect data
4. Come up with a hypothesis
5. Make minimal changes, retest

Troubleshooting
Common pitfalls and their solutions

Check your error_log
- The first place to look
- Increase the LogLevel if needed
  - Make sure to turn it back down (but not off) in production
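
As a small sketch of raising and then lowering the log verbosity: the two levels chosen here are assumptions, and only one LogLevel line should be active at a time.

```apache
# Sketch only: the chosen levels are assumptions.
# While troubleshooting: very verbose output in error_log
#LogLevel debug

# Back in production: quieter, but still logging warnings and errors
LogLevel warn
```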

Check System Health
- vmstat, systat, iostat, mpstat, lockstat, etc.
- Check interrupt load
  - NIC might be overloaded
- Are you swapping memory?
  - A web server should never swap
- Check system logs
  - /var/log/messages, /var/log/syslog, etc.

Check Apache Health
- server-status
  - ExtendedStatus (see next slide)
- Verify “httpd -V”
- ps -elf | grep httpd | wc -l
  - How many httpd processes are running?

server-status Example
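
The status page itself is not reproduced in this text. As a hedged sketch, enabling it in Apache 2.0 might look like the following, assuming mod_status is loaded. Restricting access to the local machine is an assumption added here for safety, not something stated in the talk.

```apache
# Sketch only: requires mod_status; the access rule is an assumption.
# Collect per-request details for the status page
ExtendedStatus On

<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    # Only allow requests from the server itself
    Allow from 127.0.0.1
</Location>
```

With this in place, requesting /server-status from the server itself shows the state of each child or thread.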

Other Possibilities
- Set up a staging environment
- Set up duplicate hardware
- Check for known bugs
  - http://nagoya.apache.org/bugzilla/

Common Bottlenecks
- No more File Descriptors
- Sockets stuck in TIME_WAIT
- High Memory Use (swapping)
- CPU Overload
- Interrupt (IRQ) Overload

File Descriptors
- Symptoms
  - entry in error_log
  - new httpd children fail to start
  - fork() failing across the system
- Solutions
  - Increase system-wide limits
  - Increase ulimit settings in apachectl

TIME_WAIT
- Symptoms
  - Unable to accept new connections
  - CPU under-utilized, httpd processes sit idle
  - Not Swapping
  - netstat shows huge numbers of sockets in TIME_WAIT
- Many TIME_WAIT are to be expected
- Only when new connections are failing is it a problem
  - Decrease system-wide TCP/IP FIN timeout

Memory Overload, Swapping
- Symptoms
  - Ignore system free memory, it is misleading!
  - Lots of Disk Activity
  - top/free show high swap usage
  - Load gradually increasing
  - ps shows processes blocking on Disk I/O
- Solutions
  - Add more memory
  - Use less dynamic content, cache as much as possible
  - Try the Worker MPM

How much free memory do I really have?
- Output from top/free is misleading.
- Kernels use buffers
- File I/O uses cache
- Programs share memory
  - Explicit shared memory
  - Copy-On-Write after fork()
- The only time you can be sure is when it starts swapping.

CPU Overload
- Symptoms
  - top shows little or no idle CPU time
  - System is not Swapping
  - High system load
  - System feels sluggish
  - Much of the CPU time is spent in userspace
- Solutions
  - Add another CPU, get a faster machine
  - Use less dynamic content, cache as much as possible

Interrupt (IRQ) Overload
- Symptoms
  - Frequent on big machines (8-CPUs and above)
  - Not Swapping
  - One or two CPUs are busy, the rest are idle
  - Low overall system load
- Solutions
  - Add another NIC
    - bind it to the first or use two IP addresses in Apache
    - put NICs on different PCI busses if possible
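
For the "two IP addresses in Apache" option, a minimal sketch is to have the server listen on one address per NIC. The addresses below are hypothetical placeholders from a documentation-only range.

```apache
# Sketch only: both addresses are hypothetical, one per NIC.
# Address assigned to the first NIC
Listen 192.0.2.10:80
# Address assigned to the second NIC
Listen 192.0.2.11:80
```

Spreading clients across the two addresses (for example with DNS) is a separate step that this fragment does not cover.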

Next Generation Improvements

Linux 2.6
- NPTL and NGPT
  - Next-Gen Thread Libraries for Linux
  - Available in RedHat 9 already
- O(1) scheduling patch
- Preemptive Kernel patch
- All improvements affect Apache, but the Worker MPM will likely be the most affected.

Solaris 9
- 1:1 threads
  - Decreases thread library overhead
  - Improves CPU load sharing
- sendfile()-like support (since late Solaris 7)
  - Zero-copy

64-bit Native Support
- Sparc had it for a long time
- G5s now have it (sort-of)
- AMD64 (Opteron and Athlon 64) have it
- Noticeable improvement in Apache 2.0
  - Increased Requests-per-second
  - Faster 64-bit time calculations
- Huge Virtual Memory Address-space
  - mmap/sendfile

The End
Thank You!