Building Scalable Web Architectures Aaron Bannert aaron@apache.org / aaron@codemass.com http://www.codemass.com/~aaron/presentations/apachecon2005/ac2005scalablewebarch.ppt
Goal l How do we build a massive web system out of commodity parts and open source?
Agenda 1. LAMP Overview 2. LAMP Features 3. Performance 4. Surviving your first Slashdotting a. Growing above a single box b. Avoiding Bottlenecks
LAMP Overview Architecture
LAMP Linux Apache MySQL PHP (Perl?)
The Big Picture
External Caching Tier
External Caching Tier l What is this? ¡ Squid ¡ Apache’s mod_proxy ¡ Commercial HTTP Accelerator
External Caching Tier l What does it do? ¡ Caches outbound HTTP objects l Images, CSS, XML, HTML, etc… ¡ Flushes Connections l Useful for modem users, frees up web tier ¡ Denial of Service Defense
External Caching Tier l Hardware Requirements ¡ Lots of Memory ¡ Fast Network ¡ Moderate to little CPU ¡ Moderate Disk Capacity l Room for cache, logs, etc… (disks are cheap) ¡ One slow disk is OK l Two Cheapies > One Expensive
External Caching Tier l Other Questions ¡ What to cache? ¡ How much to cache? ¡ Where to cache (internal vs. external)?
Web Serving Tier
Web Serving Tier l What is this? ¡ Apache ¡ thttpd ¡ Tux Web Server ¡ IIS ¡ Netscape
Web Serving Tier l What does it do? ¡ HTTP, HTTPS ¡ Serves Static Content from disk ¡ Generates Dynamic Content l CGI/PHP/Python/mod_perl/etc… ¡ Dispatches requests to the App Server Tier l Tomcat, Weblogic, Websphere, JRun, etc…
Web Serving Tier l Hardware Requirements ¡ Lots and lots of Memory l Memory is main bottleneck in web serving l Memory determines max number of users ¡ Fast Network ¡ CPU depends on usage l Dynamic content needs CPU l Static file serving requires very little CPU ¡ Cheap slow disk, enough to hold your content
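The claim that memory determines the maximum number of users can be made concrete with a little arithmetic. The sketch below assumes a process-per-connection server (an assumption, the slides do not name a process model), and the RAM and per-process figures are hypothetical placeholders rather than measurements.

```python
def max_concurrent_clients(total_ram_mb: int, reserved_mb: int, per_process_mb: int) -> int:
    """Estimate how many server processes fit in RAM without swapping.

    Assumes one process per connection; all figures are in megabytes.
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return usable_mb // per_process_mb


# Hypothetical box: 4 GB of RAM, 512 MB held back for the OS and file cache,
# 20 MB resident per dynamic-content process.
print(max_concurrent_clients(4096, 512, 20))
```

Measure the real per-process footprint on your own system before trusting a number like this; shared pages make naive per-process figures look larger than they are.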
Web Serving Tier l Choices ¡ How much dynamic content? ¡ When to offload dynamic processing? ¡ When to offload database operations? ¡ When to add more web servers?
Application Server Tier
Application Server Tier l What does it do? ¡ Dynamic Page Processing l JSP l Servlets l Standalone mod_perl/PHP/Python engines ¡ Internal Services l E.g. Search, Shopping Cart, Credit Card Processing
Application Server Tier • How does it work? 1. Web Tier generates the request using l HTTP (aka “REST”, sort of) l RPC/Corba l Java RMI l XMLRPC/Soap (or something homebrewed) 2. App Server processes request and responds
Application Server Tier l Caveats ¡ Decoupling of services is GOOD l Manage Complexity using well-defined APIs ¡ Don’t decouple for scaling, change your algorithms! ¡ Remote Calling overhead can be expensive l Marshaling of data l Sockets, net latency, throughput constraints… l XML, Soap, XMLRPC, yuck (don’t scale well) l Better to use Java’s RMI, good old RPC or even Corba
Application Server Tier l More Caveats ¡ Remote Calling introduces new failure scenarios l Classic Distributed Problems • How to detect remote failures? • How long to wait until deciding it’s failed? • How to react to remote failures? • What do we do when all app servers have failed?
Application Server Tier l Hardware Requirements ¡ Lots and Lots of Memory l App Servers are very memory hungry l Java was hungry to begin with l Consider going to 64 bit for larger memory-space ¡ Disk depends on application, typically minimal needed ¡ FAST CPU required, and lots of them ¡ (This will be an expensive machine.)
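One possible answer to the failure questions above is a bounded timeout per call plus failover to the next app server. This is only a sketch: `send` is a hypothetical transport function (not from the slides) that is assumed to raise `OSError` or a subclass such as `TimeoutError` when a server cannot be reached in time.

```python
def call_with_failover(servers, request, send, timeout_s=2.0):
    """Try each app server in turn and return the first successful response.

    `send(server, request, timeout_s)` is a hypothetical transport callable.
    It must raise OSError (TimeoutError and ConnectionError are subclasses)
    when the remote call fails or takes longer than timeout_s.
    """
    errors = []
    for server in servers:
        try:
            return send(server, request, timeout_s)
        except OSError as exc:
            # Detected a remote failure: remember it and move on.
            errors.append((server, repr(exc)))
    # Every app server failed; the caller decides how to degrade.
    raise RuntimeError(f"all app servers failed: {errors}")


def fake_send(server, request, timeout_s):
    """Stand-in transport for the demo: app1 always times out."""
    if server == "app1":
        raise TimeoutError("no reply within timeout")
    return f"{server} handled {request}"


print(call_with_failover(["app1", "app2"], "search?q=lamp", fake_send))
```

The timeout value is the answer to "how long to wait", and the final `RuntimeError` is where a web tier would serve a degraded page instead of hanging.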
Database Tier
Database Tier l Available DB Products ¡ Free/Open Source DBs l PostgreSQL l MySQL l GNU DBM l Ingres l SQLite l mSQL l Berkeley DB ¡ Commercial l Oracle l MS SQL l IBM DB2 l Sybase l SleepyCat
Database Tier l What does it do? ¡ Data Storage and Retrieval ¡ Data Aggregation and Computation ¡ Sorting ¡ Filtering ¡ ACID properties l (Atomic, Consistent, Isolated, Durable)
Database Tier l Choices ¡ How much logic to place inside the DB? ¡ Use Connection Pooling? ¡ Data Partitioning? l Spreading a dataset across multiple logical database “slices” in order to achieve better performance.
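As a rough illustration of data partitioning, the sketch below maps a row key to one of several logical slices with a stable hash. The key format and slice count are hypothetical, and hashing is only one way to partition; the slides do not prescribe a scheme.

```python
import hashlib


def slice_for_key(key: str, num_slices: int) -> int:
    """Map a key to a slice index using a hash that is stable across runs.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used here to get a repeatable mapping.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


# Hypothetical keys spread over 4 logical slices.
for key in ("user:1", "user:2", "user:3"):
    print(key, "-> slice", slice_for_key(key, 4))
```

Note that changing `num_slices` remaps most keys with this simple modulo approach, so choose the number of logical slices generously up front and move whole slices between machines as you grow.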
Database Tier l Hardware Requirements ¡ Entirely dependent upon application. ¡ Likely to be your most expensive machine(s). ¡ Tons of Memory ¡ Spindles galore ¡ RAID is useful (in software or hardware) l Reliability usually trumps Speed • RAID levels 0, 5, 1+0, and 5+0 are useful ¡ CPU also important ¡ Dual power supplies ¡ Dual Network
Internal Cache Tier
Internal Cache Tier l What is this? ¡ Object Cache l What Applications? ¡ Memcache ¡ Local Lookup Tables l BDB, GDBM, SQL-based ¡ Application-local Caching (e.g. LRU tables) ¡ Homebrew Caching (disk or memory)
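For the application-local option, an LRU table can be sketched in a few lines. The class name `LRUTable` and its methods are illustrative, not taken from any particular library.

```python
from collections import OrderedDict


class LRUTable:
    """Fixed-size in-process cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


table = LRUTable(capacity=2)
table.put("a", 1)
table.put("b", 2)
table.get("a")     # touch "a" so "b" becomes the eviction candidate
table.put("c", 3)  # evicts "b"
print(table.get("b"), table.get("a"), table.get("c"))
```

A table like this lives inside one process, so every web or app server holds its own copy; a shared cache such as Memcache trades a network hop for one copy across the tier.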
Internal Cache Tier l What does it do? ¡ Caches objects closer to the Application or Web Tiers ¡ Tuned for your application ¡ Very Fast Access ¡ Scales Horizontally
Internal Cache Tier l Hardware Requirements ¡ Lots of Memory l Note that 32 bit processes are typically limited to 2 GB of RAM ¡ Little or no disk ¡ Moderate to low CPU ¡ Fast Network
Misc. Services (DNS, Mail, etc…)
Misc. Services (DNS, Mail, etc…) l Why mention these? ¡ Every LAMP system has them ¡ Crucial but often overlooked ¡ Source of hidden problems
Misc. Services: DNS l Important Points ¡ Always have an offsite NS slave ¡ Always have an onsite NS slave ¡ Minimize network latency l Don’t use NAT, load balancers, etc…
Misc. Services: Time Synchronization l Synchronize the clocks on your systems! l Hints: ¡ Use NTPDATE at boot time to set clock ¡ Use NTPD to stay in synch ¡ Don’t ever change the clock on a running system!
Misc. Services: Monitoring l System Health Monitoring ¡ Nagios ¡ Big Brother ¡ Orcalator ¡ Ganglia l Fault Notification
The Glue • Routers • Switches • Firewalls • Load Balancers
Routers and Switches l Expensive l Complex l Crucial Piece of the System l Hints ¡ Use GigE if you can l Jumbo Frames are GOOD ¡ VLANs to manage complexity ¡ LACP (802.3ad) for failover/redundancy
Load Balancers l What services to balance? ¡ HTTP Caches and Servers, App Servers, DB Slaves l What NOT to balance? ¡ DNS ¡ LDAP ¡ NIS ¡ Memcache ¡ Spread ¡ Anything with its own built-in balancing
Message Busses l What is out there? ¡ Spread ¡ JMS ¡ MQSeries ¡ Tibco Rendezvous l What does it do? ¡ Various forms of distributed message delivery. l Guaranteed Delivery, Broadcasting, etc… ¡ Useful for heterogeneous distributed systems
What about the OS? Operating System Selection
Lots of OS choices l Linux l FreeBSD l NetBSD l OpenSolaris l Commercial Unix
What’s Important? l Maintainability ¡ Upgrade Path ¡ Security Updates ¡ Bug Fixes l Usability ¡ Do your engineers like it? l Cost ¡ Hardware Requirements ¡ (you don’t need a commercial Unix anymore)
Features to look for l Multi-processor Support l 64 bit Capable l Mature Thread Support l Vibrant User Community l Support for your devices
The Age of LAMP What does LAMP provide?
Scalability l Grows in small steps l Stays up when it counts l Can grow with your traffic l Room for the future
Reliability l High Quality of Service l Minimal Downtime l Stability l Redundancy l Resilience
Low Cost l Little or no software licensing costs l Minimal hardware requirements l Abundance of talent l Reduced maintenance costs
Flexible l Modular Components l Public APIs l Open Architecture ¡ Vendor Neutral ¡ Many options at all levels
Extendable l Free/Open Source Licensing ¡ Right to Use ¡ Right to Inspect ¡ Right to Improve l Plugins ¡ Some Free ¡ Some Commercial ¡ Can always customize
Free as in Beer? Price Speed Quality Pick any two.
Performance
What is Performance? l For LAMP? ¡ Improving the User Experience
Architecture affects user experience? l It affects it in two ways ¡ Speed: Fast Page Loads (Latency) ¡ Availability: Uptime
Problem: Concurrency l Concurrency causes slowdowns l Latency suffers l Pages load more slowly
Solution: Design for Concurrency l Build parallel systems l Eliminate bottlenecks l Aim for horizontal scalability Now for some real-world examples…
Surviving your first Slashdotting Strategies for Scalability
What is a “Slashdotting”? l Massive traffic spike (1000x normal) l High bandwidth needed l VERY high concurrency needed l Site inaccessible to some users l If your system crashes, nobody gets in
Approach 1. Keep the system up, no crashing 1. Some users are better than none 2. Let as many users in as possible
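One way to read "some users are better than none" is admission control: serve up to a fixed number of requests at once and turn the rest away quickly instead of letting the box fall over. The sketch below is an assumption about how that could look, with a hypothetical `AdmissionGate` class and an arbitrary limit; the slides do not specify a mechanism.

```python
import threading


class AdmissionGate:
    """Admit at most max_in_flight requests at once and shed the rest."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, request, handler):
        # Non-blocking acquire: never queue work the system cannot afford.
        if not self._slots.acquire(blocking=False):
            return 503, "Service Unavailable, please retry"
        try:
            return 200, handler(request)
        finally:
            self._slots.release()


gate = AdmissionGate(max_in_flight=1)


def slow_page(request):
    # While this request holds the only slot, a second arrival is shed.
    return gate.handle("second visitor", lambda r: "page body")


print(gate.handle("first visitor", slow_page))
```

Rejecting with a cheap 503 keeps memory and connection slots free for the users who did get in, which is the point of keeping the system up under a spike.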
Strategies 1. Load Balancers (of course) 2. Static File Servers 3. External Caching
Load Balancers l Hardware vs. Software ¡ Software is complex to set up, but cheaper ¡ Hardware is expensive, but dedicated ¡ IMHO: Use SW at first, graduate to HW
Static File Servers: Zero-copy l Separate Static from Dynamic ¡ Scale them independently l Later, dedicate static content servers l Modern web servers are very good at serving static content such as • HTML • CSS • Images • Zip/GZ/Tar files
External Caching l Reduces internal load l Scales horizontally l Obeys HTTP-defined rules for caching ¡ Your app defines the caching headers ¡ Behaves no differently than other proxy servers or your browser, only it’s dedicated l Hint: Use mod_expires to override
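Since the application defines the caching headers, a small helper can make the policy explicit. This is a sketch under assumptions: the function name and the choice of `no-cache` for uncacheable pages are illustrative, and the header values follow standard HTTP `Cache-Control` and `Expires` syntax.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(max_age_s: int, now=None) -> dict:
    """Build response headers telling caches how long an object stays fresh.

    max_age_s <= 0 marks the response as needing revalidation on every use.
    `now` must be a timezone-aware UTC datetime; it defaults to the current time.
    """
    now = now or datetime.now(timezone.utc)
    if max_age_s <= 0:
        return {"Cache-Control": "no-cache"}
    expires = now + timedelta(seconds=max_age_s)
    return {
        "Cache-Control": f"public, max-age={max_age_s}",
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(expires, usegmt=True),
    }


print(caching_headers(3600))  # e.g. images and CSS: cacheable for an hour
print(caching_headers(0))     # e.g. a personalized page
```

An external cache such as Squid or mod_proxy will honor these headers the same way a browser does, so longer lifetimes on static objects translate directly into fewer requests reaching the web tier.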
Outgrowing your First Server Strategies for Growth
Design for Horizontal Scalability l Manage Complexity l Design Stateless Systems (hard) l Identify Bottlenecks (hard) l Predict Growth l Commodity Parts
Manage Complexity l Decouple internal services ¡ Services scale independently ¡ Independent maintenance l Well-defined APIs ¡ Facilitates service decoupling ¡ Scales your engineering efforts
What is a Stateless System? l Each connection hits a new server l Server remembers nothing l Benefits? ¡ Allows Better Caching ¡ Scales Horizontally
Designing Stateless Systems l Decouple session state from web server l Store session state in a DB ¡ Careful: may just move bottleneck to DB tier l Use a distributed internal cache ¡ Memcached ¡ Reduces pressure on session database
Example: Scaling your User DB l Assume you have a user-centric system ¡ E.g. User identity info, subscriptions, etc… 1. Group data by user 2. Distribute users across multiple DBs 3. Write a directory to track user->DB location 4. Cache user data to reduce DB round trips Disadvantage: difficult to compare two users
Identify Bottlenecks l Monitor your system performance ¡ Use tools for this, there are many l Store and plot historical data ¡ Used to identify abnormalities l Check out rrdtool l Use system tools to troubleshoot ¡ vmstat, iostat, sar, top, strace, etc…
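The session advice above can be sketched as a store that writes through to the database and reads from the cache first. Plain dicts stand in for a Memcached client and a session table here; those stand-ins, the class name and the `db_reads` counter are all assumptions made for illustration.

```python
class SessionStore:
    """Session state kept outside the web server: cache first, DB as backup.

    `cache` and `db` are dict-like stand-ins for a distributed cache client
    and a session table. Any web server holding this store can serve any user.
    """

    def __init__(self, cache, db):
        self.cache = cache
        self.db = db
        self.db_reads = 0  # counts how often the database had to be consulted

    def save(self, session_id, data):
        self.db[session_id] = data     # durable copy
        self.cache[session_id] = data  # hot copy for the next request

    def load(self, session_id):
        if session_id in self.cache:
            return self.cache[session_id]
        self.db_reads += 1
        data = self.db.get(session_id)
        if data is not None:
            self.cache[session_id] = data  # repopulate after a cache miss
        return data


cache, db = {}, {}
store = SessionStore(cache, db)
store.save("s1", {"user": "aaron"})
print(store.load("s1"), store.db_reads)  # served from cache, no DB read
cache.clear()                            # simulate a cache node restart
print(store.load("s1"), store.db_reads)  # falls back to the DB once
```

Because the web servers hold no session data themselves, a load balancer is free to send each request to a different machine.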
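The four steps above can be sketched with in-memory stand-ins. Everything named here is hypothetical: dicts play the role of the user databases, the directory and the cache, and the "least-populated DB" placement rule is one simple choice among many.

```python
class UserDirectory:
    """Directory that records which DB holds each user, with a read cache."""

    def __init__(self, db_names):
        self.db_names = list(db_names)
        self.databases = {name: {} for name in self.db_names}  # stand-in DBs
        self.location = {}  # the directory: user_id -> DB name
        self.cache = {}     # cached user records

    def _assign(self, user_id):
        # Place a new user on the DB that currently holds the fewest users.
        name = min(self.db_names, key=lambda n: len(self.databases[n]))
        self.location[user_id] = name
        return name

    def save_user(self, user_id, record):
        name = self.location.get(user_id) or self._assign(user_id)
        self.databases[name][user_id] = record  # all of a user's data together
        self.cache[user_id] = record

    def load_user(self, user_id):
        if user_id in self.cache:  # no DB round trip needed
            return self.cache[user_id]
        name = self.location.get(user_id)
        if name is None:
            return None
        record = self.databases[name].get(user_id)
        if record is not None:
            self.cache[user_id] = record
        return record


directory = UserDirectory(["userdb0", "userdb1"])
directory.save_user("alice", {"plan": "free"})
directory.save_user("bob", {"plan": "paid"})
print(directory.location)
print(directory.load_user("bob"))
```

The disadvantage named on the slide shows up directly: any query that compares two users may have to touch two different databases.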
Predict Growth l Use performance metrics ¡ Hits/sec ¡ Concurrent connections ¡ System load ¡ Total number of users ¡ Database table rows ¡ Database index size (on disk/memory) ¡…
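A simple way to turn stored metrics into a prediction is a straight-line fit. The sketch below assumes growth is roughly linear over the sampled window, which is an assumption and often wrong for viral traffic; the sample numbers are made up.

```python
def linear_trend(samples):
    """Least-squares fit of y = slope * x + intercept over (x, y) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one x value")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def x_at_limit(samples, limit):
    """Return the x value where the fitted line reaches `limit`, or None."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None  # flat or shrinking: the limit is never reached
    return (limit - intercept) / slope


# Hypothetical weekly peaks: (week number, hits/sec).
weekly_peak = [(0, 100), (1, 120), (2, 140), (3, 160)]
print(linear_trend(weekly_peak))
print(x_at_limit(weekly_peak, limit=400))
```

Feed it any of the metrics listed above (hits/sec, connections, table rows) and compare the projected crossing point with how long it takes you to buy and install hardware.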
Machine-sized Solutions l Design for last year’s hardware ¡ (it’s cheaper) l Leaves room for your software to grow ¡ Hardware will get faster ¡ And your systems will get busier
Use Commodity Parts l Standardize Hardware l Use Commodity Software (Open Source!) l Avoid Fads
THE END Thank You