Dynamic Load Balancing of Virtual Web Servers (DLB)
Shawn M. Emery

Introduction
• Problem
• Traditional Web Server
• Round-Robin Design
• Dynamic Design
• Web Server Agent
• Collector Agent
• Weight Calculation
• Enhanced DNS
• System Configuration
• Benchmarking
• Lessons Learned
• Future Work
10/27/2021 DLB S. Emery

Problem
• Web server access can be slow
• Web server is not reliable
• Solution
  – Use redundant web servers that may otherwise be underutilized

Traditional Web Server
1. Web client (browser) asks the name server (DNS) to resolve http://gandalf.uccs.edu/wlb.html
2. Name server returns the resolved IP address 128.198.9.118
3. Web client establishes an HTTP connection to the web server (gandalf) and sends GET /wlb.html HTTP/1.0
4. Web server transfers the data (HTML text)

Traditional Round Robin DNS Design
[Diagram: web clients (wc 1, wc 2, ..., wc x), naming service (DNS), and web servers (ws 1, ws 2, ..., ws n)]

Dynamic DNS Design
[Diagram: web clients (wc 1, wc 2, ..., wc x), naming service (DNS) with collector agent and WS metric DB, and web servers (ws 1, ws 2, ..., ws n) with web server agents]

Web Server Agent
• Collects various statistics
  – System statistics
  – Network statistics
  – Web server statistics
• Reports statistics to the collector agent on the DNS machine
• To start the web server agent
  – statAgent.pl <gateway> <collector> <time interval>

System Statistics
• Web server agent collects system data
  – Run queue (#)
  – CPU idle time (%)
  – Pages scanned by page daemon (pages/s)
• Web server agent uses
  – vmstat 1 2
    • collects 2 samples, 1 second apart

Vmstat Output and Meaning
  – r - # of processes waiting to run (extent)
  – sr - # of pages scanned by the page daemon to put back on the free list
  – id - % of CPU idle time, where id = 100 - (us + sy) (discrete)
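The three columns above can be pulled out of `vmstat 1 2` output by header name. The following is a minimal sketch, assuming the usual layout of a column-name header row followed by one row per sample; the sample text and the function name `parse_vmstat` are illustrative and not taken from the agent itself.

```python
# Illustrative two-sample output; the values are made up for this sketch.
SAMPLE_VMSTAT = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 412000 51000   0   2  0  0  0  0  0  0  0  0  0  105   60   40  1  1 98
 2 0 0 411000 50000   0   5  0  0  0  0  3  0  0  0  0  130  220   75 20  5 75
"""


def parse_vmstat(output):
    """Return run queue, scan rate, and CPU idle from the last sample row."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    # The column-name row is the one that contains both "r" and "id" as tokens.
    header = next(row for row in rows if "r" in row and "id" in row)
    last = rows[-1]  # the final row is the most recent sample
    if len(last) != len(header):
        raise ValueError("sample row does not line up with the header row")
    # "r", "sr", and "id" each occur once in the header, so index() is safe.
    return {
        "run_queue": int(last[header.index("r")]),
        "scan_rate": int(last[header.index("sr")]),
        "cpu_idle": int(last[header.index("id")]),
    }


stats = parse_vmstat(SAMPLE_VMSTAT)
```

Reading the last row matters because the first row that vmstat prints is normally a summary since boot rather than a current sample.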

Network Statistics
• Web server agent collects the network delay between the gateway and the web server
  – Average round trip time to gateway (ms)
• Web server agent uses
  – ping -s <gateway> 64 1
    • sends 1 probe to the gateway machine, packet size is 64 bytes

Ping Output and Meaning
  PING gandalf.uccs.edu (128.198.9.118): 56 data bytes
  64 bytes from 128.198.9.118: icmp_seq=0 ttl=248 time=202.3 ms
  64 bytes from 128.198.9.118: icmp_seq=1 ttl=248 time=220.0 ms
  ...
  --- gandalf.uccs.edu ping statistics ---
  5 packets transmitted, 5 packets received, 0% packet loss
  round-trip min/avg/max = 202.3/258.4/440.0 ms
– Ping sends ICMP echo requests; the latency until each response is measured and averaged (258.4 ms in the example above)
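The average round trip time can be read from the summary line of the ping output. A minimal sketch, assuming the summary line contains `min/avg/max` followed by `=` and slash-separated values as in the example; the function name `average_rtt_ms` is hypothetical.

```python
import re

# Summary lines taken from the example output on the slide.
SAMPLE_PING = """\
--- gandalf.uccs.edu ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 202.3/258.4/440.0 ms
"""

# \S* tolerates variants that append further fields to the label,
# such as "min/avg/max/stddev".
_SUMMARY = re.compile(r"min/avg/max\S*\s*=\s*([\d.]+)/([\d.]+)/([\d.]+)")


def average_rtt_ms(ping_output):
    """Return the average round trip time in milliseconds."""
    match = _SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no round-trip summary line found")
    return float(match.group(2))  # second field is the average


avg = average_rtt_ms(SAMPLE_PING)
```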

Web Server Statistics
• Web server provides statistical data
  – Requests per second
  – Modern web servers, such as Apache, create spare server processes
    • Number of idle web server processes
    • Number of busy web server processes
• Web server agent
  – Establishes an HTTP connection to the web server
  – Sends GET /server-status to the web server
  – Reports statistics to the collector agent

Web Server Statistics and Meaning
  Apache Server Status for gandalf.uccs.edu
  Current Time: Wed Dec 10 00:32:51 1997
  Restart Time: Wed Dec 10 00:32:27 1997
  Server uptime: 24 seconds
  Total accesses: 0 - Total Traffic: 0 kB
  CPU Usage: u0 s0 cu0 cs0
  0 requests/sec - 0 B/second
  1 requests currently being processed, 4 idle servers
  ...
– Forked web server processes with no work (# of idle web server processes)
– Requests per second (history)
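The figures the agent needs can be extracted from the status text with two patterns. This is a sketch only, assuming the wording shown in the example above; treating "requests currently being processed" as the number of busy server processes is an assumption, and `parse_server_status` is a hypothetical name.

```python
import re

# Status lines taken from the example on the slide.
SAMPLE_STATUS = """\
Total accesses: 0 - Total Traffic: 0 kB
CPU Usage: u0 s0 cu0 cs0
0 requests/sec - 0 B/second
1 requests currently being processed, 4 idle servers
"""


def parse_server_status(text):
    """Return request rate plus busy and idle server counts."""
    rate = re.search(r"([\d.]+)\s+requests/sec", text)
    procs = re.search(
        r"(\d+)\s+requests? currently being processed,\s*(\d+)\s+idle servers",
        text,
    )
    if rate is None or procs is None:
        raise ValueError("status text does not match the expected wording")
    return {
        "requests_per_sec": float(rate.group(1)),
        "busy_servers": int(procs.group(1)),  # assumption: one per active request
        "idle_servers": int(procs.group(2)),
    }


status = parse_server_status(SAMPLE_STATUS)
```

Apache's mod_status also serves a machine-readable variant at /server-status?auto, which may be easier to parse than the page shown here.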

Collector Agent
• Gathers statistics from web server agents
• Calculates a weight based on the statistics
• Writes the calculated weights to a file for the enhanced DNS server to read
• To start the collector agent
  – gather.pl
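The hand-off between the collector agent and the DNS server is a file of weights. The slides do not give the file format, so the sketch below assumes a hypothetical one-line-per-server layout of `<address> <weight>`; the function names and addresses are illustrative. It writes to a temporary file and renames it so that a reader never sees a half-written list.

```python
import os
import tempfile


def write_weights(path, weights):
    """Write 'address weight' lines to path, replacing the old file in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            for address, weight in sorted(weights.items()):
                tmp.write(f"{address} {weight:.2f}\n")
        os.replace(tmp_path, path)  # rename over the previous weight file
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on failure
        raise


def read_weights(path):
    """Read the hypothetical weight file back into a dict."""
    weights = {}
    with open(path) as handle:
        for line in handle:
            if line.strip():
                address, weight = line.split()
                weights[address] = float(weight)
    return weights
```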

Weight Calculations
• Rate each web server with a weight based on the statistics sent from the web server agents
  weight of server = (19.68*rid) + (19.58*rcpu) + (19.60*rrq) + (19.64*rrps) + (17.24*rap) + (4.23*rsr)
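The formula is a weighted sum, which can be sketched as follows. The coefficients are copied from the slide; the slide does not define the six ratios, so the code keeps the symbol names as opaque keys and assumes each ratio is already normalized to the range 0 to 1. The function name `server_weight` is hypothetical.

```python
# Coefficients as given on the slide, keyed by the slide's own symbols.
COEFFICIENTS = {
    "rid": 19.68,
    "rcpu": 19.58,
    "rrq": 19.60,
    "rrps": 19.64,
    "rap": 17.24,
    "rsr": 4.23,
}


def server_weight(ratios):
    """Return the weighted sum of the six ratios for one web server."""
    missing = COEFFICIENTS.keys() - ratios.keys()
    if missing:
        raise KeyError(f"missing ratios: {sorted(missing)}")
    return sum(coef * ratios[name] for name, coef in COEFFICIENTS.items())


# Assumption: every ratio at 1.0 gives the maximum weight of about 99.97.
best_case = server_weight({name: 1.0 for name in COEFFICIENTS})
```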

Weight Calculations (Cont)
[Diagram: test setup with 50 web clients, a 50 kB file size (HTML and CGI) with WebStone, and 3 web servers. Labels: stat agent and CPU idle on each web server, through-put, gather, DNS server]

Weight Calculations (Example)
  – CPU idle time had an average throughput of 51.92, and the sum of the averages for all characteristics was 265.18. The relative percentage for CPU idle time is therefore 51.92/265.18 = 0.1958 = 19.58%. This percentage is multiplied by the actual CPU percent idle divided by the approximate threshold (found to be 100% during the benchmarks) to get the weight:
    <cpu weight> = 19.58*(<actual cpu>/100)
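The arithmetic in this example can be checked with a short sketch. The two input figures (51.92 and 265.18) and the threshold of 100 come from the slide; the 75% idle reading and the function names are hypothetical values chosen for illustration.

```python
def relative_percentage(average_throughput, sum_of_averages):
    """Share of one characteristic in the total, as a percentage with 2 decimals."""
    return round(100.0 * average_throughput / sum_of_averages, 2)


def component_weight(coefficient, actual, threshold):
    """Scale the coefficient by how close the reading is to its threshold."""
    return coefficient * (actual / threshold)


# Figures from the slide: CPU idle average and the sum over all characteristics.
cpu_coefficient = relative_percentage(51.92, 265.18)

# Hypothetical reading: a server that is 75% idle, with the 100% threshold.
cpu_weight = component_weight(cpu_coefficient, actual=75.0, threshold=100.0)
```

Under these assumptions a fully idle server contributes the whole 19.58 to its weight, and a busier server contributes proportionally less.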

Enhanced DNS
• Purpose
  – To provide dynamic IP addresses to clients
  – To make resolution of addresses transparent to clients
• Implementation
  – Read the weighted web server list if the requested name is best.hobbit
  – Dynamic records
    • round.robin.hobbit contains the IP address of the next web server from a list of web servers
    • best.hobbit contains the IP address of the best weighted web server from a web server set
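The two dynamic records differ only in how an address is chosen. A minimal sketch of both selection rules, assuming that a higher weight means a better server; the class and function names and the addresses are hypothetical and do not reflect the actual name server code.

```python
import itertools


class RoundRobinSelector:
    """Hand out server addresses in a fixed rotation."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def next_address(self):
        return next(self._cycle)


def best_address(weights):
    """Return the address with the highest weight (assumed to be the best)."""
    if not weights:
        raise ValueError("weight list is empty")
    return max(weights, key=weights.get)


def resolve(name, round_robin, weights):
    """Answer the two dynamic names; return None for anything else."""
    if name == "round.robin.hobbit":
        return round_robin.next_address()
    if name == "best.hobbit":
        return best_address(weights)
    return None


rr = RoundRobinSelector(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
weights = {"192.0.2.10": 41.3, "192.0.2.11": 88.9, "192.0.2.12": 67.0}
```

Round robin ignores load entirely, while the weighted choice sends every query to a single server until the weight list is refreshed.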

Dynamic Load Balancing Prototype Configuration HW
• Web Servers
  – 1 Ultra 2 (2 x 200 MHz) and 2 Sparc 5s (110 and 70 MHz)
• Web Clients
  – 3 Sparc 5s (85 MHz), 1 Sparc 5 (70 MHz), 1 Sparc 4 (110 MHz), 1 Ultra 2 (2 x 200 MHz), and 1 Ultra 10 (333 MHz)
• Network
  – Device: Cisco 1900 switch and Sun Switch
  – Cards: le (lance ethernet, for the Sparc 4 and 5s) and hme (highspeed 10/100 Mb ethernet, for the Ultra 2s and Ultra 10)

Prototype Configuration (Cont)
[Diagram: clients and servers connected through the 10/100 Mb Sun Switch, with an up-link to the 10 Mb Cisco 1900]

Dynamic Load Balancing Prototype Configuration SW
• Operating System
  – Servers: Solaris 2.5.1, 2.7, and 2.8
  – Clients: Solaris 2.5.1, 2.6, and 2.7
• Web Servers
  – Apache v1.3.3
• Web Benchmark
  – WebStone 2.0

Single Web Server Benchmarking
[Chart; axis label: work load type]

Throughput w/ Static HTML Work-Load

Throughput w/ CGI Work-Load

Throughput w/ Mixed (HTML and CGI) Work-Load

Lessons Learned
• Inconsistent results between benchmarks
  – Problem: During certain sets of benchmarks, through-put was low when larger pages were selected more frequently than small to mid-size pages
  – Solution: Constrain pages to the same size when comparing results
• Benchmark fails during run with “host not found”
  – Problem: The web clients' DNS queries were timing out because the DNS server was too busy. The clients then failed over to another DNS server, which returned NXDOMAIN
  – Solution: Comment out the other name servers listed in the client’s resolv.conf and reduce how often the web clients resolve through DNS

Lessons Learned (Cont)
• Both algorithms were getting poor results on Sun boxes
  – Problem: By default, some versions of Solaris run the name service cache daemon (nscd), which caches host entries
  – Solution: Kill nscd or disable the host cache during benchmarking
• Benchmarks would hang on a Sun box
  – Problem: The process that gathers all web client data (webmaster) was running out of allowable open file descriptors
  – Solution: Increase the maximum number of open file descriptors per process
    • set rlim_fd_cur=128
    • set rlim_fd_max=1024
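Whether a process is close to its descriptor limit can be checked from inside the process. The sketch below uses Python's standard `resource` module (Unix only) to read the soft and hard limits that correspond to the two settings above; the helper `can_open` and the descriptor counts are hypothetical.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def can_open(needed, soft_limit):
    """Report whether `needed` descriptors fit under the soft limit."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # no limit is enforced
    return needed <= soft_limit


soft, hard = fd_limits()
# Hypothetical case: one descriptor per web client for 50 clients, plus a few spare.
fits = can_open(50 + 8, soft)
```

A process that collects results from many clients at once holds one descriptor per connection, so the soft limit is the figure to compare against the client count.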

Lessons Learned (Cont)
• Initially the observed benchmark performance was slow
  – Problem: The web clients and web servers contended for the same resources because they initially ran on the same machine
  – Solution: Run the web clients and web servers on separate machines
• Benchmark performance slowed down when performing tests at school as opposed to home
  – Problem: When testing at school, a window manager (fvwm) was used to access the computer, which consumed memory. At home, telnet sessions were used to access the computers
  – Solution: Do not run the window manager when running benchmarks at school

Future Work
• Convert code from Perl to C for name server, web server agent, and collector agent
  – interpretation = increased computation time
• More test machines to increase work load on servers
• Get current web server queue for better estimations on work load
• Read system stats directly from kernel space
• Implement RFC 1794 (DNS load balancing)
• Test web clients from Internet and intranet
• Dynamically formulate weights based on response
• Develop disk I/O measurements to detect I/O bound