LAIO Lazy Asynchronous IO For Event Driven Servers

LAIO: Lazy Asynchronous I/O For Event Driven Servers
Khaled Elmeleegy, Alan L. Cox

Outline

  • Available I/O APIs and their shortcomings.
  • Event driven programming and its challenges.
  • Lazy Asynchronous I/O (LAIO).
  • Experiments and results.
  • Conclusions.

Key Idea

  • Existing I/O APIs fall short of what event driven servers need.
  • LAIO fixes that.

Non-Blocking I/O

  • System call may return without fully completing the operation.
      • Ex: write to a socket.
  • System call may also return with completion.
  • Disadvantages:
      • Not available for disk operations.
      • Program using it needs to maintain state.
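The "needs to maintain state" point can be made concrete with a small sketch. It assumes a POSIX system; the `pending_write` struct and the `set_nonblocking` and `continue_write` helpers are hypothetical names introduced here for illustration, not part of the talk. The program has to remember how much of the buffer has already gone out so that it can resume after a partial write.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-connection state: what remains to be written. */
struct pending_write {
    const char *buf;   /* data to send */
    size_t      len;   /* total length of buf */
    size_t      done;  /* bytes already written */
};

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Write as much as possible without blocking.
 * Returns 1 when the whole buffer has been written, 0 when the caller must
 * retry later (the descriptor cannot take more data right now), and -1 on
 * any other error, including an unexpected zero-byte write.
 */
int continue_write(int fd, struct pending_write *pw)
{
    while (pw->done < pw->len) {
        ssize_t n = write(fd, pw->buf + pw->done, pw->len - pw->done);
        if (n > 0) {
            pw->done += (size_t)n;      /* partial completion: remember progress */
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return 0;                   /* would block: return to the event loop */
        } else if (n == -1 && errno == EINTR) {
            continue;                   /* interrupted by a signal: try again */
        } else {
            return -1;
        }
    }
    return 1;
}
```

An event loop would call `continue_write` again each time the descriptor is reported writable, until it returns 1.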

Asynchronous I/O (AIO)

  • System call returns immediately.
  • Operation always runs to completion and sends notification on completion.
      • Via signal, event or polling.
  • Disadvantages:
      • Missing disk operations like open and stat.
      • Always receive completion via a notification even if the operation didn’t block.
      • Lower performance.

Event Driven Programming with I/O (What we have)

    event_loop(..) {
        …
        while (true) {
            event_list = get available events;
            for each event ev in event_list do
                call handler of ev;
        }
    }

    handler(…) {
        …  /* do stuff 1 */
        open(..);  /* may block; if it blocks, the server stalls */
        …  /* do stuff 2 */
        return;  /* to event_loop */
    }

Event Driven Programming with I/O (What we want)

    event_loop(..) {
        …
        while (true) {
            event_list = get available events;
            for each event ev in event_list do
                call event_handler of ev;
        }
    }

    handler1(…) {
        …  /* do stuff 1 */
        open(..);  /* may block */
        if open blocks {
            set handler2 as callback for open;
            return;  /* to event_loop */
        }
        …  /* do stuff 2 */
        return;  /* to event_loop */
    }

    handler2(…) {
        …  /* do stuff 2 */
        return;  /* to event_loop */
    }
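The handler split above can be sketched in plain C. This is only an illustration under stated assumptions: the `request` struct, its `open_would_block` flag (a stand-in for "open had to go to disk"), the fixed-size `pending` queue, and `run_completed` are all hypothetical, and no real I/O is performed.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 16

typedef void (*callback_fn)(void *arg);

/* A continuation: the handler to run once a blocked operation completes. */
struct continuation {
    callback_fn fn;
    void       *arg;
};

static struct continuation pending[MAX_PENDING];
static size_t n_pending;

/* Per-request state that must survive across the two handlers. */
struct request {
    bool open_would_block;  /* stand-in for "open() would have blocked" */
    int  stage;             /* 0 = new, 1 = stuff 1 done, 2 = stuff 2 done */
};

/* Second half of the original handler: runs after open has completed. */
void handler2(void *arg)
{
    struct request *req = arg;
    req->stage = 2;                     /* do stuff 2 */
}

/* First half: does stuff 1, then either defers or continues inline. */
void handler1(void *arg)
{
    struct request *req = arg;
    req->stage = 1;                     /* do stuff 1 */
    if (req->open_would_block && n_pending < MAX_PENDING) {
        /* set handler2 as the callback for open, then return to the loop */
        pending[n_pending++] = (struct continuation){ handler2, req };
        return;
    }
    /* open completed at once (or the queue is full): continue inline */
    handler2(req);
}

/* One event-loop pass: run the callbacks of completed operations.
 * Returns how many callbacks ran. */
size_t run_completed(void)
{
    size_t ran = n_pending;
    for (size_t i = 0; i < ran; i++)
        pending[i].fn(pending[i].arg);
    n_pending = 0;
    return ran;
}
```

A request whose open does not block goes straight from stage 0 to stage 2 inside `handler1`; one that blocks stops at stage 1 until `run_completed` invokes `handler2`.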

Lazy Asynchronous I/O (LAIO)

  • Like AIO when an operation blocks: asynchronous completion notification.
  • Also like AIO, operations are done in one shot, with no partial completions.
  • Similar to non-blocking I/O if the operation completes without blocking.
  • Based on scheduler activations.
      • A scheduler activation is an upcall delivered by the kernel when a thread blocks or unblocks.

LAIO API

    Function Name                      Description
    int laio_syscall(int number, …)    Performs the specified syscall asynchronously.
    void *laio_gethandle(void)         Returns a handle to the last LAIO operation.
    laio_list laio_poll(void)          Returns a list of handles to completed LAIO operations.

laio_syscall(int number, …)

  • Enable upcalls.
  • Save context.
  • Invoke system call.
  • System call blocks?
      • No: disable upcalls; return retval.
      • Yes: errno = EINPROGRESS; return -1.

    /* Invoked via kernel upcall */
    upcall_handler(..) {
        …
        Steals old stack using stored context
    }
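From the caller's side, the flow above yields three outcomes: the call completed, the call is in progress, or the call failed. The sketch below shows how a caller might tell them apart. It does not use the real LAIO library: `mock_laio_syscall` and `mock_laio_gethandle` are stand-ins written for this example (their extra parameters simply script the outcome), and `issue` and `enum outcome` are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum outcome { OP_DONE, OP_PENDING, OP_FAILED };

static int   mock_token;        /* its address serves as a fake handle */
static void *mock_last_handle;  /* what mock_laio_gethandle() reports */

/*
 * MOCK, not the real laio_syscall: pretends the system call either blocks
 * (returns -1 with errno = EINPROGRESS) or finishes with `result`,
 * setting errno to `err` when `result` is -1.
 */
int mock_laio_syscall(bool would_block, int result, int err)
{
    if (would_block) {
        mock_last_handle = &mock_token;
        errno = EINPROGRESS;
        return -1;
    }
    if (result == -1)
        errno = err;
    return result;
}

/* MOCK: returns the handle of the last operation that went asynchronous. */
void *mock_laio_gethandle(void)
{
    return mock_last_handle;
}

/* Issue one operation and classify the result the way a handler would. */
enum outcome issue(bool would_block, int result, int err, void **handle_out)
{
    int rv = mock_laio_syscall(would_block, result, err);
    if (rv != -1)
        return OP_DONE;                 /* completed without blocking */
    if (errno == EINPROGRESS) {
        /* blocked: keep the handle so the completion can be matched later */
        *handle_out = mock_laio_gethandle();
        return OP_PENDING;
    }
    return OP_FAILED;                   /* ordinary error from the syscall */
}
```

In a real server, the `OP_PENDING` branch is where the handler would register its continuation against the handle and return to the event loop.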

Experiments and Experimental Setup

  • Performance evaluated using both microbenchmarks and event driven web servers (thttpd and Flash).
  • Pentium Xeon 2.4 GHz machines with 2 GB RAM.
  • FreeBSD-5 with KSE, FreeBSD’s scheduler activation implementation.
  • Two web traces, Rice and Berkeley, with working set sizes of 1.1 GB and 6.4 GB respectively.

Micro-benchmarks

  • Read a byte from a pipe 100,000 times in two cases, blocking and non-blocking:
      • For non-blocking (byte ready on pipe):
          • LAIO is 320% faster than AIO.
          • LAIO is 40% slower than non-blocking I/O.
      • For blocking (byte not ready on pipe):
          • AIO is 8% faster than LAIO.
  • Call getpid(2) 1,000 times in two cases, KSE enabled and disabled.
      • When disabled, the program was 5% faster (KSE overhead).
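A rough sketch of the "byte ready on pipe" case, using ordinary blocking read(2) on a POSIX system, gives a feel for how such a loop can be timed. This is not the authors' benchmark: `pipe_read_benchmark` is a hypothetical helper, and the measured interval also includes the write(2) that makes each byte available.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>
#include <unistd.h>

/*
 * Reads one byte from a pipe `iters` times, with the byte always written
 * just before the read so the read never has to wait.
 * Returns the number of successful iterations, or -1 if the pipe could not
 * be created. The elapsed wall-clock time in nanoseconds (covering both
 * the writes and the reads) is stored in *elapsed_ns.
 */
long pipe_read_benchmark(long iters, double *elapsed_ns)
{
    int fds[2];
    char byte = 'x';
    struct timespec t0, t1;
    long ok = 0;

    if (pipe(fds) == -1)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        if (write(fds[1], &byte, 1) != 1)   /* make a byte ready */
            break;
        if (read(fds[0], &byte, 1) != 1)    /* read it back immediately */
            break;
        ok++;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fds[0]);
    close(fds[1]);

    *elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                + (double)(t1.tv_nsec - t0.tv_nsec);
    return ok;
}
```

Dividing `*elapsed_ns` by the returned count gives an approximate per-iteration cost, which is the kind of figure the percentages above compare across APIs.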

thttpd Experiments

  • thttpd is an event driven server, modified to use libevent, an event notification library.
  • Two versions of thttpd: libevent-thttpd and LAIO-thttpd.
  • For LAIO-thttpd, thttpd was modified by breaking up event handlers around blocking operations like open.

thttpd Results (figures)

  • Berkeley Throughput
  • Berkeley Response Time
  • Rice Throughput
  • Rice Response Time
  • Rice Throughput, 512 MB RAM
  • Rice Response Time, 512 MB RAM

Flash

  • An event driven web server.
  • 3 flavors:
      • Pure event driven.
      • AMPED: Asymmetric Multiprocess Event Driven.
          • Event driven core.
          • Potentially blocking I/O handed off to a helper process.
          • Helper does an explicit read to bring data into memory.
      • LAIO: uses LAIO to do all I/O asynchronously.
  • For each of the three flavors, files are sent either with sendfile(2) or using mmap(2).

Flash Experiments

  • All experiments are done with 500 clients.
  • All sockets are blocking.
  • For mmap: the file is mapped into memory, then written to the socket.
      • Page faults may happen.
      • mincore(2) is used to check if pages are in memory.
  • For sendfile: the file is sent via the sendfile(2) syscall, which may block.
      • Optimized sendfile: the kernel is modified so that sendfile returns if blocking on disk occurs.
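The mincore(2) check mentioned above could look roughly like the following sketch. It assumes the Linux prototype, where the result vector is `unsigned char *` (FreeBSD declares it as `char *`); `range_is_resident` is a hypothetical helper name, not something taken from Flash.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Reports whether every page of the mapping [addr, addr + len) is resident.
 * `addr` must be page-aligned, as mincore(2) requires.
 * Returns 1 if all pages are in memory, 0 if at least one is not,
 * and -1 on error (including len == 0).
 */
int range_is_resident(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return -1;

    /* mincore fills one status byte per page in the range */
    size_t pages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(pages);
    if (vec == NULL)
        return -1;

    int resident = -1;
    if (mincore(addr, len, vec) == 0) {
        resident = 1;
        for (size_t i = 0; i < pages; i++) {
            if ((vec[i] & 1) == 0) {    /* low bit set means "in memory" */
                resident = 0;
                break;
            }
        }
    }
    free(vec);
    return resident;
}
```

A server following the mmap path would write the mapped range to the socket only when this returns 1, and otherwise arrange for the pages to be brought in without stalling the event loop.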

Flash Throughput (mmap)

    Configuration    Flash-event    FLASH-AMPED    FLASH-LAIO (mmap)
    Rice-Cold        203 Mbps       386 Mbps       299 Mbps
    Rice-Warm        830 Mbps       800 Mbps       797 Mbps
    Berkeley-Cold    81 Mbps        134 Mbps       132 Mbps
    Berkeley-Warm    78 Mbps        127 Mbps       131 Mbps

  • For Rice-Cold: 41072 callouts to the helper process for AMPED; 46486 page faults for LAIO.
  • Performance difference is due to prefetching.

Flash Throughput (sendfile)

    Configuration    Flash-event    FLASH-AMPED    FLASH-LAIO (sendfile)
    Rice-Cold        277 Mbps       398 Mbps       382 Mbps
    Rice-Warm        845 Mbps       843 Mbps       815 Mbps
    Berkeley-Cold    122 Mbps       171 Mbps       (missing)
    Berkeley-Warm    125 Mbps       180 Mbps       179 Mbps

Conclusions

  • LAIO overcomes the shortcomings of other I/O APIs.
      • LAIO is more than 3 times faster than AIO when data is in memory.
  • LAIO serves event driven servers well.
      • LAIO increased thttpd throughput by 38%.
      • LAIO matched Flash performance with no kernel modifications.

Questions?