
I/O Multiplexing Mechanisms
Chris Gill, Son Dinh, Brian Kocoloski
CSE 522 S - Advanced Operating Systems
Washington University in St. Louis, MO 63130

Handling I/O from Multiple Sources
A single process can manage IPC connections with multiple other processes
 – Each process accepts/reads/writes asynchronously
 – Blocking calls are more efficient than spinning
 – When should it accept/read/write on which IPC endpoints?
[Figure: Process B (listening) connected to Process A via a socket and to Process C via a FIFO]

What About Multi-Threading?
A thread-per-connection concurrency architecture may help manage simpler cases
 – E.g., spawn a thread for each new connection
 – But a large number of threads increases overhead
 – Also, we would like uniform, consistent event handling
[Figure: Process B runs one thread (task_struct) per endpoint: the listening socket, a socket to Process A, and a FIFO to Process C]

The select() System Call
Record file descriptors in fd_set bitmasks
 – One for each kind of event: read, write, exception
 – Pass a mutable copy of each into calls to select()
 – Helper macros like FD_ZERO(), FD_SET(), and FD_CLR() manipulate them
select() returns when some endpoints are ready or when a specified time interval has elapsed
 – Modifies the bitmasks: only the ready endpoints remain marked
 – Iterate through them using FD_ISSET() to find which ones are ready
 – This scan is a scalability bottleneck as the number of endpoints grows
Each marked endpoint can then be serviced with a single call to the appropriate function (read/write) that will not block


select() example
[Figure: code listing of a select() call; annotation: the first argument is the maximum file descriptor + 1]

select() Limitations
    while (1) {
        int new_fd = accept(…);
        /*
         * create pipe, fork() a child; the child
         * handles the new connection
         */
        /* wait on new_fd for connection close, or on
         * pipefd for data from the child, or for the
         * child to die */
        int max_fd = get_max_fd();
        select(max_fd+1, …);
        // …
    }
Now, assume this happens hundreds or thousands of times (e.g., on a webserver).

select() Limitations (LSP p. 61)
• Must calculate and pass in the highest-numbered file descriptor (plus one); maintaining a tight upper bound is not trivial
• Even with a tight upper bound, performs poorly on high-numbered file descriptors, since the kernel must scan the bitmasks up to that bound
• Statically sized fd sets: either too small for many fds, or too large and thus inefficient
• fd sets are mutable: the kernel modifies them, so they must be reinitialized before each subsequent call to select()

poll() example
[Figure: code listing of a poll() call, annotated:
 – # of file descriptors (the fds array can be dynamically sized, though it is static in this example)
 – revents (OUT: set by kernel) vs. events (IN: set by user)]

The poll() System Call
Similar to select() but with improvements
 – Combines the separate read/write/exception bitmasks into one events bitmask per descriptor
 – Slightly more efficient for bi-directional IPC
 – The revents bitmask gives information per endpoint
   • E.g., readable or writable, for priority or normal data, etc.
   • No need to re-initialize the events field: the kernel reports results in revents and leaves events unchanged
Still a “level-sensitive” (level-triggered) approach
 – May need to check every endpoint on each call
 – But still likely to be more efficient than select() on average

Using Data Structures to Record State
I/O multiplexing requires careful accounting
 – Keep track of open connection endpoints
 – Associate fds with some program context
[Figure: what we would like: each IPC handle (0x100, 0x300, 0xD00) linked directly to its context, e.g., buffered data or a pointer to a (struct client *)]

Using Data Structures to Record State
What we get:
• Lots of file descriptors that are not ready
• Only the fd is reported as ready; we need to store its state somewhere else
[Figure: of many IPC handles (0x100, 0x200, 0x300, 0x400, 0x500, 0x800, 0xD00), a notification marks only a few; each ready handle must be mapped to its program state through a lookup table (e.g., a hashtable)]

The epoll() System Call
The event polling mechanism offers an alternative event model
 – Adds an “edge-sensitive” (edge-triggered) mode, triggered by events, alongside the default level-triggered behavior
 – Interface is somewhat more complex
 – But mainly encapsulates inherent complexity
Fundamental problem in both select() and poll(): lack of scalability, as the whole event set must be passed in and traversed by the kernel on every call
epoll() solves this:
 – Can cluster file descriptors into epoll contexts, registered once and then reused across waits
 – Helps avoid complexities
   • Having to traverse all file descriptors every time
   • Having to associate data with a file descriptor