IO-Lite: A Unified Buffering and Caching System
By Pai, Druschel, and Zwaenepoel (1999)
Presented by Justin Kliger for CS 780: Advanced Techniques in Caching, Professor Zhang (Summer 2005)
Outline
- Problem & Significance
- Literature Review
- Proposed Solution
- Design, Implementation, & Operation
- Experimental Design
- Results
- Conclusion
- Further Research
The Problem
- The I/O subsystem and various applications all tend to use their own private I/O buffers
  - Redundant data copying
  - Multiple buffering
  - Lack of cross-subsystem optimization
Problem’s Significance
- Wastes memory
  - Reduces space available for caching
  - Causes higher cache miss rates
- High CPU overhead
  - Limits server throughput
Literature Review
1. POSIX I/O
   -Problem: double-buffering
2. Memory-mapped files (mmap)
   -Problem: Not generalized to network I/O
Literature Review
3. Transparent Copy Avoidance
   -Problem: VM page alignment problems; copy-on-write faults
   - Genie (emulated copy): lack of full transparency leads to the same problems
4. Copy Avoidance with Handoff Semantics
   -Problem:
Literature Review
5. Fast buffers (fbufs), designed by Druschel
   -Problem: Does not support filesystem access or a file cache
6. Extensible kernels
   -Problem: More overhead, not OS-portable
IO-Lite Solution
- Unified buffering and caching
  - Allow all applications and subsystems to share the same buffered I/O data
  - Very simple at face value, very complex to implement
Basic Design
- Immutable buffers
  - Initially allocated data cannot be modified
  - Effectively read-only sharing
- Advantages?
  - Eliminates synchronization and protection problems
- Disadvantages?
  - I/O data cannot be modified in place
Further Design Considerations
- To make up for immutable buffers:
  - Create a buffer aggregate abstraction (an ADT)
    - Mutable
    - Reference to the IO-Lite window in VM
    - Aggregates contain an ordered list of <address, length> pairs
  - Aggregates passed by value
  - Buffers passed by reference
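A minimal sketch of the aggregate idea, assuming a fixed-size slice list. All names here (slice_t, agg_t, agg_append, agg_length, agg_flatten) are hypothetical and are not taken from the IO-Lite source. It shows that copying an aggregate copies only the <address, length> list, while the bytes themselves stay shared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; not the IO-Lite source. */
#define AGG_MAX_SLICES 8

/* One <address, length> pair pointing into an immutable buffer. */
typedef struct {
    const char *addr;
    size_t len;
} slice_t;

/* A mutable aggregate: an ordered list of slices. Copying the struct
   copies only the slice list (pass by value); the bytes the slices
   point at are shared (pass by reference). */
typedef struct {
    slice_t slices[AGG_MAX_SLICES];
    size_t count;
} agg_t;

/* Append a slice; returns 0 on success, -1 if the aggregate is full. */
static int agg_append(agg_t *a, const char *addr, size_t len) {
    if (a->count == AGG_MAX_SLICES) return -1;
    a->slices[a->count].addr = addr;
    a->slices[a->count].len = len;
    a->count++;
    return 0;
}

/* Total number of bytes the aggregate describes. */
static size_t agg_length(const agg_t *a) {
    size_t total = 0;
    for (size_t i = 0; i < a->count; i++) total += a->slices[i].len;
    return total;
}

/* Copy the described bytes into out, for inspection only. Stops before
   any slice that would not fit. Avoiding this copy is the point of the
   design; it is here just to make the contents visible. */
static size_t agg_flatten(const agg_t *a, char *out, size_t cap) {
    size_t off = 0;
    for (size_t i = 0; i < a->count; i++) {
        if (a->slices[i].len > cap - off) break;
        memcpy(out + off, a->slices[i].addr, a->slices[i].len);
        off += a->slices[i].len;
    }
    return off;
}
```

Assigning one agg_t to another (agg_t b = a;) duplicates the slice list but leaves both aggregates pointing at the same underlying buffer, which is the by-value / by-reference split the slide describes.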
Further Design Considerations
- Buffer sharing must be concurrent
  - To achieve this, use a method similar to fbufs
    - Expanded to include the filesystem
    - Adapted for a general-purpose OS
  - Worst-case scenario (in terms of overhead): page remapping
    (when the last buffer is allocated before the first is deallocated)
IO-Lite Implementation
- New read & write API which supersedes the regular read & write
    size_t IOL_read(int fd, IOL_Agg **aggr, size_t size);
    size_t IOL_write(int fd, IOL_Agg *aggr);
- IOL_Agg is the buffer aggregate data type
- Both operations are atomic
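To illustrate how a caller might use calls of this shape, here is a user-space mock. Everything in it is an assumption made for illustration: the IOL_Agg layout (a single slice), the mock_file table, and the mock_IOL_read / mock_IOL_write / forward functions are hypothetical stand-ins, not the real kernel interface. Only the two signatures come from the slide. The sketch shows data being forwarded from one descriptor to another by passing a reference, with no byte copy.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in aggregate: one <address, length> slice. The real IOL_Agg
   layout is not given in the slides; this is an assumption. */
typedef struct {
    const char *addr;
    size_t len;
} IOL_Agg;

#define MOCK_FDS 4

/* What each mock descriptor currently "contains". */
static IOL_Agg mock_file[MOCK_FDS];

/* Same shape as IOL_read: hands back an aggregate that references the
   cached data instead of copying it into a caller-supplied buffer. */
static size_t mock_IOL_read(int fd, IOL_Agg **aggr, size_t size) {
    static IOL_Agg result[MOCK_FDS];
    assert(fd >= 0 && fd < MOCK_FDS);
    result[fd].addr = mock_file[fd].addr;
    result[fd].len = size < mock_file[fd].len ? size : mock_file[fd].len;
    *aggr = &result[fd];
    return result[fd].len;
}

/* Same shape as IOL_write: the destination adopts the slice; the
   payload bytes are never touched. */
static size_t mock_IOL_write(int fd, IOL_Agg *aggr) {
    assert(fd >= 0 && fd < MOCK_FDS);
    mock_file[fd] = *aggr;
    return aggr->len;
}

/* Forward up to size bytes from in_fd to out_fd by reference. */
static size_t forward(int in_fd, int out_fd, size_t size) {
    IOL_Agg *agg = NULL;
    size_t n = mock_IOL_read(in_fd, &agg, size);
    if (n == 0) return 0;
    return mock_IOL_write(out_fd, agg);
}
```

With POSIX read and write the same forwarding step would copy the payload into a user buffer and back out again; here only the small aggregate descriptor moves.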
IO-Lite Implementation
- Applications:
  - Recommends implementation in runtime I/O libraries to avoid modifying all programs
- Filesystem:
  - File cache data structure: <file-id, offset, length>
- Network:
  - Need to modify network device drivers to allow early demultiplexing (using a packet filter)
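One way to picture a file cache keyed by <file-id, offset, length> is a table of extents with a coverage lookup. The sketch below is an assumption for illustration only: cache_entry_t and cache_lookup are hypothetical names, the data pointer stands in for a buffer aggregate, and a linear scan replaces whatever index a real implementation would use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry: the <file-id, offset, length> key plus the
   cached bytes it maps to. */
typedef struct {
    int file_id;
    size_t offset;
    size_t length;
    const char *data;   /* stands in for a buffer aggregate */
} cache_entry_t;

/* Return the entry whose extent fully covers [offset, offset + length)
   of file_id, or NULL on a miss. Linear scan for brevity. */
static const cache_entry_t *cache_lookup(const cache_entry_t *entries, size_t n,
                                         int file_id, size_t offset, size_t length) {
    for (size_t i = 0; i < n; i++) {
        const cache_entry_t *e = &entries[i];
        if (e->file_id != file_id) continue;
        /* Written with subtractions so the comparison cannot overflow. */
        if (offset >= e->offset && length <= e->length &&
            offset - e->offset <= e->length - length) {
            return e;
        }
    }
    return NULL;
}
```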
IO-Lite Operation
- With regard to the cache:
  - Cache replacement is basically LRU
    - Allows for application customization
  - Cache eviction is controlled by the VM daemon
    - Do >½ of replaced pages contain I/O data?
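The eviction question on this slide can be read as a simple counter check. The sketch below assumes that reading: count page replacements, and signal that a cache entry should be evicted once more than half of the replaced pages held I/O data. The names (replace_stats_t and the three functions) are hypothetical, and resetting the counters after each eviction is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical counters kept since the last cache-entry eviction. */
typedef struct {
    unsigned replaced_total;  /* pages chosen for replacement */
    unsigned replaced_io;     /* of those, pages holding cached I/O data */
} replace_stats_t;

/* Record one page replacement performed by the VM daemon. */
static void note_page_replaced(replace_stats_t *s, int holds_io_data) {
    s->replaced_total++;
    if (holds_io_data) s->replaced_io++;
}

/* Returns 1 when strictly more than half of the replaced pages held
   I/O data, which this sketch treats as "the file cache is too large". */
static int should_evict_cache_entry(const replace_stats_t *s) {
    return 2u * s->replaced_io > s->replaced_total;
}

/* Start a fresh measurement period after an entry has been evicted. */
static void note_entry_evicted(replace_stats_t *s) {
    s->replaced_total = 0;
    s->replaced_io = 0;
}
```

Which entry to evict is a separate decision; per the slide that choice is basically LRU.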
IO-Lite Operation
- Impact of immutable buffers:
  - Case 1: Entire object is modified
    - Lack of in-place modification has no ill effect
  - Case 2: Subset of object needs to be modified
    - Rather than recopy the entire object, use chaining
    - Performance loss is small if blocks are localized
  - Case 3: Scattered subset needs modification
    - IO-Lite incorporates an mmap interface for this
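Case 2 (chaining) can be sketched as building a new aggregate out of the unchanged prefix, a freshly written buffer holding the modified bytes, and the unchanged suffix. The types and functions below (slice_t, agg3_t, chain_patch, agg3_flatten) are hypothetical and sized for exactly this one-patch case. They are meant only to show that the original buffer is never written to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal types: an <address, length> slice and an
   aggregate with room for prefix, patch, and suffix. */
typedef struct {
    const char *addr;
    size_t len;
} slice_t;

typedef struct {
    slice_t s[3];
    size_t count;
} agg3_t;

/* Describe the object in orig with bytes [pos, pos + patch_len)
   replaced by patch. The original bytes are only referenced. */
static agg3_t chain_patch(slice_t orig, size_t pos,
                          const char *patch, size_t patch_len) {
    agg3_t out = { .count = 0 };
    assert(pos <= orig.len && patch_len <= orig.len - pos);
    if (pos > 0)
        out.s[out.count++] = (slice_t){ orig.addr, pos };
    if (patch_len > 0)
        out.s[out.count++] = (slice_t){ patch, patch_len };
    size_t tail = pos + patch_len;
    if (tail < orig.len)
        out.s[out.count++] = (slice_t){ orig.addr + tail, orig.len - tail };
    return out;
}

/* Copy the described bytes into out, for inspection only. */
static size_t agg3_flatten(const agg3_t *a, char *out, size_t cap) {
    size_t off = 0;
    for (size_t i = 0; i < a->count; i++) {
        if (a->s[i].len > cap - off) break;
        memcpy(out + off, a->s[i].addr, a->s[i].len);
        off += a->s[i].len;
    }
    return off;
}
```

Each additional patch adds slices to the chain, which is consistent with the slide's note that the performance loss stays small only when the modified blocks are localized.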
Experimental Design
- Compared:
  - Apache 1.3.1
    - Widely used web server
  - Flash (event-driven HTTP server)
    - Designed by the authors in the previous year
  - Flash-Lite (Flash modified to use the IO-Lite API)
    - New design by the authors
Experimental Design
1. General: varied requested file size
   - 40 requests for the same file
   - File size ranged from 500 bytes to 200 Kbytes
2. Persistent connections
   - Reduces overhead
3. CGI
   - Additional I/O traditionally slows servers
Experimental Design
4. Real workloads
   - Shows performance benefits by allowing more space for caching
   - Based on Rice’s CSCI department logs
5. Wide Area Network (WAN)
   - Test throughput with 0-256 slow clients connecting
6. Applications
   - Incorporated API into UNIX programs
Results
1. General test:
   - Bandwidth increase of 43% over Flash, 137% over Apache
   - No real difference for files smaller than 5 Kbytes
Results
2. Persistent connections
   - Flash-Lite even more effective at smaller file sizes
3. CGI
   - All servers slow, but Flash-Lite still much better
4. Real workload
   - Flash-Lite throughput 65% greater than Apache
5. WAN
   - Flash-Lite does not suffer from slow clients
6. Applications
   - Varied improvement for all programs tested
Conclusion
- IO-Lite consistently improved performance in all contexts tested
- Requires modification of numerous libraries and network device drivers
  - E.g., see Peng, Sharma, & Chiueh (2003)
Further Research
- There have been 42 citations
  - Almost all fell between 2001 and 2003
- Authors have not written any follow-ups
- Lack of papers that involve implementation of IO-Lite or a variation of it
  - Probably because of the complexity and number of modifications that are necessary
Appendix: Figures 2) through 6) [figures not included]
Questions?