An End-to-End Approach to Scalable Network Storage Micah Beck, Associate Professor Director, Logistical Computing & Internetworking (LoCI) Lab Terry Moore, Associate Director James S. Plank, Associate Professor & Director Computer Science Department SIGCOMM 2002, Pittsburgh August 23, 2002
A Generalized Communication Scenario » A quantum of data originates • from a source node N_s • at time t_s » and either does or does not arrive • at a destination node N_d • at time t_d » and if it does arrive, it may be corrupted.
Scenario: Networking » Characteristics • The source node N_s and destination node N_d can be members of a globally scalable network • The delay t_d - t_s between origination and arrival is one we seek in general to minimize fairly » Fits the characteristics of layers 1 through 3 of the Internet stack. » When delivering data in a network, one cannot count on: low delay, high probability of correct delivery
Scenario: Storage » Characteristics • The source node N_s and destination node N_d are identical or part of a non-scalable network • There is no a priori bound on the delay t_d - t_s between origination and arrival » Fits the characteristics of directly connected or closely coupled storage » When storing data in a closely coupled network, we count on: low delay, high probability of correct delivery
The End-to-End Approach to Networking » No reliance on the timely or accurate delivery of any particular quantum of data » High delay and corruption must only be of sufficiently low probability » Fairness between competing network participants. » This allows a high degree of autonomy and faulty behavior in the operation of the network » Scalability!
End-to-End is Unnecessary for Closely Coupled Storage » If a storage device can be relied on to operate with • predictable delay • high accuracy and • high availability » Then it can be used without the burden of implementing layered end-to-end services. » But the assumption of reliability can impose a cost when the assumption fails to hold true and the resource fails.
Scenario: Scalable Network Storage » Characteristics • The source node N_s and destination node N_d can be members of a globally scalable network • There is no a priori bound on the delay t_d - t_s between origination and arrival » Fits the characteristics of storage accessed over a globally scalable Internet » When storing data in a network, one cannot count on: low delay, high probability of correct delivery
Scalable Network Services Are Like the Network Itself » Intermittently inaccessible » Vulnerable to partition » Prone to corruption in transit » Unpredictable latencies/jitter » End-to-End: Never require a network service to be bigger, better or more complex than wide area access allows
An End-to-End Approach to Storage » No reliance on the timely or accurate delivery of any particular packet » High delay and corruption must only be of sufficiently low probability » Fairness between competing network participants. » This allows a high degree of autonomy and faulty behavior in the operation of the network » Scalability!
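One way to picture "no reliance on accurate delivery" for storage is a client that keeps its own digest and verifies every copy it reads back. The sketch below is only an illustration under stated assumptions: the replicas are plain Python dicts standing in for unreliable storage, and the names `store_e2e` and `load_e2e` are hypothetical, not part of any library mentioned in these slides.

```python
import hashlib


def store_e2e(replicas, key, data):
    """Write data to every replica and return its digest, kept by the client."""
    for r in replicas:
        r[key] = data
    return hashlib.sha256(data).hexdigest()


def load_e2e(replicas, key, digest):
    """Return the first copy whose digest matches; trust no single replica."""
    for r in replicas:
        data = r.get(key)  # may be missing if the replica lost it
        if data is not None and hashlib.sha256(data).hexdigest() == digest:
            return data    # correctness is verified at the endpoint
    return None            # every copy was lost or corrupted
```

The storage itself is allowed to lose or corrupt data; only the endpoint decides whether a copy is acceptable.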
Internet Backplane Protocol (IBP) [Figure: three operations between a client node and a depot. allocate: node N_a, depot, capability. store: node N_w, data, depot. load: node N_r, depot.]
Allocation Attributes » Duration ( permanent) » Hard vs. Soft » Read/Write semantics: • Linear Append (write to end) • Linear Truncate (write to start) • Circular FIFO (with interlock) • Circular Queue (no interlock) » Depots implemented using disk and RAM • same API and semantics • performance differs
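The difference between the two circular modes can be sketched with a small bounded buffer. This is one reading of the slide, not the depot's actual behavior: it assumes "interlock" means a full buffer refuses new bytes until a reader drains it, while "no interlock" means the oldest bytes are overwritten. The class name `CircularAllocation` is hypothetical.

```python
from collections import deque


class CircularAllocation:
    """Bounded byte buffer; `interlock` selects FIFO vs. queue behaviour."""

    def __init__(self, capacity, interlock):
        self.capacity = capacity
        self.interlock = interlock
        self._bytes = deque()

    def write(self, data):
        """Return the number of bytes accepted."""
        accepted = 0
        for b in data:
            if len(self._bytes) == self.capacity:
                if self.interlock:
                    break              # FIFO: refuse rather than overwrite unread data
                self._bytes.popleft()  # queue: drop the oldest byte to make room
            self._bytes.append(b)
            accepted += 1
        return accepted

    def read(self, n):
        """Remove and return up to n of the oldest bytes."""
        out = bytearray()
        while self._bytes and len(out) < n:
            out.append(self._bytes.popleft())
        return bytes(out)
```

With a capacity of 4, writing `b"abcdef"` to the interlocked variant accepts only the first four bytes, while the non-interlocked variant accepts all six and retains `b"cdef"`.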
Internet Backplane Protocol (IBP) » Depots (servers) that make allocation of primitive “byte arrays” available to clients » A depot is implemented as a daemon, protocol is RPC over TCP » Byte arrays are not blocks (more abstract) • Network capabilities (primitive security) • Variable extents » Byte arrays are not files (weaker semantics) • Size & duration are limited • “Volatile” allocations • Best effort reliability and availability • No directory structure, accounting • No caching, replication
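The allocate, store, and load operations can be mimicked with a toy in-memory depot. This is a sketch for intuition only and is not the real IBP client API: `ToyDepot`, its method signatures, and the use of a single random token as the capability are all assumptions made for illustration. The `now` parameter exists only so that lease expiry can be exercised deterministically.

```python
import secrets
import time


class ToyDepot:
    """In-memory stand-in for a depot; NOT the real IBP client API."""

    def __init__(self, max_size, max_duration):
        self.max_size = max_size          # largest allocation granted (bytes)
        self.max_duration = max_duration  # longest lease granted (seconds)
        self._allocations = {}            # capability -> (bytearray, size, expiry)

    def allocate(self, size, duration, now=None):
        """Grant a byte array and return an unguessable capability, or None."""
        now = time.time() if now is None else now
        if size > self.max_size or duration > self.max_duration:
            return None                   # size and duration are limited
        cap = secrets.token_hex(16)
        self._allocations[cap] = (bytearray(), size, now + duration)
        return cap

    def _lookup(self, cap, now):
        now = time.time() if now is None else now
        entry = self._allocations.get(cap)
        if entry is None or now >= entry[2]:
            self._allocations.pop(cap, None)  # expired leases are reclaimed
            return None
        return entry

    def store(self, cap, data, now=None):
        """Append data to the byte array; return bytes written (0 on failure)."""
        entry = self._lookup(cap, now)
        if entry is None:
            return 0
        buf, size, _ = entry
        if len(buf) + len(data) > size:
            return 0
        buf.extend(data)
        return len(data)

    def load(self, cap, offset, length, now=None):
        """Return the requested bytes, or None if the allocation is gone."""
        entry = self._lookup(cap, now)
        if entry is None:
            return None
        return bytes(entry[0][offset:offset + length])
```

A client must be prepared for every call to fail: the depot can refuse an allocation, and a capability that worked a moment ago can stop working once the lease runs out.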
Building on IBP » Many applications assume file semantics • Unbounded size & duration • High reliability & availability • Caching & replication » In a layered architecture, these are implemented through aggregation and additional intelligence at the next level » Resource discovery: Logistical Backbone • Directory of depots, active probing • Client library
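Aggregation at the next level can be sketched as striping a file across several depots with replicas, and recording where each extent lives. The code below is a toy under stated assumptions: each depot is a plain dict, the string keys stand in for capabilities, and `upload` and `download` are hypothetical names rather than functions of any real library. It assumes `copies` is no larger than the number of depots, so replicas of one extent land on distinct depots.

```python
def upload(data, depots, chunk_size, copies):
    """Stripe `data` into chunks and store each chunk on `copies` depots.

    `depots` is a list of dicts standing in for depots (key -> bytes).
    Returns a toy aggregate: a list of (offset, length, [(depot_index, key), ...]).
    """
    extents = []
    for n, offset in enumerate(range(0, len(data), chunk_size)):
        chunk = data[offset:offset + chunk_size]
        replicas = []
        for c in range(copies):
            d = (n + c) % len(depots)  # rotate replicas across depots
            key = f"alloc-{n}-{c}"     # stands in for a capability
            depots[d][key] = chunk
            replicas.append((d, key))
        extents.append((offset, len(chunk), replicas))
    return extents


def download(extents, depots):
    """Reassemble the file, trying each replica until one is still available."""
    out = bytearray()
    for offset, length, replicas in extents:
        for d, key in replicas:
            chunk = depots[d].get(key)  # None if the allocation has vanished
            if chunk is not None and len(chunk) == length:
                out.extend(chunk)
                break
        else:
            raise IOError(f"no replica available for extent at offset {offset}")
    return bytes(out)
```

Each individual allocation stays weak and best effort; the higher reliability comes entirely from the client-side bookkeeping.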
The Network Storage Stack • Our adaptation of the network stack architecture for storage • Like the IP stack • Each level encapsulates details from the lower levels, while still exposing details to higher levels [Figure: stack layers, top to bottom: Applications, Logistical File System, Logistical Tools, L-Bone, exNode, IBP, Local Access, Physical]
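exNode vs. inode [Figure: side-by-side comparison. exNode: held in user space, refers through capabilities to IBP allocations in the network. inode: held in the kernel, refers through block addresses to disk blocks on the local system.]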
IBP-Mail: SMTP attachments by reference » The Problem: How to attach huge files? 1. Store the file on an IBP depot. 2. Send the capability with the mail message. 3. The receiver gets the file from the depot. » Future work: Asynchronous routing
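The three steps can be sketched with Python's standard `email` package. Everything specific here is an assumption for illustration: the depot is a plain dict, the capability is a random token, and the header name `X-Toy-IBP-Capability` is hypothetical, not the format IBP-Mail actually uses.

```python
import secrets
from email.message import EmailMessage

CAP_HEADER = "X-Toy-IBP-Capability"  # hypothetical header name


def send_by_reference(depot, sender, recipient, subject, payload):
    """Steps 1 and 2: store the payload, then mail only its capability."""
    cap = secrets.token_hex(16)      # stands in for a capability from the depot
    depot[cap] = payload             # step 1: store the file on the depot
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg[CAP_HEADER] = cap            # step 2: the message carries the reference
    msg.set_content("The attachment is stored on a depot; see the capability header.")
    return msg


def fetch_attachment(depot, msg):
    """Step 3: the receiver redeems the capability at the depot."""
    return depot.get(str(msg[CAP_HEADER]))  # None if the allocation is gone
```

The message stays a few hundred bytes regardless of the attachment size, and the receiver must tolerate the allocation having expired by the time the mail is read.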
IBP Mail [Figure: sender and receiver exchange an exNode over SMTP; data moves through depots via IBP write, IBP copy, and IBP read.]
Logistical Networking Application Areas » Source routing » Bandwidth adaptation » Reducing (BW × delay) » Reliable multicast » Content distribution » Remote access to structured data » Managing computation state » Temporary storage » Very large data sets » Multimedia » Collaborative computing & visualization
Software & Infrastructure » Tools: open source, multiplatform » IBP depot (server) and C client library » exNode and end-to-end services library » Logistical Backbone server (LDAP-based) » Linux/C is the primary development platform • Java clients are under development » Command-line utilities, GUI » Public L-Bone deployment • Currently 1.6 TB in North America and Europe » http://loci.cs.utk.edu
L-Bone + exNode + GUI: Download
Logistical Networking is a TCP » Storage is a fundamental element of communication » The end-to-end approach can apply to services other than data transmission » Logistical Networking achieves the benefits of adherence to end-to-end principles: • Application autonomy, network transparency • Aggressive innovation » Logistical Networking is a Transformative Community Project
Some Further Thoughts (it’s a position paper, after all)
IBP depots vs. IP routers » IBP enables an intermediate node in a scalable network to implement high-performance storage » What about putting storage on IP routers? • That other E2E principle tells us not to add functionality to the network in order to serve particular applications • Current IP applications have no use for storage at intermediate nodes » This would interfere with the IP fast path in order to support a subset of applications…
… On The Other Hand » IP datagrams are stored, then forwarded » Every router implements substantial RAM buffers » The management of this storage is highly specialized: • Limited size allocation (MTU) • Fast forwarding (FIFO, fair queuing, pipelining) » This specialization of buffer management supports interactive & near-real time applications » Hypothesis: a requirement of fast forwarding at IP intermediate nodes violates end-to-end
Are We In For a Tussle? » An intermediate node with storage can support an “MTU” the size of its maximum allocation: O(1 GB) • IPv6 jumbograms have a 32-bit payload length field » Low-latency forwarding may be incompatible with such a monstrous MTU at intermediate nodes. » A network that abandoned low-latency forwarding as a requirement would be more truly “best effort” and would allow greater autonomy and generality. » Asynchronous applications are important! » Does the need to support interactive applications limit the scalability of the Internet?
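A rough calculation shows why such an MTU sits badly with low-latency forwarding. The sketch assumes pure store-and-forward (each hop receives the whole datagram before sending it on) and counts only serialization time; the 1 Gb/s link rate and the 10-hop path are illustrative assumptions, not figures from these slides.

```python
def store_and_forward_delay(datagram_bytes, link_bits_per_s, hops):
    """Seconds spent serializing a datagram that is fully received at each hop."""
    return hops * datagram_bytes * 8 / link_bits_per_s


MAX_32_BIT = 2**32 - 1  # largest size a 32-bit length field can express (about 4 GiB)
small = store_and_forward_delay(1500, 1e9, hops=10)   # Ethernet-sized packet
huge = store_and_forward_delay(10**9, 1e9, hops=10)   # 1 GB "datagram"

print(f"32-bit limit: {MAX_32_BIT} bytes")
print(f"1500 B over 10 hops: {small * 1e6:.0f} microseconds")
print(f"1 GB over 10 hops:   {huge:.0f} seconds")
```

Under these assumptions the small packet spends about 120 microseconds in serialization while the 1 GB datagram spends 80 seconds, which is acceptable for asynchronous transfers but not for interactive traffic.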
Let’s not give up on end-to-end until we’ve really given it a try!