Introduction • Special Assumptions • Consistency Model • System Design • System Interactions • Fault Tolerance • (Results)
Assumptions • The system will always be broken • Files are BIG • Large streaming reads / small random reads • Large sequential writes (appends) • Many clients appending to the same file concurrently • High sustained bandwidth
Consistency Model • Consistent: All readers see the same data, whichever replica they read. • Defined: Consistent, and readers see exactly what a write put there. • Undefined: Consistent, but the data may not be what any single write put there.
How do Apps Deal? • Parts of files are inconsistent • Must do some checking of data: – Application level checksums
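A minimal sketch of what an application-level checksum could look like. The framing here (4-byte length, 4-byte CRC32, payload) and the names `encode_record` and `read_valid_records` are assumptions made for illustration; the slides do not say how applications lay out their records.

```python
import struct
import zlib
from typing import List

# Hypothetical record framing: 4-byte length, 4-byte CRC32, then the payload.
HEADER = struct.Struct(">II")


def encode_record(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC32 so a reader can verify it."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_valid_records(buf: bytes) -> List[bytes]:
    """Return payloads whose checksum matches; stop at the first bad record."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(buf):
        length, crc = HEADER.unpack_from(buf, pos)
        start = pos + HEADER.size
        payload = buf[start:start + length]
        # A short read or a CRC mismatch means this region cannot be trusted.
        if len(payload) != length or zlib.crc32(payload) != crc:
            break
        records.append(payload)
        pos = start + length
    return records
```

The reader in this sketch simply stops at the first record that fails the check. A real application might instead scan forward for the next valid record.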
Single Master Architecture • Good: – Has global knowledge • Can make intelligent placement/replication decisions. • Bad: – Becomes a bottleneck • Must limit its involvement in read/write
Architecture • Master – Keeps track of everything • Chunk Servers – Where the data lives – Each chunk is 64 MB • On other file systems ~8 KB
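With fixed 64 MB chunks, a client can turn a byte offset into a chunk index with plain arithmetic before it ever talks to the master. The sketch below shows that mapping; the function names `locate` and `chunks_for_range` are hypothetical, and 64 MB is taken as 64 × 2^20 bytes.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per chunk, as stated on the slide


def locate(offset: int) -> tuple:
    """Map a file byte offset to (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, CHUNK_SIZE)


def chunks_for_range(offset: int, length: int) -> list:
    """Chunk indexes touched by reading `length` bytes starting at `offset`."""
    if length <= 0:
        return []
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))
```

The large chunk size also keeps the master's bookkeeping small: a 1 GB file is 16 chunks of 64 MB, compared with 131,072 blocks at 8 KB.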
Let The Master Rule • Namespace Locking • Replica placement • Creation • (Garbage Collection)
Metadata • In Memory – Fast – Limited space • Chunk Locations – No persistent record • Op Log – Every change to metadata
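The split between logged and unlogged metadata can be sketched as follows. This is a toy model, not the real master: the class `MasterMetadata`, the operation names, and the JSON log format are all assumptions made for illustration, and the log is a Python list standing in for durable storage. It also assumes that chunk servers tell the master which chunks they hold, which is how locations can be known without a persistent record.

```python
import json


class MasterMetadata:
    """Toy in-memory metadata store with an operation log (illustrative only)."""

    def __init__(self):
        self.files = {}      # file name -> list of chunk handles (logged)
        self.locations = {}  # chunk handle -> set of chunk servers (never logged)
        self.op_log = []     # stand-in for a durable log: one JSON line per change

    def _apply(self, op):
        if op["type"] == "create":
            self.files[op["name"]] = []
        elif op["type"] == "add_chunk":
            self.files[op["name"]].append(op["handle"])

    def mutate(self, op):
        # Record the change first, then apply it to the in-memory state.
        self.op_log.append(json.dumps(op))
        self._apply(op)

    def report(self, server, handles):
        # Chunk locations come from chunk server reports, not from the log.
        for handle in handles:
            self.locations.setdefault(handle, set()).add(server)

    @classmethod
    def recover(cls, op_log):
        # Replaying the log rebuilds the namespace; locations start out empty.
        master = cls()
        for line in op_log:
            master._apply(json.loads(line))
        master.op_log = list(op_log)
        return master
```

After `recover`, the file-to-chunk mapping is back but `locations` is empty until chunk servers report in again.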
System Interactions • To write: – Ask master for chunk locations (cache) – Push data to all replicas of the chunk (into a buffer) – Send write request to primary – Primary writes changes (in order received) – Primary forwards to secondaries (in order received) – Secondaries write changes and confirm.
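The write steps above can be sketched as a small in-process model. It leaves out the master lookup, networking, and failure handling; `Replica`, `Primary`, and `client_write` are hypothetical names, and the point is only that data is buffered everywhere first and that the primary's order is the order every replica applies.

```python
class Replica:
    """A chunk replica: buffers pushed data, then applies it when told to."""

    def __init__(self):
        self.buffer = {}  # data id -> bytes pushed but not yet written
        self.chunk = []   # mutations applied to the chunk, in order

    def push(self, data_id, data):
        self.buffer[data_id] = data

    def apply(self, data_id):
        self.chunk.append(self.buffer.pop(data_id))
        return True  # confirmation sent back to the primary


class Primary(Replica):
    """The replica that picks the order in which writes are applied."""

    def __init__(self, secondaries):
        super().__init__()
        self.secondaries = secondaries

    def write(self, data_id):
        # Apply locally first, then forward the same request to each secondary.
        self.apply(data_id)
        acks = [s.apply(data_id) for s in self.secondaries]
        return all(acks)


def client_write(primary, data_id, data):
    # Push the data to every replica, then send the write request to the primary.
    for replica in [primary] + primary.secondaries:
        replica.push(data_id, data)
    return primary.write(data_id)
```

Because every replica applies requests in the order the primary received them, two clients writing at once still leave all replicas with the same contents.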
Record Append • Atomic • Allows for multiple writers • May cause inconsistent states between successful appends.
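One way a reader might cope with what a retried append leaves behind is to tag each record with a unique id and drop repeats. The sketch below assumes that a failed attempt can still leave a copy of the record in the file before the retry succeeds; `append_with_retry`, `unique_records`, and the `(record_id, payload)` layout are hypothetical and not taken from the slides.

```python
def append_with_retry(log, record_id, payload, fail_first=False):
    """Append a record; optionally simulate a failed first attempt."""
    if fail_first:
        # Assumed failure mode: the copy was written, but the client saw an error.
        log.append((record_id, payload))
    # The attempt that the client sees succeed.
    log.append((record_id, payload))


def unique_records(log):
    """Return payloads in file order, keeping only the first copy of each id."""
    seen = set()
    result = []
    for record_id, payload in log:
        if record_id in seen:
            continue
        seen.add(record_id)
        result.append(payload)
    return result
```

In this model the writer never needs to know whether an earlier attempt landed; the reader's filter makes the retry harmless.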
Fault Tolerance • Restore state fast • Copies, Copies • Checksums for data integrity
Results summary • When you build a file system around the specific applications which use the system, it works well.