The Google File System
Why?
Google has lots of data
– Cannot fit in a traditional file system
– Spans hundreds (thousands) of servers connected to (tens of) thousands of disk drives
Hardware fails
– Need to recover rapidly
– Downtime is not acceptable
Most files are read-only or append-only
– No need to optimize for random-access writes
Need a distributed file system capable of storing lots of huge files while tolerating commonplace hardware failures
Assumptions
Design assumptions
– System built from many inexpensive parts
– Stores a “modest” number of files (a few million)
– Each file is typically 100 MB or larger
– Expect large streaming reads, small random reads, and large sequential writes
– Need to support multiple machines concurrently appending to a single file
– High bandwidth more important than low latency
Interface
Extends the typical file system interface
– Normal operations
• Create, delete, open, close, read, write
– Extensions
• Snapshot: efficiently makes a copy of a file or directory tree
• Record append: allows multiple clients to append to the same file
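The GFS client library is not public, so the sketch below is only a hypothetical Go rendering of the operations listed above; every name and signature is invented for illustration.

// Hypothetical Go sketch of a GFS-style client interface.
// All names and signatures here are illustrative, not from the paper.
package gfs

// Handle identifies an open file; its contents are deliberately opaque.
type Handle struct{ id uint64 }

type Client interface {
	// Normal operations, analogous to a conventional file system.
	Create(path string) error
	Delete(path string) error
	Open(path string) (Handle, error)
	Close(h Handle) error
	Read(h Handle, offset int64, buf []byte) (n int, err error)
	Write(h Handle, offset int64, data []byte) (n int, err error)

	// Extensions.
	// Snapshot makes a low-cost copy of a file or directory tree.
	Snapshot(src, dst string) error
	// RecordAppend appends data atomically at an offset chosen by GFS,
	// so many clients can append to the same file concurrently; the
	// offset actually used is returned to the caller.
	RecordAppend(h Handle, data []byte) (offset int64, err error)
}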
Architecture
GFS consists of a number of file system clusters
– A cluster includes one master server and several chunk servers
– Files are divided into fixed-size chunks
• 64-bit globally unique chunk handle
• Default of 3 replicas of each chunk
– The master holds metadata (namespace, access control, mapping of files to chunks, location of chunks)
– The master performs garbage collection of chunks and chunk migration between servers
– The master periodically exchanges heartbeat messages with each chunk server
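A minimal sketch, assuming plain in-memory Go maps, of the kind of metadata the master keeps; the type and field names are assumptions for illustration, not taken from the GFS implementation.

// Illustrative sketch of master metadata.
package master

// ChunkHandle is the 64-bit globally unique chunk identifier.
type ChunkHandle uint64

// ChunkInfo records where a chunk's replicas live (3 by default).
type ChunkInfo struct {
	Replicas []string // chunk server addresses holding a replica
	Version  uint64   // lets the master detect stale replicas
}

// FileInfo lists a file's chunks in order; chunk i covers the byte
// range [i*64 MB, (i+1)*64 MB) of the file.
type FileInfo struct {
	Chunks []ChunkHandle
}

// Master state: the namespace and file-to-chunk mapping are kept durably,
// while chunk locations are rebuilt from chunk server heartbeats.
type Master struct {
	Namespace map[string]*FileInfo       // full path -> file metadata
	Chunks    map[ChunkHandle]*ChunkInfo // chunk handle -> replica info
}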
How it works
Client asks the master which chunk servers to contact
– Caches this information for a limited time
– Interacts directly with the chunk servers during that time
– Example: a read operation
• Client sends the master a filename and chunk index
• Master responds with the chunk handle and the locations of the replicas
• Client caches this information and contacts one of the replicas (likely the closest)
– Further reads of that chunk require no interaction with the master unless the cached information times out
– Clients can ask about multiple chunks at once, further limiting communication with the master
Chunk size
– 64 MB per chunk
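A minimal Go sketch of the read path just described, assuming hypothetical lookupMaster and readFromChunkServer helpers standing in for the real master and chunk server RPCs.

// Sketch of the client-side read path; the helpers at the bottom are stubs.
package client

const chunkSize int64 = 64 << 20 // 64 MB per chunk

type location struct {
	handle   uint64   // chunk handle returned by the master
	replicas []string // chunk server addresses holding the chunk
}

// cache holds (filename, chunk index) -> replica locations for a limited time.
var cache = map[string]map[int64]location{}

func Read(filename string, offset int64, buf []byte) (int, error) {
	chunkIndex := offset / chunkSize  // which chunk holds this offset
	chunkOffset := offset % chunkSize // byte offset inside that chunk

	loc, ok := cache[filename][chunkIndex]
	if !ok {
		// One round trip to the master: send (filename, chunk index),
		// get back (chunk handle, replica locations), then cache them.
		loc = lookupMaster(filename, chunkIndex)
		if cache[filename] == nil {
			cache[filename] = map[int64]location{}
		}
		cache[filename][chunkIndex] = loc
	}

	// Read directly from one replica (ideally the closest); the master is
	// not contacted again until the cached entry expires.
	return readFromChunkServer(loc.replicas[0], loc.handle, chunkOffset, buf)
}

// Stubs for illustration only.
func lookupMaster(filename string, chunkIndex int64) location {
	return location{handle: 42, replicas: []string{"chunkserver-a:7000"}}
}

func readFromChunkServer(addr string, handle uint64, off int64, buf []byte) (int, error) {
	return 0, nil
}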
Fault Tolerance
Fast recovery
– Master and chunk servers do not distinguish between normal and abnormal shutdown
• Assumes “kill -9” is a common operation
• Servers can restart in seconds
– Chunks are replicated on different chunk servers on different physical racks
– Master state is replicated on other machines
– If the master fails
• Its process restarts immediately
• If it cannot (hardware error), another master takes over
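A minimal sketch of rack-aware replica placement, assuming each chunk server reports which rack it sits in; the selection policy below is an illustration of the idea, not the actual GFS placement algorithm.

// Illustrative rack-aware replica placement: prefer distinct racks so that
// one rack failure cannot remove every copy of a chunk.
package placement

type ChunkServer struct {
	Addr string
	Rack string
}

// PickReplicas chooses up to n chunk servers for a new chunk, first taking
// at most one server per rack, then filling any remaining slots.
func PickReplicas(servers []ChunkServer, n int) []ChunkServer {
	picked := []ChunkServer{}
	usedRack := map[string]bool{}
	usedAddr := map[string]bool{}

	// Pass 1: at most one server per rack.
	for _, s := range servers {
		if len(picked) == n {
			return picked
		}
		if !usedRack[s.Rack] {
			usedRack[s.Rack] = true
			usedAddr[s.Addr] = true
			picked = append(picked, s)
		}
	}
	// Pass 2: if there are fewer racks than n, reuse racks but not servers.
	for _, s := range servers {
		if len(picked) == n {
			break
		}
		if !usedAddr[s.Addr] {
			usedAddr[s.Addr] = true
			picked = append(picked, s)
		}
	}
	return picked
}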
Fault Tolerance
Experience
– When a chunk server dies, its chunks become under-replicated
– Killed a chunk server with 15,000 chunks containing 600 GB of data
• All chunks were restored in 23.2 minutes
– Killed two chunk servers, each with roughly 16,000 chunks and 660 GB of data
• Since some data was down to a single copy, its re-replication was given high priority
• All chunks had at least 2 copies within 2 minutes
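Back of the envelope, taking those numbers at face value: restoring 600 GB in 23.2 minutes works out to roughly 600 / 23.2 ≈ 26 GB per minute, i.e. on the order of 430 MB/s of aggregate re-replication bandwidth spread across the surviving chunk servers.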
Summary
Context
– Google has lots of data
– Hardware fails
– Most files are read-only or append-only
Google File System
– Each file is typically 100 MB or larger
– GFS consists of a number of file system clusters
• A cluster includes one master server and several chunk servers
Designed for hardware and software errors
– File system processes expect to be killed
– Replication built into the file system