Hadoop Namenode High Availability Requirements and Procedures August 2008
Requirements
• Two nodes to satisfy availability requirements.
• High availability for internal components of each node.
  o Disk redundancy
  o Network redundancy
• Redundant network architecture.
• Heartbeat mechanism between the two nodes.
• Replication of namenode metadata.
• Automatic fail over with no human action required.

Internal Components
• Disks
  o 2 x 300 GB 15k RPM SAS.
  o Hardware RAID 1 mirroring.
  o SMART monitoring.
• Network
  o Dual 1 Gbps on-board NICs.
  o Linux bonding with LACP.

Redundant Network Architecture
• Linux bonding
  o See bonding.txt from the Linux kernel docs.
  o LACP, aka 802.3ad, aka mode=4.
    (http://en.wikipedia.org/wiki/Link_Aggregation_Control_Protocol)
    § Must be supported by your switches.
  o Throughput advantage
    § Observed at 1.76 Gb/s
  o Allows for failure of either NIC instead of a single heartbeat connection via crossover.
• Switching infrastructure and physical segregation.
  o See diagram

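One way to confirm that the bond described above is actually running in 802.3ad mode with both NICs up is to read the status file the Linux bonding driver exposes under /proc/net/bonding/. The following is a minimal sketch, not part of the original slides: the interface names eth0 and eth1 and the abbreviated sample text are assumptions for illustration, and the health rule (802.3ad mode, at least two slaves, all up) is one reasonable reading of the requirements.

```python
def parse_bonding_status(text):
    """Return (bonding_mode, {slave_name: mii_status}) from bonding status text."""
    mode = None
    slaves = {}
    current = None  # slave section currently being read, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "MII Status" and current is not None:
            # MII Status lines before the first slave describe the bond itself
            slaves[current] = value
    return mode, slaves


def bond_is_healthy(text):
    """True if the bond uses 802.3ad and has two or more slaves, all up."""
    mode, slaves = parse_bonding_status(text)
    return (mode is not None and "802.3ad" in mode
            and len(slaves) >= 2
            and all(status == "up" for status in slaves.values()))


# Abbreviated, illustrative sample (interface names are assumptions).
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""

if __name__ == "__main__":
    print(parse_bonding_status(SAMPLE))
    print(bond_is_healthy(SAMPLE))
```

On a live node the same functions could be fed the contents of /proc/net/bonding/bond0 instead of the sample string.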
Network Diagram

Heartbeat Between Nodes
• Provided by the "heartbeat" package (http://www.linux-ha.org/).
• Manages multiple resources:
  o Virtual IP address
  o DRBD disk
  o Hadoop processes
• /etc/ha.d/haresources example:

    cw-grid101.contextweb.prod IPaddr::10.5.59
    cw-grid101.contextweb.prod drbddisk::r0
    cw-grid101.contextweb.prod Filesystem::/dev/drbd0::/hadoop::ext3::defaults
    cw-grid101.contextweb.prod hadoop

• Heartbeat uses the bond0 network interface. (* Not approved)
• 3 second timeout for "deadtime".
• Created an LSB compliant hadoop init script.

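Each haresources entry names the preferred node followed by resource specifications, where a resource script and its arguments are joined by "::". As a rough illustration of how such a line breaks down, here is a small parser. It is a sketch only: the function name is hypothetical, and it does not handle comments or backslash line continuations.

```python
def parse_haresources_line(line):
    """Split one haresources line into (node, [(resource, [args...]), ...])."""
    tokens = line.split()
    node, specs = tokens[0], tokens[1:]
    resources = []
    for spec in specs:
        # "Filesystem::/dev/drbd0::/hadoop" -> name "Filesystem", two arguments
        name, *args = spec.split("::")
        resources.append((name, args))
    return node, resources


LINE = "cw-grid101.contextweb.prod Filesystem::/dev/drbd0::/hadoop::ext3::defaults"

if __name__ == "__main__":
    node, resources = parse_haresources_line(LINE)
    print(node)
    for name, args in resources:
        print(name, args)
```

For the Filesystem entry, the arguments are the DRBD device, the mount point, the file system type, and the mount options, in that order.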
Replication of Namenode Metadata
• DRBD replication (http://www.drbd.org/).

/etc/drbd.conf example:

    global { usage-count no; }
    resource r0 {
      protocol C;
      syncer { rate 110M; }  # approximately 50% of total available
      startup { wfc-timeout 0; degr-wfc-timeout 120; }
      on cw-grid101.contextweb.prod {
        device /dev/drbd0;
        disk /dev/sda4;
        address 10.5.60:7788;
        meta-disk internal;
      }
      on cw-grid102.contextweb.prod {
        device /dev/drbd0;
        disk /dev/sda4;
        address 10.5.61:7788;
        meta-disk internal;
      }
    }

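The 110M syncer rate appears to follow from the bonded throughput reported earlier: 1.76 Gb/s is about 220 MB/s, and half of that is about 110 MB/s. The slides do not spell this out, so the sketch below rests on two assumptions: that "total available" refers to the observed 1.76 Gb/s, and that decimal units are used throughout. The function name is hypothetical.

```python
def syncer_rate_mb_per_s(link_gbps, fraction):
    """Convert a link speed in Gb/s to a sync rate in MB/s (decimal units).

    link_gbps: measured link throughput in gigabits per second
    fraction:  share of the link to give to DRBD resynchronization
    """
    bytes_per_s = link_gbps * 1e9 / 8  # bits per second -> bytes per second
    return bytes_per_s * fraction / 1e6


if __name__ == "__main__":
    # 1.76 Gb/s observed on the bond, 50% reserved for resync: about 110 MB/s
    print(round(syncer_rate_mb_per_s(1.76, 0.5)))
```

Capping resynchronization below the full link speed leaves headroom for normal replication and heartbeat traffic on the same bond.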
Fail Over Order of Events
• Virtual IP fails over.
• DRBD system switches primary node. (/proc/drbd status)
• File system fsck and mount at /hadoop.
• Hadoop started via LSB compliant init script.
• End to end fail over time approximately 15 seconds.
• Optionally, original master is rebooted to help avoid split-brain.

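The order matters because each step depends on the one before it: the file system cannot be mounted until DRBD is primary, and Hadoop cannot start until /hadoop is mounted. The following toy model illustrates that dependency chain. It is not Heartbeat's implementation; the function names are hypothetical, and the reverse-order rollback on failure is an assumption added for illustration rather than something the slides describe.

```python
def acquire_in_order(steps):
    """Run (name, start, stop) steps in order.

    If a start() returns False, stop the steps already started, in reverse
    order, and report failure. Returns (ok, names_started).
    """
    started = []
    for name, start, stop in steps:
        if start():
            started.append((name, stop))
        else:
            for _, undo in reversed(started):
                undo()
            return False, [n for n, _ in started]
    return True, [n for n, _ in started]


log = []


def step(name, ok=True):
    """Build a fake step that records what happened in `log`."""
    def start():
        log.append("start " + name)
        return ok

    def stop():
        log.append("stop " + name)

    return (name, start, stop)


STEPS = ["virtual IP", "DRBD primary", "fsck and mount /hadoop", "hadoop init script"]

if __name__ == "__main__":
    print(acquire_in_order([step(n) for n in STEPS]))
    print(log)
```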
DRBD Status
• Updating

    # cat /proc/drbd
    version: 8.2.6 (api:88/proto:86-88)
    GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-06-02 10:04:55
     0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r--
        ns:18440304 nr:0 dw:27072452 dr:18511901 al:11746 bm:12767 lo:14 pe:12 ua:246 ap:1 oos:84438604
        [==>....] sync'ed: 18.0% (82459/100465)M
        finish: 0:14:31 speed: 96,904 (77,472) K/sec

• Synchronized

    # cat /proc/drbd
    version: 8.2.6 (api:88/proto:86-88)
    GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-x8664-build, 2008-06-02 10:04:55
     0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r--
        ns:102901512 nr:0 dw:27140024 dr:102898169 al:11781 bm:17923 lo:0 pe:0 ua:0 ap:0 oos:0

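The fields that distinguish the two states are cs (connection state), st (roles, local/peer) and ds (disk states, local/peer). A monitoring check could extract them from the /proc/drbd text and treat the resource as synchronized only when cs is Connected and both disks are UpToDate. The sketch below assumes the 8.2.6 output format shown on this slide; the function names are hypothetical.

```python
import re

# Matches device lines such as " 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r--"
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+st:(?P<st>\S+)\s+ds:(?P<ds>\S+)",
    re.MULTILINE,
)


def parse_drbd_status(text):
    """Return {minor: {"cs": ..., "st": ..., "ds": ...}} for each device line."""
    devices = {}
    for m in STATUS_RE.finditer(text):
        devices[int(m.group("minor"))] = {
            "cs": m.group("cs"),
            "st": m.group("st"),
            "ds": m.group("ds"),
        }
    return devices


def is_synchronized(device):
    """True when the peers are connected and both disks are up to date."""
    return device["cs"] == "Connected" and device["ds"] == "UpToDate/UpToDate"


UPDATING = """\
version: 8.2.6 (api:88/proto:86-88)
 0: cs:SyncSource st:Primary/Secondary ds:UpToDate/Inconsistent C r--
"""

SYNCHRONIZED = """\
version: 8.2.6 (api:88/proto:86-88)
 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r--
"""

if __name__ == "__main__":
    for sample in (UPDATING, SYNCHRONIZED):
        dev = parse_drbd_status(sample)[0]
        print(dev, is_synchronized(dev))
```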