HBase
OUTLINE • Basic • Data Model • Implementation – Architecture of HBase • HBase Server • HRegionServer
Basic • HBase directly uses or subclasses the parent Hadoop implementation
Basic • (Figure: software stack diagram; only the label "Linux" survives.)
Basic • Problems with databases: – Growth of data – Complexity of installation and maintenance Solution: Relational Database Management System (RDBMS) • Problems with an RDBMS spread across multiple nodes: – JOINs – Not efficient – Rebalancing Solution: NoSQL database
Basic • NoSQL database: – Distributed – Scalable – Easy to use (e.g. put, get, alter)
Basic • List of NoSQL databases: – Open source • HBase (Yahoo!) • Cassandra (Facebook) • SimpleDB (Amazon) – Commercial • BigTable (Google)
Basic • HBase: – Hadoop's database – Version 0.20.6 released – Usable with Map/Reduce
Row-Oriented Data Model

EmpId  Lastname  Firstname  Salary
10     Smith     Joe        40000
12     Jones     Mary       50000
11     Johnson   Cathy      44000
22     Jones     Bob        55000

Stored row by row: 001:10,Smith,Joe,40000; 002:12,Jones,Mary,50000; 003:11,Johnson,Cathy,44000; 004:22,Jones,Bob,55000;
Row-Oriented Data Model

EmpId  Lastname  Firstname  Salary
10     Smith     Joe        40000
12     Jones     Mary       50000
11     Johnson   Cathy      44000
22     Jones     Bob        55000

To improve the performance of operations that search on a column value (for example, finding rows by salary), most DBMSs support database indexes, which store all the values from a set of columns along with pointers back to the original rowid.

Index on Salary: 001:40000; 002:50000; 003:44000; 004:55000;
Column-Oriented Model

EmpId  Lastname  Firstname  Salary
10     Smith     Joe        40000
12     Jones     Mary       50000
11     Johnson   Cathy      44000
22     Jones     Bob        55000

Stored column by column: 10:001,12:002,11:003,22:004; Smith:001,Jones:002,Johnson:003,Jones:004; Joe:001,Mary:002,Cathy:003,Bob:004; 40000:001,50000:002,44000:003,55000:004;

In this layout, any one of the columns more closely matches the structure of an index in a row-based system.
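The two layouts can be compared with a short Python sketch. It reuses the employee table from the slides; the function names (`rowid`, `row_oriented`, `column_oriented`) are made up for this illustration and the output is only a string rendering of each layout, not a real storage format.

```python
# Employee table from the slides: (EmpId, Lastname, Firstname, Salary).
ROWS = [
    (10, "Smith", "Joe", 40000),
    (12, "Jones", "Mary", 50000),
    (11, "Johnson", "Cathy", 44000),
    (22, "Jones", "Bob", 55000),
]


def rowid(index):
    """Format a zero-based position as a three-digit rowid (001, 002, ...)."""
    return f"{index + 1:03d}"


def row_oriented(rows):
    """Serialize one whole record after another: 'rowid:v1,v2,...;'."""
    return " ".join(
        f"{rowid(i)}:" + ",".join(str(v) for v in row) + ";"
        for i, row in enumerate(rows)
    )


def column_oriented(rows):
    """Serialize one whole column after another: 'value:rowid,...;'."""
    columns = zip(*rows)  # transpose the rows into columns
    return " ".join(
        ",".join(f"{v}:{rowid(i)}" for i, v in enumerate(col)) + ";"
        for col in columns
    )


if __name__ == "__main__":
    print(row_oriented(ROWS))
    print(column_oriented(ROWS))
```

Reading only the Salary column touches a single contiguous group in the column-oriented string, while the row-oriented string has to be scanned record by record.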
Table
• Members: Row, Column, Timestamp

Row key              Timestamp  Column "Contents"
"com.yahoo.news.tw"  t3         "I developed a robot for depths of 6,000 meters underwater"
                     t2         "How mosquitoes search for human flesh"
                     t1         "… Wang 40…"
"com.cnn.www"        t1         "'Speaking' with brain waves"
Table
• Add column
• Add <Family, Label>

Row key              Timestamp  "Contents"                                                    "Anchor"
"com.yahoo.news.tw"  t3         "I developed a robot for depths of 6,000 meters underwater"
                     t2         "How mosquitoes search for human flesh"
                     t1         "… Wang 40…"
"com.cnn.www"        t1         "'Speaking' with brain waves"
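Table
• Columns: Row key, Timestamp, "Contents", "Anchor"
• The "Anchor" family holds two labelled columns, "Anchor:sports" and "Anchor:tech", with cells at t4 and t5 holding the values "Silvia" and "Eric"

Row key              Timestamp  "Contents"
"com.yahoo.news.tw"  t3         "I developed a robot for depths of 6,000 meters underwater"
                     t2         "How mosquitoes search for human flesh"
                     t1         "… Wang 40…"
"com.cnn.www"        t1         "'Speaking' with brain waves"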
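One way to picture the table is as a nested map: row key, then column ("Family:Label"), then timestamp, then value. The sketch below is a minimal in-memory model, not the HBase client API; the class `VersionedTable` and its methods are invented for illustration, timestamps t1..t5 are written as the integers 1..5, and the row chosen for the "Anchor:tech" cell is an assumption because the slide layout does not make it clear.

```python
from collections import defaultdict


class VersionedTable:
    """Toy model of a table: row -> column -> {timestamp: value}."""

    def __init__(self):
        self._cells = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self._cells[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest one at or before `timestamp`."""
        versions = self._cells.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]


table = VersionedTable()
# "Contents" cells, headlines translated from the slide.
table.put("com.yahoo.news.tw", "Contents", 1, "… Wang 40…")
table.put("com.yahoo.news.tw", "Contents", 2, "How mosquitoes search for human flesh")
table.put("com.yahoo.news.tw", "Contents", 3, "I developed a 6,000 m underwater robot")
table.put("com.cnn.www", "Contents", 1, "'Speaking' with brain waves")
# Assumed placement: the slide does not show clearly which row owns this cell.
table.put("com.cnn.www", "Anchor:tech", 5, "Silvia")

if __name__ == "__main__":
    print(table.get("com.yahoo.news.tw", "Contents"))               # newest version
    print(table.get("com.yahoo.news.tw", "Contents", timestamp=2))  # version as of t2
```

Adding a new label under an existing family needs no schema change in this model: it is just another key in the per-row column map.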
Region
• Expression: a region is written as (start row key, end row key> plus an identifier, e.g. Region 1 (com.yahoo.news.tw, com.def.www>, ID
• Region 1 holds row keys "com.yahoo.news.tw" and "com.cnn.www"
• Region 2 holds row keys "com.abc.www" and "com.def.www"
• Each row keeps the Timestamp, "Contents", and "Anchor" columns of the table above
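Finding the region for a row key is a range lookup. The sketch below assumes that row keys are kept in lexicographic order and that each region covers a half-open range [start, end), so the grouping differs from the drawing on the slide, where rows are not listed in sorted order. `RegionMap` and `region_for` are names invented here.

```python
import bisect


class RegionMap:
    """Toy map from row key to region, using sorted split keys."""

    def __init__(self, split_keys):
        # Region i starts at self._starts[i]; the first region starts at "".
        self._starts = [""] + sorted(split_keys)

    def region_for(self, row_key):
        """Return (region number, start key, end key) for `row_key`."""
        # Rightmost region whose start key is <= row_key.
        idx = bisect.bisect_right(self._starts, row_key) - 1
        start = self._starts[idx]
        # The last region is open-ended, shown here as end key None.
        end = self._starts[idx + 1] if idx + 1 < len(self._starts) else None
        return idx + 1, start, end


if __name__ == "__main__":
    regions = RegionMap(["com.def.www"])
    for key in ["com.abc.www", "com.cnn.www", "com.def.www", "com.yahoo.news.tw"]:
        print(key, regions.region_for(key))
```

With one split key there are two regions; every additional split key adds one more range without changing how the lookup works.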
Architecture of HBase (Figure: a Client and ZooKeeper beside a cluster that runs HM and HR nodes on top of HDFS, with NN and DN nodes.) Legend: NN: NameNode, DN: DataNode, HM: HMaster, HR: HRegionServer
Rebalance • When a region on a single host grows past a threshold – it is split at a row into two new regions of approximately equal size • Splitting repeats until no region exceeds the threshold • Automatic
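The split rule can be sketched in a few lines of Python. This is an illustration only: it measures region size as a number of row keys, whereas a real system would more likely use stored bytes, and `split_region` is a hypothetical helper, not an HBase function.

```python
def split_region(rows, threshold):
    """Split a sorted list of row keys until no region exceeds `threshold` rows."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if len(rows) <= threshold:
        return [rows]
    # Split at the middle row into two regions of approximately equal size.
    mid = len(rows) // 2
    return split_region(rows[:mid], threshold) + split_region(rows[mid:], threshold)


if __name__ == "__main__":
    keys = [f"row{i:02d}" for i in range(10)]
    for region in split_region(keys, threshold=3):
        print(region[0], "..", region[-1], f"({len(region)} rows)")
```

Because every split keeps the keys in order, the resulting regions still cover contiguous, non-overlapping key ranges.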
HBase Master • Manages users' add, delete, modify, and query operations on tables. • Manages load balancing among regionservers. • Assigns a regionserver to store the region data after a region split. • Migrates the region data of a failed regionserver to another regionserver.
RegionServer • Carries zero or more regions • Serves client read/write/scan requests – Random access • Splits regions automatically • Sends heartbeats to the Master
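The interplay between heartbeats and the Master's recovery duty can be sketched as follows. This is a simplified model under stated assumptions: a regionserver counts as failed once its last heartbeat is older than a fixed timeout, and orphaned regions go to the least-loaded live server. The `Master` class, its methods, and the timeout value are all invented for this example.

```python
class Master:
    """Toy master: tracks heartbeats and reassigns regions of failed servers."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}   # regionserver -> time of last heartbeat
        self.assignment = {}  # region -> regionserver

    def heartbeat(self, server, now):
        """Record that `server` was alive at time `now`."""
        self.last_seen[server] = now

    def assign(self, region, server):
        self.assignment[region] = server

    def live_servers(self, now):
        """Servers whose last heartbeat is within the timeout."""
        return sorted(s for s, t in self.last_seen.items() if now - t <= self.timeout)

    def load(self, server):
        """Number of regions currently assigned to `server`."""
        return sum(1 for s in self.assignment.values() if s == server)

    def recover(self, now):
        """Move regions off failed servers; return the list of (region, new server)."""
        live = self.live_servers(now)
        if not live:
            return []
        moved = []
        for region, server in sorted(self.assignment.items()):
            if server not in live:
                target = min(live, key=self.load)  # least-loaded live server
                self.assignment[region] = target
                moved.append((region, target))
        return moved


if __name__ == "__main__":
    master = Master(timeout=10)
    for rs in ("rs1", "rs2", "rs3"):
        master.heartbeat(rs, now=0)
    for region, rs in [("r1", "rs1"), ("r2", "rs1"), ("r3", "rs2"), ("r4", "rs3")]:
        master.assign(region, rs)
    master.heartbeat("rs2", now=20)
    master.heartbeat("rs3", now=20)
    print(master.recover(now=25))  # rs1 has been silent for 25 time units
```

Passing `now` explicitly keeps the sketch deterministic; a real deployment would rely on wall-clock time or a coordination service instead.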
HBase Operation • HBase has two special tables: -ROOT- and .META. • ZooKeeper records the location of the -ROOT- table (Figure: -ROOT- points to .META., which points to the user regions.)
HBase Operation • Read requests – Step 1. Location of -ROOT- – Step 2. Location of the .META. region – Step 3. User region space (Figure: the HBase client sends a request to ZooKeeper, then consults -ROOT- and .META. in the cluster to reach the user region.) Legend: NN: NameNode, DN: DataNode, HM: HMaster, HR: RegionServer
HBase Operation • Read requests – Clients cache the location information for -ROOT-, .META., and user regions – The client then interacts with the RegionServer (Figure: the HBase client talks to a RegionServer in the cluster.) Legend: NN: NameNode, DN: DataNode, HM: HMaster, HR: RegionServer
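The three lookup steps and the client-side cache can be sketched together. Everything below is a toy stand-in: the `Cluster` class fakes ZooKeeper, -ROOT-, and .META. with plain attributes, the server names (rs1..rs4) and table names are hypothetical, and locations are resolved per table rather than per row-key range to keep the example short.

```python
class Cluster:
    """Toy stand-in for ZooKeeper, -ROOT-, and .META.; counts remote calls."""

    def __init__(self):
        self.remote_calls = 0
        self._root_server = "rs1"  # hypothetical server holding -ROOT-
        self._meta_server = "rs2"  # hypothetical server holding .META.
        self._user_regions = {"webtable": "rs3", "users": "rs4"}

    def ask_zookeeper(self):
        """Step 1: ZooKeeper returns the location of -ROOT-."""
        self.remote_calls += 1
        return self._root_server

    def ask_root(self, root_server):
        """Step 2: -ROOT- returns the location of the .META. region."""
        self.remote_calls += 1
        return self._meta_server

    def ask_meta(self, meta_server, table):
        """Step 3: .META. returns the server holding the user region."""
        self.remote_calls += 1
        return self._user_regions[table]


class Client:
    """Client that caches every location it learns."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.cache = {}

    def locate(self, table):
        if table in self.cache:
            return self.cache[table]  # no remote call needed
        if "-ROOT-" not in self.cache:
            self.cache["-ROOT-"] = self.cluster.ask_zookeeper()
        if ".META." not in self.cache:
            self.cache[".META."] = self.cluster.ask_root(self.cache["-ROOT-"])
        self.cache[table] = self.cluster.ask_meta(self.cache[".META."], table)
        return self.cache[table]


if __name__ == "__main__":
    cluster = Cluster()
    client = Client(cluster)
    print(client.locate("webtable"), cluster.remote_calls)  # first lookup: three steps
    print(client.locate("webtable"), cluster.remote_calls)  # served from the cache
    print(client.locate("users"), cluster.remote_calls)     # only the .META. step repeats
```

After the first lookup the client goes straight to the regionserver, which is why the cache matters for read latency.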
HBase in operation • The HBase client interacts with the RegionServer • State of a table on the RegionServer (Figure: a RegionServer holds an HLog and Regions; each Region holds HStores; each HStore holds a MemStore and HFiles.)
HBase in operation • The client requests to save data in a table (Figure: the HBase client writes to a RegionServer, which holds an HLog and Regions; each Region holds HStores; each HStore holds a MemStore and HFiles.)
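The components in the figure suggest a write path that the slide does not spell out. The sketch below assumes the usual log-then-memory design: each write is appended to the HLog first, then placed in the MemStore, and the MemStore is flushed to a new immutable HFile once it reaches a size limit. The classes are toy models with in-memory lists instead of files, and `flush_threshold` is an invented parameter counted in entries.

```python
class HStore:
    """Toy store: an in-memory MemStore plus a list of flushed HFiles."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold
        self.memstore = {}
        self.hfiles = []  # each HFile is a sorted list of (key, value) pairs

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as a new sorted HFile and start a fresh one."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        """Check the MemStore first, then HFiles from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None


class RegionServer:
    """Toy regionserver with one HLog and one HStore."""

    def __init__(self, flush_threshold=2):
        self.hlog = []
        self.store = HStore(flush_threshold)

    def put(self, key, value):
        self.hlog.append((key, value))  # log the write before applying it
        self.store.put(key, value)

    def get(self, key):
        return self.store.get(key)


if __name__ == "__main__":
    rs = RegionServer(flush_threshold=2)
    rs.put("a", 1)
    rs.put("b", 2)  # MemStore reaches the threshold and is flushed
    rs.put("a", 3)  # newer value stays in the MemStore
    print(rs.get("a"), rs.get("b"), len(rs.hlog), len(rs.store.hfiles))
```

Logging before applying the write is what would let a recovering server replay writes that had not yet been flushed.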
Characteristics of HBase • Fault tolerance • Batch processing • Automatic partitioning • Scales linearly with new nodes