- Slides: 32
CHAPTER 7 Understanding Hadoop
Outline
- What is Hadoop?
- Hadoop architecture
- HDFS (Hadoop Distributed File System)
- HBase
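What is Hadoop?
- Hadoop is:
  - an Apache project
  - a platform for distributed computing
  - a software platform that lets users easily write and run applications that process massive amounts of data
- Stack diagram, top to bottom:
  - Cloud Applications
  - MapReduce, HBase
  - Hadoop Distributed File System (HDFS)
  - A Cluster of Machines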
Hadoop's turning point
- Nutch later ran into a bottleneck when storing large volumes of website data.
- Google presented its three key technologies at conferences:
  - SOSP 2003: "The Google File System"
  - OSDI 2004: "MapReduce: Simplified Data Processing on Large Clusters"
  - OSDI 2006: "Bigtable: A Distributed Storage System for Structured Data"
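Origins of Hadoop (2004 to now)
- Doug Cutting drew on the three technologies published by Google.
- He implemented first a distributed file system (NDFS) and then MapReduce in Nutch.
- In 2006, the distributed computing part of Nutch was split out into a separate project named Hadoop.
- Yahoo hired Doug Cutting to build a web search engine.
- NDFS was renamed Hadoop Distributed File System (HDFS).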
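Google vs. Hadoop
- Development team: Google (Google); Apache (Hadoop)
- Sponsors: Google (Google); Yahoo, Amazon (Hadoop)
- Resources: open document (Google); open source (Hadoop)
- Programming model: MapReduce (Google); Hadoop MapReduce (Hadoop)
- File system: GFS (Google); HDFS (Hadoop)
- Database system: Bigtable (Google); HBase (Hadoop)
- Search engine: Google (Google); Nutch (Hadoop)
- Operating system: Linux / GPL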
Hadoop architecture (1/3)
- The Hadoop project includes a number of related subprojects:
  - ZooKeeper
  - Avro
  - Pig
  - Chukwa
  - Hive
  - MapReduce
  - HBase
  - HDFS
  - Hadoop Core
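HDFS file read
- Components: the HDFS client (running in the client JVM on the client node), DistributedFileSystem, FSDataInputStream, the NameNode, and the DataNodes.
- 1: open(): the HDFS client opens the file through DistributedFileSystem.
- 2: get block location: DistributedFileSystem asks the NameNode where the file's blocks are stored.
- 3: read(): the HDFS client reads from the FSDataInputStream.
- 4: read(): FSDataInputStream reads a block from a DataNode.
- 5: read(): FSDataInputStream reads the next block from another DataNode.
- 6: close(): the HDFS client closes the stream.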
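The read sequence above can be mimicked with a small in-memory model. This is only a sketch and not the Hadoop API: `ToyNameNode`, `ToyDataNode`, and `read_file` are hypothetical names, and the block placement is hard-coded for illustration. The point it shows is that the NameNode supplies only block locations, while the bytes come from the DataNodes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataNode:
    """Hypothetical stand-in for a DataNode: stores block bytes by block id."""
    name: str
    blocks: dict = field(default_factory=dict)

    def read(self, block_id):
        return self.blocks[block_id]


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for the NameNode: metadata only, no file data."""
    # path -> ordered list of (block_id, name of a DataNode holding it)
    block_map: dict = field(default_factory=dict)

    def get_block_locations(self, path):
        return self.block_map[path]


def read_file(path, namenode, datanodes):
    """Follow steps 1-6: open, look up blocks, read each block, close."""
    locations = namenode.get_block_locations(path)    # steps 1-2
    chunks = []
    for block_id, node_name in locations:             # steps 3-5
        chunks.append(datanodes[node_name].read(block_id))
    return b"".join(chunks)                           # step 6: done reading


# Assumed example data: one file split into two blocks on two DataNodes.
dn1 = ToyDataNode("dn1", {"blk_1": b"hello "})
dn2 = ToyDataNode("dn2", {"blk_2": b"hadoop"})
nn = ToyNameNode({"/demo.txt": [("blk_1", "dn1"), ("blk_2", "dn2")]})
data = read_file("/demo.txt", nn, {"dn1": dn1, "dn2": dn2})
print(data.decode())
```

In a real cluster the client would also pick the closest replica of each block; the sketch leaves that out.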
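HDFS file write
- Components: the HDFS client (running in the client JVM on the client node), DistributedFileSystem, FSDataOutputStream, the NameNode, and a pipeline of DataNodes.
- 1: create(): the HDFS client creates the file through DistributedFileSystem.
- 2: create file: DistributedFileSystem asks the NameNode to create the file.
- 3: write(): the HDFS client writes to the FSDataOutputStream.
- 4: write packet: each packet is sent to the first DataNode and forwarded from DataNode to DataNode.
- 5: ack packet: acknowledgements travel back through the DataNodes.
- 6: close(): the HDFS client closes the stream.
- 7: complete: DistributedFileSystem tells the NameNode that the file is complete.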
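The packet and acknowledgement steps (4 and 5) can be sketched as a chain of nodes. This is a toy model, not Hadoop code: `PipelineNode` and `write_file` are hypothetical names, and the pipeline length of three and the 4-byte packet size are assumptions chosen for the example.

```python
class PipelineNode:
    """Hypothetical DataNode that stores packets and forwards them downstream."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.packets = []

    def write_packet(self, packet):
        # Step 4: store the packet locally, then forward it along the pipeline.
        self.packets.append(packet)
        acks = [self.name]
        if self.downstream is not None:
            acks += self.downstream.write_packet(packet)
        # Step 5: the acknowledgements travel back up the pipeline.
        return acks


def write_file(data, pipeline_head, replicas, packet_size=4):
    """Split data into packets and require an ack from every replica."""
    for start in range(0, len(data), packet_size):
        packet = data[start:start + packet_size]
        acks = pipeline_head.write_packet(packet)
        if len(acks) != replicas:
            raise IOError("pipeline did not acknowledge packet")
    return True  # steps 6-7: the stream is closed and the file is complete


# Assumed pipeline of three nodes: dn1 -> dn2 -> dn3.
dn3 = PipelineNode("dn3")
dn2 = PipelineNode("dn2", downstream=dn3)
dn1 = PipelineNode("dn1", downstream=dn2)
ok = write_file(b"hello hadoop", dn1, replicas=3)
stored = b"".join(dn3.packets)
```

The client talks only to the first node; every later copy is made by the DataNodes themselves, which is why the diagram shows several "4" and "5" arrows.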
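What is HBase?
- HBase is a distributed, column-oriented database.
- It provides scalable data storage.
- When Hadoop became a top-level Apache project in 2008, HBase became one of its subprojects.
- Stack diagram, top to bottom:
  - Cloud Applications
  - MapReduce, HBase
  - Hadoop Distributed File System (HDFS)
  - A Cluster of Machines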
Many companies use HBase
- Adobe: internal use (structured data)
- Kalooga: image search engine, http://www.kalooga.com/
- Meetup: community meetup site, http://www.meetup.com/
- Streamy: migrated from MySQL to HBase, http://www.streamy.com/
- Trend Micro: cloud-based virus scanning architecture, http://trendmicro.com/
- Yahoo!: stores document fingerprints to avoid duplicates, http://www.yahoo.com/
- More: http://wiki.apache.org/hadoop/Hbase/PoweredBy
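Key roles in HBase (1/2)
- HMaster
  - Monitors the HRegionServer slaves.
  - Keeps the load balanced across the HRegionServer slaves.
  - When an HRegionServer fails, moves the data on that HRegionServer to other HRegionServers.
- HRegionServer slaves
  - Accept requests sent by clients (write, read, scan).
  - Report the status of their HRegions to the HMaster.
  - Each HRegionServer is assigned multiple HRegions, or possibly none.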
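The HMaster's failover duty can be illustrated with a toy reassignment routine. This is a sketch under stated assumptions: `reassign_regions` is a hypothetical helper, and the "give each region to the least-loaded survivor" policy is a simple stand-in, not HBase's actual balancer or recovery procedure.

```python
def reassign_regions(assignments, failed_server):
    """Move regions off a failed server onto the least-loaded survivors.

    assignments: dict mapping server name -> list of region names.
    Returns a new dict that no longer contains the failed server.
    """
    survivors = {s: list(r) for s, r in assignments.items() if s != failed_server}
    if not survivors:
        raise RuntimeError("no region server left to take over")
    for region in assignments.get(failed_server, []):
        # Pick the survivor holding the fewest regions (ties broken by name).
        target = min(survivors, key=lambda s: (len(survivors[s]), s))
        survivors[target].append(region)
    return survivors


# Assumed cluster state; note that rs3 starts with no regions at all.
cluster = {
    "rs1": ["r1", "r2", "r3"],
    "rs2": ["r4"],
    "rs3": [],
}
after = reassign_regions(cluster, failed_server="rs1")
print(after)
```

Choosing the least-loaded server for each region handles both duties from the list at once: the failed server's regions are moved, and the result stays roughly balanced.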
HBase architecture (figure)
HBase data model (figure)
Example: Conceptual View and Physical Storage View (figure)
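The two views named on this slide can be approximated with nested dictionaries. This is a sketch, not the slide's own example: the row key, column names, timestamps, and values are all made up, and `physical_view` and `get` are hypothetical helpers. It assumes the usual HBase cell addressing of row key, column family, column qualifier, and timestamp.

```python
# Conceptual view: one sparse, multi-versioned row per row key.
# table[row_key][family][qualifier][timestamp] = value
table = {
    "com.example.www": {
        "contents": {"html": {6: "<html>v3", 5: "<html>v2"}},
        "anchor": {"example.org": {9: "Example"}},
    }
}


def physical_view(table, family):
    """List the cells of one column family as flat, sorted tuples.

    Cells that were never written do not appear at all.
    """
    cells = []
    for row_key, families in table.items():
        for qualifier, versions in families.get(family, {}).items():
            for ts, value in versions.items():
                cells.append((row_key, f"{family}:{qualifier}", ts, value))
    # Sort by row key, then column, then newest timestamp first.
    return sorted(cells, key=lambda c: (c[0], c[1], -c[2]))


def get(table, row_key, family, qualifier):
    """Return the newest version of a cell, or None if it is absent."""
    versions = table.get(row_key, {}).get(family, {}).get(qualifier)
    return versions[max(versions)] if versions else None


contents_cells = physical_view(table, "contents")
latest_html = get(table, "com.example.www", "contents", "html")
```

Conceptually the row looks like one wide record with many empty columns; physically each column family is listed on its own, so the empty cells cost nothing.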
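HBase lookup flow
- ZooKeeper: find the location of -ROOT-.
- -ROOT-: find the location of the .META. region.
- .META.: find the location of the table's region.
- Tables 0 to n: each table is split into regions, and the request goes to the region found above.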
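The three lookups in this flow can be sketched as a chain of small catalog searches. Everything below is illustrative: the server names, the `root-region-server` key, the table contents, and the helpers `find_region` and `locate` are assumptions, and a real client would also cache the results.

```python
import bisect


def find_region(regions, row_key):
    """regions: list of (start_key, location) sorted by start_key.

    Returns the location of the region whose start key is the greatest
    one not exceeding row_key.
    """
    starts = [start for start, _ in regions]
    index = bisect.bisect_right(starts, row_key) - 1
    if index < 0:
        raise KeyError(row_key)
    return regions[index][1]


# All names and contents below are hypothetical illustration data.
zookeeper = {"root-region-server": "rs0"}      # where -ROOT- is served
root_table = [("", "rs1")]                     # -ROOT-: locations of .META. regions
meta_table = {                                 # .META.: locations of table regions
    "table1": [("", "rs2"), ("m", "rs3")],
}


def locate(table_name, row_key):
    """Walk ZooKeeper -> -ROOT- -> .META. -> table region, recording each hop."""
    hops = [zookeeper["root-region-server"]]                   # find -ROOT-
    hops.append(find_region(root_table, table_name))           # find the .META. region
    hops.append(find_region(meta_table[table_name], row_key))  # find the table region
    return hops


print(locate("table1", "apple"))
print(locate("table1", "zebra"))
```

The last element of each returned list is the server that holds the region for that row; the first two are the catalog hops needed to discover it.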