

Outline
• What is MapReduce?
• Where does it fit?
• What is its benefit?
• How does it work?
• Must it be in Java?


What is MapReduce? Google's original definition
MapReduce is a framework for computing certain kinds of distributable problems using a large number of computers (nodes), collectively referred to as a cluster.


Where does it fit? Scope of application
• Large-scale data sets
• Problems that can be decomposed
• Text tokenization
• Indexing and search
• Data mining
• Machine learning
• …
http://www.dbms2.com/2008/08/26/known-applications-of-mapreduce/


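How does it work? MapReduce workflow
[Diagram: input HDFS (splits) → map → sort/copy → merge → reduce → output HDFS (parts)]
• The JobTracker asks the NameNode for the blocks that need to be processed.
• The JobTracker selects several TaskTrackers to run the map computation, which produces intermediate files.
• The JobTracker merges and sorts the intermediate files, then copies them to the TaskTrackers that need them.
• The JobTracker dispatches TaskTrackers to run reduce; when they finish, they notify the JobTracker and the NameNode so that the output is produced.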


<Key, Value> Pair
• Map input: row data
• Map output: (key1, val), (key2, val), (key1, val), …
• Select key: the values that share a key are collected together
• Reduce input: key1 → val, …, val
• Reduce output: key, values
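The key/value flow on this slide can be sketched in a few lines of single-process Python. This is only an illustration of the idea, not Hadoop's implementation: the helper `map_reduce`, the sample rows, and the functions `city_temp` and `hottest` are all hypothetical names invented for the example.

```python
from collections import defaultdict


def map_reduce(rows, mapper, reducer):
    """Run a single-process map -> group-by-key -> reduce pass."""
    groups = defaultdict(list)
    for row in rows:
        # Map: each input row yields zero or more (key, value) pairs.
        for key, value in mapper(row):
            groups[key].append(value)
    # Reduce: each key is handed over together with all of its values.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Hypothetical row data in "city,temperature" form (not from the slides).
rows = ["taipei,30", "tokyo,25", "taipei,33"]


def city_temp(row):
    """Mapper: turn one row into a (city, temperature) pair."""
    city, temp = row.split(",")
    yield city, int(temp)


def hottest(key, values):
    """Reducer: keep the highest temperature seen for a city."""
    return max(values)


result = map_reduce(rows, city_temp, hottest)
print(result)  # {'taipei': 33, 'tokyo': 25}
```

The two `taipei` rows produce the same key, so the reducer receives both values at once, which is the "select key" step in the diagram.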


Concept: MapReduce illustrated


Concept: MapReduce in Parallel


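How does it work? Example
Input: "I am a tiger, you are also a tiger"
• The JobTracker first selects three TaskTrackers to run map.
  Map output: (I, 1) (am, 1) (a, 1) (tiger, 1) (you, 1) (are, 1) (also, 1) (a, 1) (tiger, 1)
• After map finishes, Hadoop organizes and sorts the intermediate data.
  Sorted intermediate data: (a, 1) (also, 1) (am, 1) (are, 1) (I, 1) (tiger, 1) (you, 1)
• The JobTracker then selects two TaskTrackers to run reduce.
  Reduce output: (a, 2) (also, 1) (am, 1) (are, 1) (I, 1) (tiger, 2) (you, 1)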
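The word-count example can be replayed locally as a rough sketch. It runs in one process and only imitates the three map tasks and two reduce tasks. How the input is cut into splits and the range partitioner `partition` are assumptions made for illustration; the slide does not say how Hadoop divided the work.

```python
from itertools import groupby

text = "I am a tiger, you are also a tiger"


def map_words(split):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.strip(","), 1) for word in split.split()]


def reduce_counts(sorted_pairs):
    """Reduce: sum the values of each run of identical keys."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(sorted_pairs, key=lambda kv: kv[0])]


def partition(key):
    """Hypothetical range partitioner: a-h -> reducer 0, i-z -> reducer 1."""
    return 0 if key.lower() < "i" else 1


words = text.split()
# Three map tasks: assume the input is cut into three splits of three words.
splits = [" ".join(words[i:i + 3]) for i in range(0, len(words), 3)]
mapped = [pair for split in splits for pair in map_words(split)]

# Sort ignoring case so that "I" falls between "are" and "tiger", as on the slide.
intermediate = sorted(mapped, key=lambda kv: kv[0].lower())

# Two reduce tasks: each bucket receives a contiguous range of keys.
buckets = [[], []]
for pair in intermediate:
    buckets[partition(pair[0])].append(pair)

counts = [pair for bucket in buckets for pair in reduce_counts(bucket)]
print(counts)
# [('a', 2), ('also', 1), ('am', 1), ('are', 1), ('I', 1), ('tiger', 2), ('you', 1)]
```

A range partitioner is used here only because concatenating the reducers' outputs then stays in sorted order, which makes the result easy to compare with the slide.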


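Must it be in Java? Options without Java
• Although the Hadoop framework is implemented in Java, MapReduce applications do not have to be written in Java.
• Hadoop Streaming: a utility for running jobs that lets users write the mapper and reducer in other languages (e.g., PHP).
• Hadoop Pipes: C++ API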
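As a rough illustration of the Streaming approach, the sketch below writes a word-count mapper and reducer in Python instead of PHP. Streaming programs read lines from standard input and write tab-separated key/value lines to standard output. Here the two steps are plain functions over iterables of lines so they can be tried without a cluster. The function names and the local `sorted` call standing in for Hadoop's sort phase are assumptions of this example.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.strip(',')}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local stand-in for the sort that Hadoop performs between map and reduce.
sample = ["I am a tiger, you are also a tiger"]
output = list(reducer(sorted(mapper(sample))))
print("\n".join(output))

# In an actual streaming job, each function would sit in its own script and
# be wired to the standard streams, for example:
#   import sys
#   sys.stdout.writelines(line + "\n" for line in mapper(sys.stdin))
```

Because the sort here is a plain byte-order string sort, the uppercase `I` comes first in the local output, unlike the ordering drawn on the earlier example slide.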