Hadoop MapReduce Framework
Mr. Sriram
Email: hadoopsrirama@gmail.com

Objectives
• MapReduce Concepts
• MapReduce Job
• MapReduce Data Flow
• Analyze different use cases where MapReduce is used
• Differentiate between the traditional way and the MapReduce way
• Learn about the Hadoop 2.x MapReduce architecture and components
• Understand the execution flow of a YARN MapReduce application
• Implement basic MapReduce concepts
• Run a MapReduce program
• Understand input split concepts in MapReduce
• Understand the MapReduce job submission flow
• Implement a Combiner and Partitioner in MapReduce

MapReduce Concepts
• Introduction to MapReduce
• Functional Programming Concepts
• Mapper
• Reducer
• Driver

Introduction to MapReduce

Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. A MapReduce job usually splits the input data set into independent chunks, which are processed by map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which then become the input to the reduce tasks. Typically both the input and the output of a job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing failed tasks.

The MapReduce framework and HDFS typically run on the same set of nodes. This lets the framework schedule tasks on the nodes where the data is already present, resulting in very high aggregate bandwidth across the cluster. The framework consists of a single master (the JobTracker in Hadoop 1.x, the ResourceManager in Hadoop 2.x) and one slave per cluster node (the TaskTracker or NodeManager, respectively).
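The map, sort, and reduce stages described above can be illustrated with a small in-memory sketch. It is plain Python rather than the Hadoop API, and the function names (`map_words`, `shuffle_and_sort`, `reduce_counts`) and the sample input are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_and_sort(pairs):
    """Framework step: group the values by key and order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_counts(key, values):
    """Reduce step: sum all counts emitted for one word."""
    return key, sum(values)


# Hypothetical input: each string stands in for one independent chunk.
chunks = ["the quick brown fox", "the lazy dog", "the fox"]

# Each chunk is mapped independently; a real cluster would do this in parallel.
mapped = [pair for chunk in chunks for pair in map_words(chunk)]
word_counts = [reduce_counts(key, values) for key, values in shuffle_and_sort(mapped)]
print(word_counts)
```

Because each chunk is mapped without reference to any other chunk, the map calls could run on different nodes; only the grouping step needs to see all of the map output for a given key.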

Functional Programming Concepts

Mapper

Reducer

Driver

Where is MapReduce used?

Traditional Way

MapReduce Way

Why MapReduce?

Solving the problem with MapReduce

Hadoop 2.x MapReduce Architecture

Hadoop 2.x MapReduce Components

Anatomy of a MapReduce Program

MapReduce Paradigm

Physical Flow of MapReduce Program

Physical Flow of MapReduce Program (continued)

Life Cycle of MapReduce Job
• Map function
• Reduce function
• Run this program as a MapReduce job

Input Splits

Relation between input splits and HDFS Blocks
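To make the relation concrete, the sketch below computes logical splits for a file, assuming the split size equals a 128 MB block size. It is loosely modeled on how Hadoop's file-based input formats are commonly described; the 1.1 tolerance (`SPLIT_SLOP`), the function name `compute_splits`, and the file sizes are assumptions for illustration and should be checked against the Hadoop version in use.

```python
SPLIT_SLOP = 1.1  # assumed tolerance: the last split may be up to 10% larger


def compute_splits(file_size, split_size):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    splits = []
    offset = 0
    remaining = file_size
    # Cut full-size splits while clearly more than one split's worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((offset, split_size))
        offset += split_size
        remaining -= split_size
    # Whatever is left becomes the final (possibly slightly oversized) split.
    if remaining > 0:
        splits.append((offset, remaining))
    return splits


MB = 1024 * 1024
block_size = 128 * MB  # assumed HDFS block size

splits_300 = compute_splits(300 * MB, block_size)  # hypothetical 300 MB file
splits_130 = compute_splits(130 * MB, block_size)  # hypothetical 130 MB file
print(len(splits_300), len(splits_130))
```

Under these assumptions the 300 MB file yields three splits (128 MB, 128 MB, 44 MB), while the 130 MB file yields a single split even though it occupies two blocks. A split is a logical range of bytes handed to one map task; a block is a physical unit of storage.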

MapReduce Job Submission Flow

Overview of MapReduce

Combiners

Combiner
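A combiner pre-aggregates map output on the map side so that fewer records cross the network during the shuffle. The sketch below is an in-memory illustration in plain Python, not the Hadoop API; the helper names and sample lines are hypothetical. It relies on the reduce operation (a sum) being associative and commutative, which is what makes it safe to apply early.

```python
from collections import Counter


def map_words(line):
    """Map step: emit (word, 1) for each word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum the counts per key within a single map task's output."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def reduce_all(map_outputs):
    """Reduce step: sum the counts per key across all map outputs."""
    totals = Counter()
    for output in map_outputs:
        for key, value in output:
            totals[key] += value
    return dict(totals)


# Hypothetical input: one line per map task.
map_task_inputs = ["to be or not to be", "to do is to be"]

without_combiner = [map_words(line) for line in map_task_inputs]
with_combiner = [combine(output) for output in without_combiner]

shuffled_without = sum(len(output) for output in without_combiner)
shuffled_with = sum(len(output) for output in with_combiner)
print(shuffled_without, shuffled_with)
```

In this example the combiner cuts the shuffled records from 11 to 8 while the final totals stay the same. Hadoop treats the combiner as an optimization that may run zero or more times, so a job should not depend on it for correctness.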

Partitioner - Redirecting output from Mapper
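The partitioner decides which reducer receives each map output key. A minimal sketch of hash partitioning follows, in plain Python rather than the Hadoop API. It mirrors the shape of Hadoop's default hash partitioner (hash code masked to a non-negative value, modulo the number of reduce tasks), but uses CRC32 as a stand-in for Java's `hashCode()`, so the actual bucket numbers will differ from a real cluster. The sample keys and reducer count are hypothetical.

```python
import zlib


def partition(key, num_reducers):
    """Pick a reducer index for key; equal keys always get the same index."""
    key_hash = zlib.crc32(key.encode("utf-8"))  # stand-in for Java's hashCode()
    return (key_hash & 0x7FFFFFFF) % num_reducers


num_reducers = 3  # hypothetical number of reduce tasks
map_output = [("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]

# Route every record to the bucket of the reducer that owns its key.
buckets = {index: [] for index in range(num_reducers)}
for key, value in map_output:
    buckets[partition(key, num_reducers)].append((key, value))

for index, records in buckets.items():
    print(index, records)
```

The property that matters is that every record with the same key lands in the same bucket, so a single reducer sees all values for that key. A custom partitioner replaces only the `partition` logic, for example to route keys by a prefix or a range.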

Revisit: De-Identification Architecture

Demo 1: Word Count Program
Demo of Word Count Data Program

Demo 2: Word Size Word Count Program
Demo of Word Size Word Count Data Program

Demo 3: Weather Data Program
Demo of Weather Data Program

Demo 4: Patent Data Program
Demo of Patent Data Program

Demo 5: Max Temp Data Program
Demo of Max Temp Data Program

Demo 6: Average Salary Program
Demo of Average Salary Program

Demo 7: DeIdentify Healthcare Program
Demo of DeIdentify Healthcare Program

Demo 8: Music Track Program
Demo of Music Track Program

Demo 9: Call Center Data Analytics Program
Demo of Call Center Data Analytics Program

MapReduce Job
• Introduction
• Job Submission
• Job Initialization
• Task Assignment
• Task Execution
• Progress and Status Updates
• Job Completion

Introduction

Job Submission

Job Initialization

Task Assignment

Task Execution

Progress Measure

Progress and Status Updates

Progress and Status Updates (continued)

Job Completion

MapReduce Data Flow
• Input Files
• Input Format
• Input Splits
• Record Reader
• Mapper
• Partition and Shuffle
• Sort
• Reduce
• Output Format
• Record Writer
• Output Files

MapReduce Data Flow Diagram

Input Files

Input Format

Input Splits

Record Reader
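A record reader turns the raw bytes of an input split into the key/value records that the mapper consumes. The sketch below assumes a line-oriented text format in which the key is the byte offset of the line and the value is the line's text, which is how Hadoop's text input format presents records. It is plain Python, not the Hadoop API, and the function name `read_records` and the sample bytes are hypothetical.

```python
def read_records(data):
    """Yield (byte_offset, line_text) pairs, one per line in the split."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # The key is where the line starts; the value drops the line ending.
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)


# Hypothetical contents of one input split.
split_bytes = b"hello world\nhadoop mapreduce\nbye\n"

records = list(read_records(split_bytes))
for key, value in records:
    print(key, value)
```

Each pair produced here corresponds to one call of the map function. This sketch ignores the case where a line straddles two splits, which a real reader has to handle.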

Mapper

Partition and Shuffle

Sort

Reduce

Output Format

Record Writer

Thank You!