Pig
Mr. Sriram
Email: hadoopsrirama@gmail.com
Objectives
• Pig Introduction
• Pig Installation
• Pig Data Model
• Pig Latin
Pig Introduction
• Need of Pig
• Why should I go for Pig when there is MapReduce?
• Why Pig?
• Where not to use Pig?
• What is Pig?
• Use Cases where Pig is used
• Conceptual Data Flow
• How Yahoo uses Pig?
• Pig Basic Program Structure
• Case Sensitivity
• Pig – Running Modes
• Shell and Utility Commands
• Pig Components
• Pig Execution
• Pig Latin Program
Need of Pig
Why should I go for Pig when there is MapReduce?
Why Pig?
Where should I use Pig?
• Case 1 - Time Sensitive Data (Census, Population, Election)
• Case 2 - Streaming Data (Network Channel)
• Case 3 - Sampling Data (Data Visualization for Analytics)
Where not to use Pig?
What is Pig?
Use Cases Where Pig is used
Examples of Data Analysis Task
Conceptual Data Flow
How Yahoo uses Pig?
Pig – Basic Program Structure
Pig – Running Modes
Grunt Shell
Local Mode
$> pig -x local
MapReduce Mode
$> pig
or
$> pig -x mapreduce
NOTE - Hadoop should be up for MapReduce mode.
For either mode, the Grunt shell is invoked and you can enter commands at the prompt. Results are displayed on your terminal screen (if DUMP is used) or written to a file (if STORE is used).
Script File
To run Pig commands as batch jobs:
• Local Mode
$> pig -x local TestScript.pig
• MapReduce Mode
$> pig TestScript.pig
or
$> pig -x mapreduce TestScript.pig
• For either mode, the Pig Latin statements are executed and the results are displayed on your terminal screen (if DUMP is used) or written to a file (if STORE is used).
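The two choices above (local vs. MapReduce mode, interactive Grunt shell vs. batch script) combine into four command lines. The following is a minimal Python sketch that only builds those command lines; it does not run Pig. The helper name `pig_command` is hypothetical, and the only flags used are the `-x local` and `-x mapreduce` forms shown on the slides.

```python
# Hypothetical helper: assembles the pig command line for a given mode.
# It uses only the flags shown on the slides (-x local, -x mapreduce).
VALID_MODES = ("local", "mapreduce")


def pig_command(mode="mapreduce", script=None):
    """Return the argv list for a Pig invocation.

    With no script, the command starts the Grunt shell;
    with a script, it runs the script as a batch job.
    """
    if mode not in VALID_MODES:
        raise ValueError("mode must be one of %s" % (VALID_MODES,))
    argv = ["pig", "-x", mode]
    if script is not None:
        argv.append(script)
    return argv


if __name__ == "__main__":
    print(" ".join(pig_command("local")))                        # Grunt shell, local mode
    print(" ".join(pig_command("mapreduce", "TestScript.pig")))  # batch job, MapReduce mode
```

Passing the explicit `-x mapreduce` form is a design choice for readability; per the slides, plain `pig` selects the same mode.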
Embedded Programs
Embed Pig commands in a host language, then run the program.
Local Mode
Compile the program:
$> javac -cp pig.jar TestLocal.java
Note: TestLocal.class is written to your current working directory. Include "." in the class path when you run the program.
Run the program:
$> java -cp pig.jar:. TestLocal
Embedded Programs
MapReduce Mode
Point $HADOOPDIR to the directory that contains the hadoop-site.xml file:
$> export HADOOPDIR=$HADOOP_HOME/conf
Compile the program:
$> javac -cp pig.jar TestMapreduce.java
Note: TestMapreduce.class is written to your current working directory. Include "." in the class path when you run the program.
Run the program:
$> java -cp pig.jar:.:$HADOOPDIR TestMapreduce
Case Sensitivity
grunt> A = LOAD '/home/cloudera/Desktop/firstsamplepig.txt' using PigStorage(',') as (eid:int, ename:chararray);
grunt> DUMP A;
(1,Bala)
(2,Sriram)
Shell and Utility Commands
• grunt> fs -ls /user
• grunt> sh ls -l /home/cloudera/pig/jars
Pig is made up of two components
Pig - Execution
Pig Installation
• Pig Installation
• Pig Architecture
Pig Installation
Download Pig:
ftp://ftp.nextgen.com/pub/Hadoop%20Ecosystem/pig-0.8.1-cdh3u2.tar.gz
Untar Pig:
$> tar -xzvf pig-0.8.1-cdh3u2.tar.gz
Set environment variables (open .profile):
export PIG_HOME=/home/empId/pig-0.8.1-cdh3u2
export PATH=$PATH:$PIG_HOME/bin
export PIG_CLASSPATH=$HADOOP_HOME:$HADOOP_HOME/conf
Load the .profile file:
$> . .profile
Pig Architecture
Pig Data Model
• Pig – Four Basic Types of Data Models
• Pig Data Types
• Data Structure used in Pig: Field
• Data Structure used in Pig: Tuple
• Data Structure used in Pig: Bag
• Data Structure used in Pig: Map
Pig - Four Basic Types of Data Models
Data Model
Pig – Data types
Pig – Data types
Complex Types: can contain data of any type, including other complex types.
Map
• A mapping from chararray keys to data elements.
• Key: chararray
• Element: any Pig type (scalar or complex)
Tuple
• Ordered collection of Pig data elements; analogous to a row in SQL.
• General tuple: t: (field1, field2)
• A field can be any Pig type. Schema definition is not necessary.
Pig – Data types
Bag
• Unordered collection of tuples. Schema definition is not necessary.
• General bag: b: {('China', '01'), ('India', 02), ('Brazil', 03)}
• Max size of a bag = size of local disk available.
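The three complex types can be pictured with ordinary Python containers. This is only an analogy, sketched under stated assumptions: it is not how Pig stores data, and the helper `has_chararray_keys` and the sample map contents are hypothetical. A `Counter` stands in for a bag because a bag is unordered and, in Pig, may hold duplicate tuples.

```python
from collections import Counter

# Tuple: an ordered collection of fields; a field can be of any type.
t = ("China", "01")

# Bag: an unordered collection of tuples. A Counter (multiset) ignores
# insertion order and keeps duplicates, so two bags holding the same
# tuples compare equal however they were built.
b1 = Counter([("China", "01"), ("India", "02"), ("Brazil", "03")])
b2 = Counter([("Brazil", "03"), ("China", "01"), ("India", "02")])

# Map: chararray (string) keys mapped to elements of any type,
# scalar or complex. The contents here are made up for illustration.
m = {"country": "India", "pair": t, "all_countries": b1}


def has_chararray_keys(mapping):
    """Hypothetical check mirroring the rule that map keys are chararrays."""
    return all(isinstance(key, str) for key in mapping)


if __name__ == "__main__":
    print(t[0])                   # fields in a tuple are addressed by position
    print(b1 == b2)               # order does not matter in a bag
    print(has_chararray_keys(m))  # every key is a string
```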
Data Structure used in Pig: Field
Data Structure used in Pig: Tuple
Data Structure used in Pig: Bag
Data Structure used in Pig: Map
Pig Latin
• Pig Latin Program
• Pig Latin Relational Operators
• Pig Latin – Null
• Pig Latin File Folders
• Pig Latin Group Operator
• Joins and Co Group
• Union
• Pig Built-in Functions
• Utility Commands
• Specialized Join
• Replicated Join
• Skewed Join
• Merge Join
• Pig UDF
Pig Latin Program
Pig Latin Relational Operators
Pig Latin - Null
Data
Pig Latin File Folders
Pig Latin – Group Operator
• Groups the data in a single relation
Pig Latin – Co Group Operator
• Groups the data in two or more relations
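The difference between the two operators can be modelled in a few lines of Python. This is a sketch of the semantics only, not Pig's implementation: the functions `group` and `cogroup`, the sample relations, and the extra rows are hypothetical, relations are plain lists of tuples, and null keys are ignored. GROUP yields one bag per key from a single relation; COGROUP yields one bag per input relation for each key, and that bag is empty where a relation has no matching rows.

```python
from collections import defaultdict


def group(relation, key_index):
    """Model of GROUP: map each distinct key to the bag of rows that carry it."""
    groups = defaultdict(list)
    for row in relation:
        groups[row[key_index]].append(row)
    return dict(groups)


def cogroup(relations_with_keys):
    """Model of COGROUP over several (relation, key_index) pairs.

    Each key seen in any relation maps to a tuple of bags, one per
    relation; a relation without that key contributes an empty bag.
    """
    keys = []
    for relation, key_index in relations_with_keys:
        for row in relation:
            if row[key_index] not in keys:
                keys.append(row[key_index])
    return {
        key: tuple(
            [row for row in relation if row[key_index] == key]
            for relation, key_index in relations_with_keys
        )
        for key in keys
    }


# Hypothetical sample data: (eid, ename, dept) and (dept, city).
employees = [(1, "Bala", "sales"), (2, "Sriram", "hr"), (3, "Kumar", "sales")]
departments = [("sales", "Chennai"), ("finance", "Delhi")]

if __name__ == "__main__":
    print(group(employees, 2))
    print(cogroup([(employees, 2), (departments, 0)]))
```

In this model, "finance" still appears in the COGROUP result with an empty employee bag, which is what distinguishes it from an inner join.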
Joins and Co Group
Union
Pig Built-in Functions
Utility Commands
Specialized Joins
Replicated Joins
Skewed Joins
Merge Joins
Pig User Defined Functions (UDF)
Pig – Creating UDF
Pig – Calling a UDF
Pig – Built-in UDFs
Pig Streaming
Parameter Substitution in Pig
Piggy Bank
Diagnostic Operators & UDF Statements
Describe
Explain – Logical Plan
Explain – Physical Plan
Explain – Map Reduce Plan
Illustrate
Demo
Demo on Healthcare Dataset in Pig
Use Cases in Healthcare
Demo on Weather Data in Pig
Assignment
Thank You !!!!!!