Environment Cloudera Quick Start VM with 5 4

Contents • Using HDFS • • How To Use How To Upload File How

Using HDFS • How to use HDFS • How to Upload File • How

Using HDFS – How To Use (1) • You see a help message describing

Using HDFS – How To Use (2) • You see the contents of directory

Using HDFS – How To Upload File (1) • Unzip ‘shakespeare. tar. gz’: $

Using HDFS – How To Upload File (2) • Insert ‘shakespeare’ directory into HDFS:

Using HDFS – How To View and Manipulate Files (1) • Remove directory $

Using HDFS – How To View and Manipulate Files (2) • Print the last

Using HDFS – How To View and Manipulate Files (3) • Download file and

Exercise How To View and Manipulate Files

Running a Map. Reduce Job • Goal • Remind Map. Reduce • Code Review

Running a Map. Reduce Job – Goal Works of Shakespeare ALL'S WELL THAT ENDS

Running a Map. Reduce Job – Shuffle & Sort

Running a Map. Reduce Job – Sum. Reducer

Running a Map. Reduce Job – Word. Count Code Review • Word. Count. java

Running a Map. Reduce Job – Code Review : Word. Count. java public class

Running a Map. Reduce Job – Code Review : Word. Mapper. java public class

Running a Map. Reduce Job – Code Review : Word. Reduce. java Input Data

Running a Map. Reduce Job – Run Word. Count in HDFS • Complie three

Running a Map. Reduce Job – Run Word. Count in HDFS • Submit a

Importing Data With Sqoop Review My. SQL and Exercise

Importing Data With Sqoop • Log on to My. SQL: $ mysql --user=root

Importing Data With Sqoop – Review My. SQL (1) • Log on to My.

Importing Data With Sqoop – Review My. SQL (2) • Review ‘customers’ table schema:

Importing Data With Sqoop – Review My. SQL (3) • Review ‘customers’ table: >

Importing Data With Sqoop – How To Use (1) • List the databases (schemas)

Importing Data With Sqoop – How To Use (2) • Import the ‘customers’ table

Slides: 35

Download presentation

Environment • Cloudera Quick. Start VM with 5. 4. 2 • Guide for Download • http: //ailab. ssu. ac. kr/rb/? c=8/29&cat=2015_2_%EC%9 C%A 0%EB%B 9%84%EC %BF%BC%ED%84%B 0%EC%8 A%A 4+%EC%9 D%91%EC%9 A%A 9%EC%8 B%9 C% EC%8 A%A 4%ED%85%9 C&uid=660

Contents • Using HDFS • • How To Use How To Upload File How To View and Manipulate File Exercise • Running Map. Reduce Job : Word. Count • • Goal Remind Map. Reduce Code Review Run Word. Count Program • Extra Exercise : Number of Connection per Hour • Meaningful Data from ‘Access Log’ • Foundation of Regural Expression • Run Map. Reduce Job • Importing Data With Sqoop • Review My. SQL • How To Use

Using HDFS With Exercise

Using HDFS • How to use HDFS • How to Upload File • How to View and Manipulate File

Using HDFS – How To Use (1) • You see a help message describing all the commands associated with HDFS $ hadoop fs

Using HDFS – How To Use (2) • You see the contents of directory in HDFS: $ hadoop fs –ls /user/cloudera

Exercise How To Use

Using HDFS – How To Upload File (1) • Unzip ‘shakespeare. tar. gz’: $ cd ~/training_materials/developer/data $ tar zxvf shakespeare. tar. gz

Using HDFS – How To Upload File (2) • Insert ‘shakespeare’ directory into HDFS: $ hadoop fs -put shakespeare /user/cloudera/shakespeare

Exercise How To Upload

Using HDFS – How To View and Manipulate Files (1) • Remove directory $ hadoop fs –ls shakespeare $ hadoop fs –rm shakespeare/glossary

Using HDFS – How To View and Manipulate Files (2) • Print the last 50 lines of Herny IV $ hadoop fs –cat shakespeare/histories | tail –n 50

Using HDFS – How To View and Manipulate Files (3) • Download file and manipulate $ hadoop fs –get shakespeare/poems ~/shakepoems. txt $ less ~/shakepoems. txt • If you want to know other command: $ hadoop fs

Exercise How To View and Manipulate Files

Running a Map. Reduce Job With Exercise

Running a Map. Reduce Job • Goal • Remind Map. Reduce • Code Review • Run Word. Count Program

Running a Map. Reduce Job – Goal Works of Shakespeare ALL'S WELL THAT ENDS WELL DRAMATIS PERSONAE KING OF FRANCE (KING: ) DUKE OF FLORENCE (DUKE: ) BERTRAM Count of Rousillon. LAFEU an old lord. PAROLLES a follower of Bertram. Steward | | servants to the Countess of Rousillon. Clown | A Page. (Page: ) COUNTESS OFROUSILLON mother to Bertram. (COUNTESS: ) HELENA a gentlewoman protected by the Countess. … Final Result Run Word. Count Key Value A 2027 ADAM 16 AARON 72 ABATE 1 ABIDE 1 ABOUT 18 ACHIEVE 1 ACKNOWN 1 … … We will submit a Map. Reduce job to count the number of occurrences of every word in the works of Shakespeare

Running a Map. Reduce Job – Mapper

Running a Map. Reduce Job – Shuffle & Sort

Running a Map. Reduce Job – Sum. Reducer

Running a Map. Reduce Job – Word. Count Code Review • Word. Count. java • A simple Map. Reduce driver class • Word. Mapper. java • A Mapper class for the job • Sum. Reducer. java • A reducer class for the job

Running a Map. Reduce Job – Code Review : Word. Count. java public class Word. Count { job. set. Mapper. Class(Word. Mapper. class); job. set. Reducer. Class(Sum. Reducer. class); public static void main(String[] args) throws Exception { if (args. length != 2) { System. out. printf( "Usage: Word. Count <input dir> <output dir>n"); System. exit(-1); } job. set. Output. Key. Class(Text. class); job. set. Output. Value. Class(Int. Writable. class); boolean success = job. wait. For. Completion(true); System. exit(success ? 0 : 1); } } Job job = new Job(); job. set. Jar. By. Class(Word. Count. class); job. set. Job. Name("Word Count"); File. Input. Format. set. Input. Paths(job, new Path(args[0]));

Running a Map. Reduce Job – Code Review : Word. Mapper. java public class Word. Mapper extends Mapper<Long. Writable, Text, Int. Writable> { @Override public void map(Long. Writable key, Text value, Context context) throws IOException, Interrupted. Exception { String line = value. to. String(); for (String word : line. split("\W+")) { if (word. length() > 0) { context. write(new Text(word), new Int. Writable(1)); } } The map method runs once for each line of text in the input file. Ex ) Text File => the cat sat on the mat The aardvark sat on the sofa Key (Long. Writable) Value (Text) 1000055 the cat sat on the mat 1000257 the aardvark sat one the sofa Write to context Object Key (Text) Value (Int. Writable) the 1 cat 1 … …

Running a Map. Reduce Job – Code Review : Word. Reduce. java Input Data public class Sum. Reducer extends Reducer<Text, Int. Writable, Text, Int. Writable> { Output Data Key Values @Override public void reduce (Text key, Iterable<Int. Writable> values, Context context) throws IOException, Interrupted. Exception { int word. Count = 0; aardvark [1] cat 1 mat [1] mat 1 on [1, 1] on 2 for (Int. Writable value : values) { word. Count += value. get(); } context. write(key, new Int. Writable(word. Count)); sat [1, 1] sat 2 sofa [1] sofa 1 the [1, 1, 1, 1] the 4 … … } } Key Sum. Reducer aardvark Value 1 The reduce method runs once for each key received from the shuffle and sort phase of the Map. Reduce framework

Running a Map. Reduce Job – Run Word. Count in HDFS • Complie three Java classes and Collect complied Java files into a JAR file: $ cd ~/{Your Workspace} $ javac –classpath `hadoop classpath` *. java $ jar cvf wc. jar *. class

Running a Map. Reduce Job – Run Word. Count in HDFS • Submit a Map. Reduce job to Hadoop using your JAR file to count the occurrences of each word in Shakespeare: $ hadoop jar wc. jar Word. Count shakespeare wordcounts • wc. jar – jar file • Word. Count – Class Name containing Main method(Driver Class) • shakespeare – Input directory in HDFS • wordcounts – Output directory in HDFS

Exercise Map. Reduce Job : Word. Count

Importing Data With Sqoop Review My. SQL and Exercise

Importing Data With Sqoop • Log on to My. SQL: $ mysql --user=root --password=cloudera • Select Database > use retail_db; • Show Databases: > show databases;

Importing Data With Sqoop – Review My. SQL (1) • Log on to My. SQL: $ mysql --user=root --password=cloudera • Show Databases: > show databases; • Select Databases: > use retail_db; • Show Tables: > show tables;

Importing Data With Sqoop – Review My. SQL (2) • Review ‘customers’ table schema: > DESCRIBE customers;

Importing Data With Sqoop – Review My. SQL (3) • Review ‘customers’ table: > DESCRIBE customers; … > SELECT * FROM customers LIMIT 5;

Importing Data With Sqoop – How To Use (1) • List the databases (schemas) in your database server: $ sqoop list-databases --connect jdbc: mysql: //localhost --username root --password cloudera • List the tables in the ‘retail_db’ database: $ sqoop list-tables --connect jdbc: mysql: //localhost/retail_db --username root --password cloudera

Importing Data With Sqoop – How To Use (2) • Import the ‘customers’ table into HDFS $ sqoop import --connect jdbc: mysql: //localhost/retail_db --table customers --fields-terminated-by 't' --username training --password training • Verify that the command has worked $ hadoop fs –ls customers $ hadoop fs –tail movie/part-m-00000