Big Data Analytics with R and Hadoop
Chapter 3: Integrating R and Hadoop
Sang-Min Song
2015.04.09

Three ways to link R and Hadoop: RHIPE, RHadoop, and Hadoop streaming.

Introducing RHIPE
RHIPE stands for R and Hadoop Integrated Programming Environment. The name means "in a moment" in Greek, and the package is a merger of R and Hadoop. RHIPE uses the Divide and Recombine technique to perform data analytics over Big Data.
RHIPE was designed mainly to accomplish two goals:
1. Allowing users to perform in-depth analysis of large as well as small data.
2. Allowing users to perform the analytics operations within R using a lower-level language.
RHIPE is a lower-level interface to HDFS and MapReduce operations.
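
The Divide and Recombine idea can be illustrated without a cluster. The following is a minimal sketch in base R, not RHIPE code: the data, the grouping, and the per-subset statistic are all illustrative assumptions. It divides the data into subsets, applies the same method to each, and recombines the outputs.

```r
# Divide and Recombine sketch in base R (no RHIPE or Hadoop involved).
# The data set and the per-subset statistic are illustrative assumptions.
values <- c(4, 8, 15, 16, 23, 42, 7, 9)
group  <- rep(c("a", "b"), each = 4)

# Divide: split the data into independent subsets.
subsets <- split(values, group)

# Apply: run the same analytic method on every subset.
subset_stats <- lapply(subsets, function(x) c(n = length(x), total = sum(x)))

# Recombine: merge the per-subset outputs into one overall result.
combined <- Reduce(`+`, subset_stats)
overall_mean <- unname(combined["total"] / combined["n"])

print(overall_mean)
```

Because each subset is summarized by its count and total, the recombined mean equals the mean of the full data. On a cluster, the apply step is what would run in parallel.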

Install sequence
1. Installing Hadoop.
2. Installing R.
3. Installing protocol buffers.
4. Setting up environment variables.
5. Installing rJava.
6. Installing RHIPE.

Installing RHIPE
3. Installing protocol buffers

Installing RHIPE
4. Setting up environment variables
The variables are set in the ~/.bashrc file of hduser (the Hadoop user) and from the R console.
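
For the R console side, a minimal sketch using base R's Sys.setenv might look like the following. The variable names (HADOOP_HOME, HADOOP_BIN, HADOOP_CONF_DIR) and all three paths are assumptions for a typical single-node install, not values taken from the slide; adjust them to your own setup.

```r
# Set Hadoop-related environment variables for the current R session.
# Variable names and paths are assumptions; adjust them to your install.
Sys.setenv(HADOOP_HOME = "/usr/local/hadoop")
Sys.setenv(HADOOP_BIN = file.path(Sys.getenv("HADOOP_HOME"), "bin"))
Sys.setenv(HADOOP_CONF_DIR = file.path(Sys.getenv("HADOOP_HOME"), "conf"))

# Confirm what the session now sees.
hadoop_env <- Sys.getenv(c("HADOOP_HOME", "HADOOP_BIN", "HADOOP_CONF_DIR"))
print(hadoop_env)
```

Values set this way last only for the current R session, which is why the same variables are also placed in ~/.bashrc.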

Installing RHIPE
5. Installing the rJava package
6. Installing RHIPE

Understanding the architecture of RHIPE

Word count
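
A word count job has a map phase that emits a (word, 1) pair for every word and a reduce phase that sums the counts per word. The following is a minimal sketch in base R that simulates those phases locally; it makes no RHIPE or Hadoop calls, and the input lines and the helper name map_fn are illustrative assumptions.

```r
# Word count as map -> shuffle -> reduce, simulated in base R.
# No RHIPE or Hadoop calls are used; the input lines are illustrative.
lines <- c("the quick brown fox", "the lazy dog", "the fox")

# Map phase: emit one (word, 1) pair per word in each line.
map_fn <- function(line) {
  words <- strsplit(line, " +")[[1]]
  data.frame(key = words, value = 1L, stringsAsFactors = FALSE)
}
pairs <- do.call(rbind, lapply(lines, map_fn))

# Shuffle: group the emitted values by key.
grouped <- split(pairs$value, pairs$key)

# Reduce phase: sum the counts for each word.
counts <- vapply(grouped, sum, integer(1))

print(counts)
```

In a real job the framework performs the shuffle step, so only the map and reduce logic has to be written by the user.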

Understanding the RHIPE function reference
The RHIPE methods fall into three categories: initialization, HDFS, and MapReduce operations.
Initialization
rhinit(TRUE, TRUE)

Understanding the RHIPE function reference
HDFS
rhls(path)
hdfs.getwd()
hdfs.setwd("/RHIPE")
rhput(src, dest), for example rhput("/usr/local/hadoop/NOTICE.txt", "/RHIPE/")
rhcp('/RHIPE/1/change.txt', '/RHIPE/2/change.txt')
rhdel("/RHIPE/1")
rhget("/RHIPE/1/part-r-00000", "/usr/local/")
rhwrite(list(1, 2, 3), "/tmp/x")

Understanding the RHIPE function reference
MapReduce
rhwatch(map, reduce, combiner, input, output, mapred, partitioner, jobname)
rhex(job)
rhjoin(job)
rhkill(job)
rhoptions()
rhstatus(job)

Introducing RHadoop
RHadoop is available as three main R packages: rhdfs, rmr, and rhbase.
rhdfs is an R interface that provides HDFS usability from the R console.
rmr is an R interface that provides the Hadoop MapReduce facility inside the R environment.
rhbase is an R interface for operating on Hadoop HBase data sources stored on the distributed network, via a Thrift server.

Understanding the architecture of RHadoop
Because Hadoop owes much of its popularity to HDFS and MapReduce, Revolution Analytics developed separate R packages, namely rhdfs, rmr, and rhbase.

Installing RHadoop
Several R packages must be installed first to help connect R with Hadoop: rJava, RJSONIO, itertools, digest, Rcpp, httr, functional, devtools, plyr, and reshape2.
Setting environment variables

Installing RHadoop [rhdfs, rmr, rhbase]

Word count
1. Defining the MapReduce job: Map phase and Reduce phase
2. Executing the MapReduce job
3. Exploring the wordcount output

Understanding the RHadoop function reference
The rhdfs package
Initialization
hdfs.init()
hdfs.defaults()
File manipulation
hdfs.put('/usr/local/hadoop/README.txt', '/RHadoop/1/')
hdfs.copy('/RHadoop/1/', '/RHadoop/2/')
hdfs.move('/RHadoop/1/README.txt', '/RHadoop/2/')
hdfs.rename('/RHadoop/README.txt', '/RHadoop/README1.txt')
hdfs.delete("/RHadoop")
hdfs.rm("/RHadoop")
hdfs.chmod('/RHadoop', permissions = '777')

Understanding the RHadoop function reference
The rhdfs package
File read/write
f = hdfs.file("/RHadoop/2/README.txt", "r", buffersize = 104857600)
hdfs.write(object, con, hsync = FALSE)
m = hdfs.read(f)
hdfs.close(f)
Directory operation
hdfs.mkdir("/RHadoop/2/")
hdfs.rm("/RHadoop/2/")
Utility
hdfs.ls('/')
hdfs.file.info("/RHadoop")

Understanding the RHadoop function reference
The rmr package
For storing and retrieving data
small.ints = to.dfs(1:10)
from.dfs('/tmp/Rtmp.RMIXzb/file2bda3fa07850')
For MapReduce
mapreduce(input, output, map, reduce, combine, input.format, output.format, verbose)
keyval(key, val)
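
These functions can be combined in a small end-to-end job. The following is a minimal sketch that assumes the rmr2 package (a later release of rmr) is installed and uses its local backend, so no Hadoop cluster is needed; the squaring map function is an illustrative assumption, and the block does nothing when rmr2 is not available.

```r
# Map-only job with rmr2 on the local backend.
# Assumes the rmr2 package is installed; the whole block is skipped otherwise.
if (requireNamespace("rmr2", quietly = TRUE)) {
  library(rmr2)
  rmr.options(backend = "local")  # run in-process, without a Hadoop cluster

  # Store a small vector where mapreduce() can read it.
  small.ints <- to.dfs(1:10)

  # Map: emit each integer as the key and its square as the value.
  squares <- mapreduce(
    input = small.ints,
    map = function(k, v) keyval(v, v^2)
  )

  # Retrieve the key-value pairs back into the R session.
  result <- from.dfs(squares)
  print(values(result))
}
```

Switching rmr.options(backend = "hadoop") is the intended way to run the same job on a cluster, provided the Hadoop environment variables are set.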

Thank you