Cloud Computing: The Pig Data Processing Platform

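Basic Architecture of Pig
  ◦ The Pig framework has three layers, from top to bottom: Pig Latin, MapReduce, Cluster
  ◦ Pig Latin scripts are compiled into MapReduce jobs, which run on the cluster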

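Pig Latin Programming Language: Data Types

  Type       Description                      Example
  int        Signed 32-bit integer            127
  long       Signed 64-bit integer            127L
  float      32-bit floating point            3.14F
  double     64-bit floating point            3.14
  chararray  Character array in UTF-8 format  Hello World
  bytearray  Byte array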

Pig Latin Programming Language: Operators
  ◦ Example of the bincond operator

    grunt> A = LOAD 'data.txt' AS (f1:int, f2:int, B:bag{T:tuple(t1:int, t2:int)});
    grunt> DUMP A;
    (3,2,{(1,7),(3,5)})
    (3,3,{(1,7),(3,5)})
    (3,5,{(1,7),(3,5),(4,6)})

  Then run the following:

    grunt> X = FOREACH A GENERATE f2, (f2 == 2 ? 1 : COUNT(B));
    grunt> DUMP X;
    (2,1)
    (3,2L)
    (5,3L)
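To make the semantics of the bincond expression concrete, here is a minimal Python sketch that mirrors the FOREACH statement above. It only illustrates what the expression computes, not how Pig evaluates it. The function name `bincond_row` and the in-memory row layout are assumptions made for this illustration, and Python has no counterpart to Pig's `L` (long) suffix.

```python
# Rows of relation A as (f1, f2, B); the bag B is modelled as a list of tuples.
A = [
    (3, 2, [(1, 7), (3, 5)]),
    (3, 3, [(1, 7), (3, 5)]),
    (3, 5, [(1, 7), (3, 5), (4, 6)]),
]


def bincond_row(row):
    """Mimic: GENERATE f2, (f2 == 2 ? 1 : COUNT(B)). Hypothetical helper."""
    _f1, f2, bag = row
    # Python's conditional expression plays the role of Pig's `cond ? a : b`.
    return (f2, 1 if f2 == 2 else len(bag))


X = [bincond_row(row) for row in A]
for record in X:
    print(record)
```

Only the first row satisfies `f2 == 2`, so it takes the constant branch; the other two rows fall through to the bag count.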

Pig Latin Programming Language: Worked Example
  ◦ Task: find the 10 most visited pages in each category

  Visits
    User   Url          Time
    Amy    cnn.com      8:00
    Amy    bbc.com      10:00
    Amy    flickr.com   10:05
    Fred   cnn.com      12:00

  Url Info
    Url          Category   PageRank
    cnn.com      News       0.9
    bbc.com      News       0.8
    flickr.com   Photos     0.7
    espn.com     Sports     0.9

Pig Latin Programming Language: Data Flow
  ◦ Load Visits
  ◦ Group by url
  ◦ Foreach url generate count
  ◦ Load Url Info
  ◦ Join on url
  ◦ Group by category
  ◦ Foreach category generate top 10 urls

Pig Latin Programming Language: Pig Latin Implementation

    visits = load '/data/visits' as (user, url, time);
    gVisits = group visits by url;
    visitCounts = foreach gVisits generate url, count(visits);
    urlInfo = load '/data/urlInfo' as (url, category, pRank);
    visitCounts = join visitCounts by url, urlInfo by url;
    gCategories = group visitCounts by category;
    topUrls = foreach gCategories generate top(visitCounts, 10);
    store topUrls into '/data/topUrls';
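As a rough illustration of what this script computes, the following Python sketch runs the same steps in memory over the sample rows from the Visits and Url Info tables. It is not how Pig executes the script. The function name `top_urls_per_category` is hypothetical, the join is assumed to be an inner join, and each url is assumed to appear only once in Url Info.

```python
from collections import Counter, defaultdict

# Sample rows from the Visits and Url Info tables of the worked example.
visits = [
    ("Amy", "cnn.com", "8:00"),
    ("Amy", "bbc.com", "10:00"),
    ("Amy", "flickr.com", "10:05"),
    ("Fred", "cnn.com", "12:00"),
]
url_info = [
    ("cnn.com", "News", 0.9),
    ("bbc.com", "News", 0.8),
    ("flickr.com", "Photos", 0.7),
    ("espn.com", "Sports", 0.9),
]


def top_urls_per_category(visits, url_info, n=10):
    """Hypothetical helper mirroring the group / join / group / top steps."""
    # group visits by url; foreach url generate count
    visit_counts = Counter(url for _user, url, _time in visits)

    # join on url (assumed inner join: urls without visits are dropped),
    # then group the joined rows by category
    by_category = defaultdict(list)
    for url, category, _p_rank in url_info:
        if url in visit_counts:
            by_category[category].append((url, visit_counts[url]))

    # foreach category generate the top n urls by visit count
    return {
        category: sorted(rows, key=lambda r: r[1], reverse=True)[:n]
        for category, rows in by_category.items()
    }


top_urls = top_urls_per_category(visits, url_info)
print(top_urls)
```

With the sample data, News contains cnn.com (2 visits) followed by bbc.com (1 visit), Photos contains flickr.com, and Sports is absent because espn.com has no visits to join against.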

Pig Latin Programming Language: MapReduce Jobs
  ◦ Every group or join operation forms a map-reduce boundary, so the data flow is split into three jobs:

    Map 1 / Reduce 1: Load Visits, Group by url, Foreach url generate count
    Map 2 / Reduce 2: Load Url Info, Join on url
    Map 3 / Reduce 3: Group by category, Foreach category generate top10(urls)
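The reason a group forms a boundary can be sketched in Python for the first job. Grouping needs all records with the same key in one place, which is what the shuffle between map and reduce provides. The three function names below are hypothetical, and emitting `(url, 1)` is a simplification chosen for this illustration; it does not claim to show what Pig emits internally.

```python
from collections import defaultdict

visits = [
    ("Amy", "cnn.com", "8:00"),
    ("Amy", "bbc.com", "10:00"),
    ("Amy", "flickr.com", "10:05"),
    ("Fred", "cnn.com", "12:00"),
]


def map_phase(records):
    # Map 1: read each visit and emit a pair keyed by the grouping field.
    for _user, url, _time in records:
        yield url, 1


def shuffle(pairs):
    # Shuffle: collect all values that share a key. This regrouping is the
    # map-reduce boundary introduced by "group by url".
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Reduce 1: foreach url generate count.
    return {url: sum(ones) for url, ones in grouped.items()}


visit_counts = reduce_phase(shuffle(map_phase(visits)))
print(visit_counts)
```

The join on url and the group by category each need their own regrouping by a different key, which is why the example needs two more jobs after this one.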