HDFS (Hadoop Distributed File System): hadoop fs -ls
Basic HDFS commands (full name: Hadoop Distributed File System)
• % hadoop fs -ls .
• % hadoop fs -mkdir books
• % hadoop fs -copyFromLocal input/docs/test.txt hdfs://localhost/user/tom/test.txt
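MapReduce is a programming model. A job supplies two functions: map turns each input record into intermediate key/value pairs, and reduce receives one key together with an iterator over all values emitted for that key.
• void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException;
• void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException;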
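A minimal plain-Java sketch of the two functions for the maximum-temperature example. So that it runs without Hadoop on the classpath, it swaps in ordinary types (`long` for `LongWritable`, `String` for `Text`, `Integer` for `IntWritable`), drops the `Reporter` argument, and uses a hypothetical `ListCollector` in place of `OutputCollector`. The record layout (year in the first four digits, temperature in the last two) is an assumption inferred from the sample records on the data-flow slides.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hadoop's OutputCollector<K, V>:
// it simply records every emitted (key, value) pair in a list.
final class ListCollector<K, V> {
    final List<Map.Entry<K, V>> pairs = new ArrayList<>();

    void collect(K key, V value) {
        pairs.add(Map.entry(key, value));
    }
}

final class MaxTemperatureFunctions {
    // map: (offset, line) -> (year, temperature). The offset key is not used.
    // Assumed record layout: 4-digit year, 4 other digits, 2-digit temperature,
    // e.g. "1995234234" -> ("1995", 34).
    static void map(long offset, String line, ListCollector<String, Integer> output) {
        String year = line.substring(0, 4);
        int temperature = Integer.parseInt(line.substring(line.length() - 2));
        output.collect(year, temperature);
    }

    // reduce: (year, every temperature emitted for that year) -> (year, maximum).
    static void reduce(String year, Iterator<Integer> values,
                       ListCollector<String, Integer> output) {
        int max = Integer.MIN_VALUE;
        while (values.hasNext()) {
            max = Math.max(max, values.next());
        }
        output.collect(year, max);
    }
}
```

For example, `map(0, "1995234234", out)` collects `(1995, 34)`, and `reduce("1995", List.of(34, 78).iterator(), out)` collects `(1995, 78)`.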
MapReduce data flow for finding the maximum temperature of each year
input: 1995234234, 1995345678, 1996345562
map (output grouped by year): 1995 [34, 78], 1996 [62]
reduce: 1995 78, 1996 62
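The single-task flow above can be reproduced in memory. This sketch assumes the same hypothetical record layout (first four characters are the year, last two the temperature) and uses a `TreeMap` to stand in for the grouping that Hadoop performs between map and reduce; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MaxTemperatureFlow {
    // Assumed record layout: first 4 characters = year, last 2 = temperature.
    static String year(String record) {
        return record.substring(0, 4);
    }

    static int temperature(String record) {
        return Integer.parseInt(record.substring(record.length() - 2));
    }

    // map + group by key: every temperature is filed under its year,
    // e.g. 1995 -> [34, 78]. TreeMap keeps the years sorted.
    static Map<String, List<Integer>> mapAndGroup(List<String> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records) {
            grouped.computeIfAbsent(year(record), k -> new ArrayList<>())
                   .add(temperature(record));
        }
        return grouped;
    }

    // reduce: keep the largest temperature of each year, e.g. 1995 -> 78.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((key, temps) -> result.put(key, Collections.max(temps)));
        return result;
    }
}
```

`reduce(mapAndGroup(List.of("1995234234", "1995345678", "1996345562")))` yields `{1995=78, 1996=62}`.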
MapReduce data flow for finding the maximum temperature of each year
map i1: 1995234234, 1995345678, 1996345562 → 1995 [34, 78], 1996 [62]
map i2: 1995234224, 1995345658, 1996345522 → 1995 [24, 58], 1996 [22]
map i3: 1995234227, 1995345654, 1996345582 → 1995 [27, 54], 1996 [82]
partition, shuffle, merge: 1995 [34, 78, 24, 58, 27, 54]; 1996 [62, 22, 82]
reduce → o1: 1995 78; 1996 82
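With several map tasks, each intermediate pair has to be routed to a reduce task. The sketch below models that routing with the formula used by Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, and then merges the lists coming from all map tasks per reducer. It is an in-memory simplification under the same assumed record layout: it skips sorting and spilling, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MaxTemperatureShuffle {
    // Same routing rule as Hadoop's default HashPartitioner.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Runs one map task per split (i1, i2, i3, ...) and returns, for each reduce
    // task, the merged year -> temperatures lists it receives after the shuffle.
    // Assumed record layout: first 4 characters = year, last 2 = temperature.
    static List<Map<String, List<Integer>>> mapAndShuffle(List<List<String>> splits,
                                                          int numReduceTasks) {
        List<Map<String, List<Integer>>> reducerInputs = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            reducerInputs.add(new TreeMap<>());
        }
        for (List<String> split : splits) {                    // one map task per split
            for (String record : split) {
                String year = record.substring(0, 4);
                int temperature = Integer.parseInt(record.substring(record.length() - 2));
                int reducer = partition(year, numReduceTasks); // partition
                reducerInputs.get(reducer)                     // shuffle + merge
                        .computeIfAbsent(year, k -> new ArrayList<>())
                        .add(temperature);
            }
        }
        return reducerInputs;
    }

    // One output map per reduce task (o1, o2, ...): year -> maximum temperature.
    static List<Map<String, Integer>> reduce(List<Map<String, List<Integer>>> reducerInputs) {
        List<Map<String, Integer>> outputs = new ArrayList<>();
        for (Map<String, List<Integer>> input : reducerInputs) {
            Map<String, Integer> out = new TreeMap<>();
            input.forEach((key, temps) -> out.put(key, Collections.max(temps)));
            outputs.add(out);
        }
        return outputs;
    }
}
```

With `numReduceTasks = 1` every key goes to a single output, matching o1 on the slide: 1995 receives [34, 78, 24, 58, 27, 54] and 1996 receives [62, 22, 82], giving maxima 78 and 82. With more reduce tasks, each year still lands in exactly one partition.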
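How Hive implements a query
select year, count(temperature) from src where year>1990 group by year having count(temperature)>1000;
The semantics of this SQL statement:
1) (map) Read every record from table src and keep those with year>1990;
2) (partition and shuffle) Group the records by year (records with the same year go to the same group);
3) (reduce) For each group, compute count(temperature) and keep the groups with count(temperature)>1000;
4) Finally, return year and count(temperature) for each remaining group as the result.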
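The four steps can be mirrored in plain Java over an in-memory list that stands in for table `src`. The `Row` class and the `minYear`/`minCount` parameters are hypothetical (the query itself uses 1990 and 1000); `temperature` is nullable because SQL `COUNT(column)` skips NULL values. This is a sketch of the query's semantics, not of Hive's actual execution plan, which typically also performs partial aggregation on the map side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

final class HiveGroupBySketch {
    // Hypothetical row of table src; temperature may be null, like a SQL column.
    static final class Row {
        final int year;
        final Integer temperature;

        Row(int year, Integer temperature) {
            this.year = year;
            this.temperature = temperature;
        }
    }

    // SELECT year, count(temperature) FROM src WHERE year > minYear
    // GROUP BY year HAVING count(temperature) > minCount
    static Map<Integer, Long> run(List<Row> src, int minYear, long minCount) {
        // 1) map: scan src, keep rows with year > minYear, emit (year, temperature)
        // 2) partition + shuffle: rows with the same year end up in the same group
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (Row row : src) {
            if (row.year > minYear) {
                groups.computeIfAbsent(row.year, k -> new ArrayList<>()).add(row.temperature);
            }
        }
        // 3) reduce: count non-null temperatures per year, then apply the HAVING filter
        // 4) return (year, count) for each surviving group
        Map<Integer, Long> result = new TreeMap<>();
        groups.forEach((year, temps) -> {
            long count = temps.stream().filter(Objects::nonNull).count();
            if (count > minCount) {
                result.put(year, count);
            }
        });
        return result;
    }
}
```

Passing the thresholds as parameters keeps the example small enough to check by hand while preserving the shape of the original query.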