MR / Tez Query Comparison

Queries: 12, 15, 21, 26, 27, 28, 34, 39, 43, 46, 52, 55, 67, 68, 73, 88, 90, 92, 96, 97, 98

The extracted slide text has 21 query labels but 23 data rows, so the rows are listed in source order without query labels.

Hive Trunk M/R  Hive Trunk Tez (Cold)  Hive Trunk Tez (Hot)  Tez Relative Gain  Hot / Cold Gain (%)
         297.5                   20.0                   9.7            2958.8%               105.2%
          75.9                   62.6                  58.7              29.4%                 6.7%
          39.3                   52.4                  46.9             -16.2%                11.7%
          33.0                   23.3             (missing)             101.0%                41.3%
          37.6                   17.7                   8.4             348.8%               111.8%
          39.9                   24.2                  12.5             218.4%                93.4%
          58.9                   23.9                  16.1             265.9%                48.7%
          87.7                   31.7                  25.3             246.6%                25.3%
         234.5                   62.8                  55.5             322.1%                13.1%
          57.0                   34.4                  26.0             119.2%                32.4%
         103.3                   46.1                  30.9             234.4%                49.2%
          59.9                   23.5                  15.6             285.3%                51.3%
          59.3                   27.7                  18.3             224.1%                51.4%
         820.9                  821.1                 787.2               4.3%                 4.3%
         102.7                   53.1                  42.2             143.2%                25.9%
          47.9                   27.7                  18.9             153.1%                46.2%
          87.7                   29.7                  22.7             287.1%                31.0%
         483.3                   95.0                  90.2             435.6%                 5.3%
         122.3                   41.1                  26.9             355.3%                53.0%
         278.6                  142.5                 135.7             105.4%                 5.1%
          42.8                   24.4                  16.4             160.6%                48.6%
         279.0                  147.5                 133.4             109.1%                10.6%
        1451.5                   50.0                  38.6            3662.2%                29.5%

HW = 20 Node (48 GB RAM, 6 x disk)
SW = Hive Trunk (Nov 13 2013) + ORCFile + Vectorization

© Hortonworks Inc. 2013.
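The two gain columns appear to follow from the three timing columns: the published figures match M/R time divided by hot Tez time, minus one, and cold Tez time divided by hot Tez time, minus one, to within rounding of the displayed values. The slide does not state these formulas, so the sketch below treats them as an assumption; the function names are illustrative only.

```python
def tez_relative_gain(mr_time: float, tez_hot_time: float) -> float:
    """Percent gain of hot Tez over M/R, assuming gain = M/R / hot - 1."""
    return (mr_time / tez_hot_time - 1.0) * 100.0


def hot_cold_gain(tez_cold_time: float, tez_hot_time: float) -> float:
    """Percent gain of a hot Tez run over a cold one, assuming cold / hot - 1."""
    return (tez_cold_time / tez_hot_time - 1.0) * 100.0


# One row from the slide: M/R 87.7, Tez cold 31.7, Tez hot 25.3.
# The slide reports 246.6% and 25.3% for this row.
print(f"{tez_relative_gain(87.7, 25.3):.1f}%")
print(f"{hot_cold_gain(31.7, 25.3):.1f}%")
```

Under this reading, a negative relative gain (as in the row with 39.3 for M/R and 46.9 for hot Tez) means the hot Tez run was slower than M/R for that query.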

Query 88

  select *
  from
    (select count(*) h8_30_to_9
     from store_sales
     JOIN household_demographics ON store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
     JOIN time_dim ON store_sales.ss_sold_time_sk = time_dim.t_time_sk
     JOIN store ON store_sales.ss_store_sk = store.s_store_sk
     where time_dim.t_hour = 8
       and time_dim.t_minute >= 30
       and ((household_demographics.hd_dep_count = 3 and household_demographics.hd_vehicle_count<=3+2)
         or (household_demographics.hd_dep_count = 0 and household_demographics.hd_vehicle_count<=0+2)
         or (household_demographics.hd_dep_count = 1 and household_demographics.hd_vehicle_count<=1+2))
       and store.s_store_name = 'ese') s1
  JOIN
    (select count(*) h9_to_9_30
     from store_sales
  ...

• 8 full table scans

Query 88: M/R

  Total MapReduce jobs = 29
  ...
  Total MapReduce CPU Time Spent: 0 days 2 hours 52 minutes 39 seconds 380 msec
  OK
  345617 687625 686131 1032842 1030364 606859 604232 692428
  Time taken: 403.28 seconds, Fetched: 1 row(s)

Query 88: Tez

  Map 1: 1/1      Map 12: 1/1      Map 13: 1/1      Map 14: 1/1      Map 15: 1/1
  Map 16: 241/241 Map 18: 1/1      Map 19: 1/1      Map 20: 1/1      Map 21: 1/1
  Map 22: 1/1     Map 23: 1/1      Map 24: 241/241  Map 26: 1/1      Map 27: 1/1
  Map 28: 1/1     Map 29: 1/1      Map 3: 241/241   Map 30: 240/240  Map 32: 241/241
  Map 34: 1/1     Map 35: 1/1      Map 36: 1/1      Map 37: 1/1      Map 38: 1/1
  Map 39: 241/241 Map 42: 1/1      Map 43: 1/1      Map 44: 240/240  Map 46: 241/241
  Reducer 10: 1/1 Reducer 17: 1/1  Reducer 25: 1/1  Reducer 31: 1/1  Reducer 33: 1/1
  Reducer 40: 1/1 Reducer 41: 1/1  Reducer 45: 1/1  Reducer 47: 1/1  Reducer 5: 1/1
  Reducer 6: 1/1  Reducer 7: 1/1   Reducer 8: 1/1   Reducer 9: 1/1
  Status: Finished successfully
  OK
  345617 687625 686131 1032842 1030364 606859 604232 692428
  Time taken: 90.233 seconds, Fetched: 1 row(s)
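To compare the two runs, it can help to tally the vertex progress entries and divide the wall-clock times. The following is a minimal sketch that assumes the progress text uses the "Map N: done/total" and "Reducer N: done/total" shape shown on the slide; the helper names are hypothetical and not part of Hive or Tez.

```python
import re

# Matches entries such as "Map 16: 241/241" or "Reducer 5: 1/1".
VERTEX_RE = re.compile(r"(Map|Reducer) (\d+): (\d+)/(\d+)")


def summarize_vertices(status: str) -> dict:
    """Count vertices and total tasks per vertex kind in a progress string."""
    summary = {}
    for kind, _vertex_id, _done, total in VERTEX_RE.findall(status):
        entry = summary.setdefault(kind, {"vertices": 0, "tasks": 0})
        entry["vertices"] += 1
        entry["tasks"] += int(total)
    return summary


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds


# A short excerpt in the same shape as the slide's progress output.
sample = "Map 1: 1/1 Map 16: 241/241 Map 30: 240/240 Reducer 5: 1/1"
print(summarize_vertices(sample))

# Wall-clock times reported on the two Query 88 slides.
print(f"{speedup(403.28, 90.233):.2f}x")
```

Vertices with 240 or 241 tasks are the large scans; the single-task vertices are consistent with small dimension-table inputs and final aggregations.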

Status

• Broadcast Join
  – Regular tasks to filter/prep the side to broadcast
  – Hashtables assembled in the join task
  – Can run in any vertex (not just map)
• TezSessions (AM, FS, UGI, MetaStore)
  – Start with cli/hs2 session
  – Brings up AM, connects to metastore, etc.
  – Setup only once per session
• Container reuse
  – Task launch is now cheap
  – Multiple waves and re-use within session
  – Stragglers
• Multiple inputs/outputs/TezProcessor
  – Can handle multiple scatter/gather + broadcast + 1-1 edges
  – Can handle multiple outputs for multi-table insert case
  – No need for single task with multiple operator pipelines
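The broadcast join bullets describe a general pattern: filter the small side, build a hashtable from it in the join task, then stream the large side past that hashtable. The sketch below illustrates the pattern in plain Python. It is not Hive or Tez code; the function names and the toy rows are made up for illustration, and only the column names are borrowed from Query 88.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def build_hashtable(small_side: Iterable[Row], key: str,
                    keep: Callable[[Row], bool]) -> Dict[object, List[Row]]:
    """Filter the small side, then index the surviving rows by join key."""
    table: Dict[object, List[Row]] = {}
    for row in small_side:
        if keep(row):
            table.setdefault(row[key], []).append(row)
    return table


def broadcast_join(big_side: Iterable[Row], big_key: str,
                   hashtable: Dict[object, List[Row]]) -> Iterator[Tuple[Row, Row]]:
    """Stream the big side once, probing the in-memory hashtable per row."""
    for row in big_side:
        for match in hashtable.get(row[big_key], []):
            yield row, match


# Toy data (invented): two time_dim rows, three store_sales rows.
time_dim = [
    {"t_time_sk": 1, "t_hour": 8, "t_minute": 45},
    {"t_time_sk": 2, "t_hour": 9, "t_minute": 10},
]
store_sales = [
    {"ss_sold_time_sk": 1},
    {"ss_sold_time_sk": 2},
    {"ss_sold_time_sk": 1},
]

table = build_hashtable(
    time_dim, "t_time_sk",
    keep=lambda r: r["t_hour"] == 8 and r["t_minute"] >= 30,
)
count = sum(1 for _ in broadcast_join(store_sales, "ss_sold_time_sk", table))
print(count)
```

Because the hashtable is built from the already filtered small side, the large table is read once and never shuffled, which is the property that makes this join cheap when one input is small.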

Status

• Localization
  – Works with hive-exec + UDFs
  – If desired: Avoids re-localization of hive-exec
• Split Gen in AM/TezGroupedSplits/Caching
  – Splits generated according to headroom
  – Caching of NN connections
• Statistics (not Tez specific)
  – Allows computing the number of tasks
  – Used for join conversion
  – Degrades with available stats
• MetaStore improvements (not Tez specific)
  – Partition pruning is MUCH faster now
• TezMiniMR
  – .q file tests for Tez
• Explain plan

Current limitations

• Not in phase I
  – RC Merge task/analyze uses MR on Tez
  – UNION ALL not yet supported
  – SMB join not yet supported
• In phase I
  – More testing + bug fixes!
  – Integrate with new annotated
  – Re-localization (Tez)
  – Tez release

Try Tez For Yourself

1: Download Hortonworks Sandbox 2.0: hortonworks.com/sandbox
2: Log in: root/hadoop
3: git clone https://github.com/t3rmin4t0r/tez-autobuild/
4: cd tez-autobuild ; make dist install
5: /opt/hive/bin/hive
6: set hive.optimize.tez=true/false