How YARN Enables Multiple Data Processing Engines in Hadoop
![How YARN Enables Multiple Data Processing Engines in Hadoop](https://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-1.jpg)
How YARN Enables Multiple Data Processing Engines in Hadoop

We Do Hadoop

Eric Mizell, Director, Solution Engineering

© Hortonworks Inc. 2011 – 2014. All Rights Reserved
![Agenda](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-2.jpg)
Agenda

- YARN 101 – Yet Another Resource Negotiator
- Enabling a Modern Data Architecture
- YARN in action – Demo of streaming application
- SQL in Hadoop
  - Hive
  - Phoenix over HBase
  - Spark
![YARN Concepts](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-3.jpg)
YARN Concepts

- Application
  - An application is a job submitted to the framework
  - Example: a MapReduce job
- Container
  - Basic unit of allocation
  - Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, etc.)
    - container_0 = 2 GB, 1 CPU
    - container_1 = 1 GB, 6 CPU
  - Replaces the fixed map/reduce slots
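The container idea can be sketched as a small resource vector. This is only an illustration under stated assumptions: the `Resource` class, the `allocate` helper, the 4 GB / 8 core node, and `container_2` are hypothetical and are not part of YARN's API. Only the sizes of `container_0` and `container_1` come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A bundle of resources; only memory and CPU are modelled here."""
    memory_gb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        # A request fits only if every dimension fits.
        return self.memory_gb <= other.memory_gb and self.vcores <= other.vcores

    def minus(self, other: "Resource") -> "Resource":
        return Resource(self.memory_gb - other.memory_gb, self.vcores - other.vcores)


def allocate(free: Resource, requests: dict) -> tuple:
    """Grant requests in order while they fit; return (granted names, remaining)."""
    granted = []
    for name, req in requests.items():
        if req.fits_in(free):
            granted.append(name)
            free = free.minus(req)
    return granted, free


# Hypothetical node with 4 GB and 8 cores; first two container sizes are from the slide.
node = Resource(memory_gb=4, vcores=8)
requests = {
    "container_0": Resource(memory_gb=2, vcores=1),
    "container_1": Resource(memory_gb=1, vcores=6),
    "container_2": Resource(memory_gb=2, vcores=1),  # hypothetical extra request
}
granted, remaining = allocate(node, requests)
print(granted, remaining)
```

Unlike a fixed map or reduce slot, each request here carries its own shape, so a memory-heavy container and a CPU-heavy container can share one node until either dimension runs out.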
![YARN Architecture](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-4.jpg)
YARN Architecture

- Resource Manager
  - Global resource scheduler
  - Hierarchical queues
  - Application management
- Node Manager
  - Per-machine agent
  - Manages the life cycle of containers
  - Container resource monitoring
- Application Master
  - Per-application
  - Manages application scheduling and task execution
  - E.g. the MapReduce Application Master
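The division of roles above can be sketched as a toy model. Everything here is an assumption made for illustration: the class and method names are hypothetical, placement is plain first-fit on memory, and the real components talk over RPC with queue-based schedulers rather than direct method calls.

```python
class NodeManager:
    """Per-machine agent: tracks free memory and the containers it launched."""

    def __init__(self, name, memory_gb):
        self.name = name
        self.free_gb = memory_gb
        self.containers = []

    def launch(self, container_id, memory_gb):
        self.free_gb -= memory_gb
        self.containers.append(container_id)


class ResourceManager:
    """Global scheduler: picks a node for each container request."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, container_id, memory_gb):
        # First-fit placement; real YARN schedulers are queue-based and far richer.
        for node in self.nodes:
            if node.free_gb >= memory_gb:
                node.launch(container_id, memory_gb)
                return node.name
        return None  # no capacity right now


class ApplicationMaster:
    """Per-application: asks the ResourceManager for containers for its tasks."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm

    def run(self, task_count, memory_gb):
        placements = {}
        for i in range(1, task_count + 1):
            container_id = f"C{self.app_id}.{i}"
            placements[container_id] = self.rm.allocate(container_id, memory_gb)
        return placements


# Two hypothetical 4 GB nodes; one application asking for three 2 GB containers.
rm = ResourceManager([NodeManager("nm1", 4), NodeManager("nm2", 4)])
placements = ApplicationMaster(1, rm).run(task_count=3, memory_gb=2)
print(placements)
```

The point of the split is that application-specific logic lives in the per-application master, so the global scheduler only has to reason about resources, not about MapReduce or any other engine.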
![YARN – Running Apps](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-5.jpg)
YARN – Running Apps

[Diagram: Hadoop Client 1 and Hadoop Client 2 each create and submit an application (app 1, app 2) to the ResourceManager, which contains the ASM and the Scheduler with its queues. NodeManagers in Rack 1, Rack 2, … Rack N send status reports to the ResourceManager and host the ApplicationMasters (AM 1, AM 2) and their containers (C 1.1 to C 1.4 for app 1, C 2.1 to C 2.3 for app 2). Legend: ASM negotiates containers; NM reports to ASM; Scheduler partitions resources.]
![Hadoop 2.x Stack – Enabled by YARN](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-6.jpg)
Hadoop 2.x Stack – Enabled by YARN

- YARN is the architectural center of HDP
  - Enables batch, interactive and real-time workloads
  - Provides comprehensive enterprise capabilities
  - The widest range of deployment options

[Diagram: the HDP stack]

- Governance: data workflow, lifecycle & governance (Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS)
- Batch, interactive & real-time data access (Script: Pig; SQL: Hive; Java/Scala: Cascading; NoSQL: HBase, Accumulo; Stream: Storm; Search: Solr; In-Memory: Spark; Others: ISV engines; plus Tez and Slider)
- YARN: Data Operating System (cluster resource management) over HDFS (Hadoop Distributed File System), nodes 1 … N
- Security: authentication, authorization, accounting, data protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox; Cluster: Ranger)
- Operations: provision, manage & monitor (Ambari, Zookeeper); scheduling (Oozie)
- Deployment choice: Linux, Windows, on-premises, cloud
![Hadoop 2 Stack – Versions](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-7.jpg)
Hadoop 2 Stack – Versions

[Figure: matrix of component versions for three releases: HDP 2.0 (October 2013, Hadoop & YARN 2.2.0), HDP 2.1 (April 2014, Hadoop & YARN 2.4.0), and HDP 2.2 (December 2014, Hadoop & YARN 2.6.0). Columns are grouped under Data Management, Data Access, Governance & Integration, Security, and Operations, and cover Hadoop & YARN, Pig, Hive & HCatalog, Tez, HBase, Phoenix, Accumulo, Storm, Spark, Solr, Slider, Falcon, Sqoop, Flume, Kafka, Knox, Ranger, Ambari, Oozie, and Zookeeper. See the slide image for the per-component version numbers.]
![Enabling a Modern Data Architecture with Apache Hadoop](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-8.jpg)
Enabling a Modern Data Architecture with Apache Hadoop

Hortonworks. We do Hadoop.
![Traditional systems under pressure](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-9.jpg)
Traditional systems under pressure

- Silos of data
- Costly to scale
- Constrained schemas
- …and difficult to manage new data

[Diagram, three layers]

- Applications: custom applications, business analytics, packaged applications
- Data system: RDBMS, EDW, MPP
- Sources
  - Existing sources (CRM, ERP, …)
  - New data types: clickstream, geolocation, sentiment and web data, sensor and machine data, server logs, unstructured docs and emails
![HDP 2 and YARN enable the Modern Data Architecture](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-10.jpg)
HDP 2 and YARN enable the Modern Data Architecture

- Hortonworks architected and led development of YARN
- Common data set, multiple applications
  - Optionally land all data in a single cluster
  - Batch, interactive & real-time use cases
  - Support multi-tenant access, processing & segmentation of data
- YARN: Architectural center of Hadoop
  - Consistent security, governance & operations
  - Ecosystem applications certified by Hortonworks to run natively in Hadoop

[Diagram, three layers]

- Applications: custom applications, business analytics, packaged applications
- Data system: RDBMS, EDW and MPP alongside Hadoop, where batch, interactive and real-time workloads run on YARN: Data Operating System over HDFS (Hadoop Distributed File System), nodes 1 … N
- Sources: existing systems, clickstream, web & social, geolocation, sensor & machine, server logs, unstructured
![YARN in Action](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-11.jpg)
YARN in Action

Hortonworks. We do Hadoop.
![Trucking Company’s YARN-enabled Architecture](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-12.jpg)
Trucking Company’s YARN-enabled Architecture

[Diagram components]

- Truck sensors
- Inbound messaging (Kafka)
- Stream processing (Storm)
- Interactive query (Hive on Tez)
- Real-time serving (HBase)
- Alerts & events (ActiveMQ)
- Many workloads: YARN
- Distributed storage: HDFS
- Clients: Microsoft Excel, real-time user interface
![Components of the Topology](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-13.jpg)
Components of the Topology

- 9-node HDP 2.2 cluster with Storm and HBase on YARN
- 4-node Kafka 0.8 cluster
- 1-node ActiveMQ with the STOMP protocol enabled
- Spring 4.0 WebMVC web application using SockJS & ActiveMQ over STOMP
![Topology Architecture](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-14.jpg)
Topology Architecture
![Demo](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-15.jpg)
Demo
![SQL in Hadoop](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-16.jpg)
SQL in Hadoop

Hortonworks. We do Hadoop.
![Hive](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-17.jpg)
Hive

- Executes on MapReduce or Tez (Spark in future) on YARN
  - Queries taking hours now take minutes on Tez
- CLI and ODBC/JDBC connections
- Performance components
  - Vectorization: set hive.vectorized.execution.enabled=true;
  - Tez: set hive.execution.engine=tez;
  - CBO (cost-based optimizer)
    - hive.compute.query.using.stats=true;
    - hive.stats.fetch.column.stats=true;
    - hive.stats.fetch.partition.stats=true;
    - hive.cbo.enable=true;
- Enhanced security available in Hive 0.13
  - Grant semantics, column level
- Create/Update/Delete available in Hive 0.14 (GA)
- Sub-second response times next year
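The performance settings listed on this slide could be collected into a session preamble. The sketch below only renders `SET` statements as text; the `HIVE_SETTINGS` dictionary and `session_preamble` helper are hypothetical names, and the `true` values for vectorization and the CBO flag assume the slide intended to switch those features on.

```python
# Property names are taken from the slide. The "true" values for vectorization
# and hive.cbo.enable are an assumption about what the slide meant to enable.
HIVE_SETTINGS = {
    "hive.execution.engine": "tez",
    "hive.vectorized.execution.enabled": "true",
    "hive.cbo.enable": "true",
    "hive.compute.query.using.stats": "true",
    "hive.stats.fetch.column.stats": "true",
    "hive.stats.fetch.partition.stats": "true",
}


def session_preamble(settings):
    """Render one `SET key=value;` statement per setting, in insertion order."""
    return "\n".join(f"SET {key}={value};" for key, value in settings.items())


preamble = session_preamble(HIVE_SETTINGS)
print(preamble)
```

The resulting text could be pasted at the top of a Hive CLI session or a script. Whether each setting helps depends on the Hive version, file format, and whether table statistics have been gathered.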
![Phoenix over HBase](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-18.jpg)
Phoenix over HBase

- HBase is a NoSQL “data store” in Hadoop on YARN
  - Column families
  - Strong consistency for heavy reads/writes
  - Linear scale by adding Region Servers
  - Multiple access points: CLI, Java API, Thrift/REST API
  - Hundreds of millions or billions of rows
  - Gets/Puts/Scans
- Phoenix
  - Relational database layer over HBase
  - CLI and JDBC connection
  - Response times in the low seconds
  - Salting to prevent HBase region hot spots
  - Can map to an existing HBase table
  - Dynamic columns at query time
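The salting idea can be illustrated in a few lines. This is a sketch of the general technique, not Phoenix's implementation: `salted_key` is a hypothetical helper and CRC32 stands in for whatever hash Phoenix actually applies. In Phoenix itself, salting is requested through the `SALT_BUCKETS` table option rather than written by hand.

```python
import zlib


def salted_key(row_key: bytes, buckets: int) -> bytes:
    """Prefix the key with one salt byte derived from a hash of the key."""
    # CRC32 is a stand-in; it is not the hash Phoenix itself uses.
    salt = zlib.crc32(row_key) % buckets
    return bytes([salt]) + row_key


# Monotonically increasing keys (timestamps here) would otherwise sort next to
# each other and land in the same region.
keys = [f"2014-12-01T00:00:{s:02d}".encode() for s in range(60)]
salted = [salted_key(k, buckets=4) for k in keys]
buckets_used = {k[0] for k in salted}
print(sorted(buckets_used))
```

Because the salt is computed from the key, a point lookup can recompute it and go straight to the right bucket, while a range scan has to fan out across all buckets and merge the results.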
![Spark](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-19.jpg)
Spark

- Created at UC Berkeley's AMPLab
- Advanced DAG execution engine that supports in-memory computing and cyclic data flow
- Spark SQL, MLlib, GraphX, Spark Streaming
- Runs on Hadoop on YARN, Mesos, standalone
- CLI and ODBC/JDBC connectivity
- In-memory data held in RDDs
  - Resilient Distributed Datasets
  - Immutable
  - Perfect for iterative processing
  - Becoming a great way to serve up smaller data sets (low TB) at high speed
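The immutability of RDDs can be pictured with a toy class. This is plain Python, not the Spark API: `ToyDataset` and its methods are hypothetical, nothing is distributed or cached, and the only point is that each transformation returns a new dataset that remembers how it was derived.

```python
class ToyDataset:
    """Immutable dataset: transformations return a new object with a longer lineage."""

    def __init__(self, source, lineage=()):
        self._source = tuple(source)    # never mutated after construction
        self._lineage = tuple(lineage)  # steps to replay lazily, in order

    def map(self, fn):
        return ToyDataset(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        return ToyDataset(self._source, self._lineage + (("filter", pred),))

    def collect(self):
        # Recompute from the source by replaying the recorded lineage.
        items = list(self._source)
        for kind, fn in self._lineage:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items


base = ToyDataset(range(1, 6))
squares = base.map(lambda x: x * x)
even_squares = squares.filter(lambda x: x % 2 == 0)
print(base.collect(), even_squares.collect())
```

Because `base` is never modified, an iterative algorithm can derive a new dataset on every pass and still return to any earlier one, and a lost result can be rebuilt by replaying its lineage.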
![Hadoop Summit 2015](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-20.jpg)
Hadoop Summit 2015
![Thank You!](http://slidetodoc.com/presentation_image/3780a88016f37eaf2882e44a60b78b9c/image-21.jpg)
Thank You!

Eric Mizell – Director, Solutions Engineering

emizell@hortonworks.com

@ericmizell