YARN (Hadoop v2)

YARN (Hadoop v2)
Difficulties with the original Hadoop implementation led to the development of a successor to Hadoop called YARN (Yet Another Resource Negotiator). As the name indicates, YARN is meant to handle resource management.
https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/YARN.html
http://www.iowapark.com/event/yarn-therapy-for-charity/

Problems with Hadoop v1
◦ As cluster sizes and the number of users grew, the JobTracker became a bottleneck.
◦ The static allocation of resources to map and reduce tasks led to poor utilization.
◦ HDFS was the only storage system that many enterprise applications could use, leading to the creation of jobs not suited for MapReduce.
◦ On large clusters, upgrading the version of Hadoop running on each machine became problematic.

YARN Components
ApplicationMaster: Responsible for managing the work that needs to be done.
◦ The MapReduce ApplicationMaster is one example.
NodeManager: Each worker node has a manager responsible for gathering the required resources.
ResourceManager: Manages the NodeManagers and schedules resources with the ApplicationMasters.
The big change from Hadoop v1 is the separation of resource management from application management.
https://yarn-book.com/
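
To make that split concrete, here is a minimal, hedged sketch (not from the original slides) of how a client submits an application through Hadoop's Java YarnClient API. The application name is a placeholder, and the ContainerLaunchContext that actually starts the ApplicationMaster is deliberately left out, so a real submission needs more setup than shown here.

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        // The client only ever talks to the ResourceManager.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Ask the ResourceManager for a new application id and submission context.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext context = app.getApplicationSubmissionContext();
        context.setApplicationName("yarn-demo");  // placeholder name

        // A real job would also set the ContainerLaunchContext that starts the
        // ApplicationMaster; once submitted, that ApplicationMaster (not the
        // ResourceManager) manages the actual work running on the NodeManagers.
        ApplicationId appId = yarnClient.submitApplication(context);
        System.out.println("Submitted application " + appId);

        yarnClient.stop();
    }
}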

Benefits of YARN
◦ A rich diversity of data services, each with its own programming model (not all MapReduce).
◦ ApplicationMasters can negotiate for resources in patterns optimal for them (duration and size); see the sketch after this list.
◦ Per-node NodeManagers allow nodes to be utilized in a dynamic manner (used when needed).
◦ The ResourceManager does just one thing (manage resources), so it can scale to tens of thousands of nodes.
◦ With the ApplicationMaster managing jobs, you can have multiple versions of an application, which doesn't require a global cluster update (and the need to halt the cluster).
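
As a hedged illustration of that negotiation (not from the original slides), the sketch below shows an ApplicationMaster asking the ResourceManager for containers of a shape it chooses, using Hadoop's AMRMClient API. The host name and container size are made-up values, and in practice this only runs inside a container that YARN itself launched as the ApplicationMaster, with allocate() called repeatedly as a heartbeat rather than once.

import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NegotiateSketch {
    public static void main(String[] args) throws Exception {
        // An ApplicationMaster registers itself with the ResourceManager.
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();
        rmClient.registerApplicationMaster("am-host.example.com", 0, "");  // placeholder host

        // Ask for a container shaped for this application: 4 GB and 2 vcores.
        // A different application could request a completely different shape.
        Resource capability = Resource.newInstance(4096, 2);
        ContainerRequest request =
                new ContainerRequest(capability, null, null, Priority.newInstance(0));
        rmClient.addContainerRequest(request);

        // allocate() doubles as the heartbeat: it sends pending requests and
        // returns whatever containers the ResourceManager has granted so far.
        rmClient.allocate(0.0f).getAllocatedContainers()
                .forEach(c -> System.out.println("Granted container " + c.getId()));

        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
        rmClient.stop();
    }
}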

What do you call a YARN script?
1. Knot
2. Cat Toy
3. Fabric
4. Sweater

Frameworks Built On YARN
Apache Tez
◦ Meant to handle datasets in the petabyte range.
◦ Workflows are modeled as a directed acyclic graph (DAG) where vertices are tasks and edges are inter-operation dependencies or flows of data (see the sketch after this list).
◦ This model is a better fit for many jobs; as a result, Pig and Hive can run atop Tez for improved performance.
Apache Giraph
◦ Large graph processing system (similar to Neo4j).
Hoya: HBase on YARN
◦ Just what it sounds like.
https://svn.apache.org/repos/asf/giraph/logo/
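
To show what the DAG model buys you, here is a small, self-contained Java sketch (it does not use the real Tez API, and the task names are invented): vertices are tasks, edges point to the tasks that consume their output, and every task whose inputs are ready can run, so independent work is never forced through artificial map and reduce stages.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DagSketch {
    public static void main(String[] args) {
        // Edges point from each task to the tasks that consume its output.
        Map<String, List<String>> downstream = Map.of(
                "load-orders",    List.of("join"),
                "load-customers", List.of("join"),
                "join",           List.of("aggregate"),
                "aggregate",      List.of());

        // Count how many inputs each task is still waiting on.
        Map<String, Integer> pending = new HashMap<>();
        downstream.keySet().forEach(task -> pending.put(task, 0));
        downstream.values().forEach(consumers ->
                consumers.forEach(c -> pending.merge(c, 1, Integer::sum)));

        // Kahn's algorithm: run every task whose inputs are ready.
        // Tasks that become ready together are independent, so a DAG
        // scheduler like Tez is free to run them in parallel.
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((task, count) -> { if (count == 0) ready.add(task); });
        while (!ready.isEmpty()) {
            String task = ready.poll();
            System.out.println("run " + task);
            for (String consumer : downstream.get(task)) {
                if (pending.merge(consumer, -1, Integer::sum) == 0) {
                    ready.add(consumer);
                }
            }
        }
    }
}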

The Cloud - Making managing the servers someone else's problem. Lots of benefits:
◦ Much easier to "scale out" - add nodes to a distributed database.
◦ Interchangeable resources - servers can fail and be replaced; virtualization makes it easy.
◦ Peak usage - additional resources can be requested when needed to deal with surges.
◦ Large volume and velocity - the cloud has huge storage and very rapid data transfer rates.
◦ Low initial investment - trading capital costs for operational costs.
◦ Globally distributed - you can put a server close to your clients.
http://www.pcmag.com/article2/0,2817,2372163,00.asp