Resource Management with YARN: YARN Past, Present and Future
Anubhav Dhoot, Software Engineer, Cloudera

Resource Management
[Diagram: MapReduce, Impala, and Spark all running on YARN (dynamic resource management)]

YARN (Yet Another Resource Negotiator)
• Traditional operating system – Storage: file system; Execution/scheduling: processes / kernel scheduler
• Hadoop – Storage: Hadoop Distributed File System (HDFS); Execution/scheduling: Yet Another Resource Negotiator (YARN)

Overview of Talk
• History of YARN
• Recent features
• Ongoing features
• Future

WHY YARN

Traditional Distributed Execution Engines
[Diagram: clients submit work to a master, which schedules tasks on workers]

MapReduce v1 (MR1)
[Diagram: clients submit jobs to the JobTracker, which assigns map and reduce tasks to TaskTrackers]
The JobTracker tracks every task in the cluster!

MR1 Utilization
[Diagram: a 4 GB TaskTracker divided into fixed 1024 MB map and reduce slots]
The fixed-size slot model forces slots to be large enough for the biggest task!

Running multiple frameworks…
[Diagram: each framework brings its own master and workers, so tasks from different frameworks are scattered across the same machines]

YARN to the rescue!
• Scalability: track only applications, not all tasks.
• Utilization: allocate only as many resources as needed.
• Multi-tenancy: share resources between frameworks and users.
• Physical resources: memory, CPU, disk, network.

YARN Architecture
[Diagram: clients submit applications to the ResourceManager, which tracks application state and cluster state; NodeManagers host each application's ApplicationMaster and its containers]

MR1 to YARN/MR2 functionality mapping
• JobTracker is split into:
  • ResourceManager – cluster management, scheduling, and application state handling
  • ApplicationMaster – handles tasks (containers) per application (e.g. an MR job)
  • JobHistoryServer – serves MR history
• TaskTracker maps to NodeManager
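To make the ApplicationMaster / ResourceManager interaction in the architecture slide concrete, the following is a minimal sketch (not from the talk) of an ApplicationMaster registering with the RM and asking for a container through the Java AMRMClient API; the host name, container size, and priority values are illustrative assumptions.

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerRequestSketch {
  public static void main(String[] args) throws Exception {
    // ApplicationMaster-side client for talking to the ResourceManager.
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(new YarnConfiguration());
    rmClient.start();

    // Register this AM with the RM (host, port, and tracking URL are illustrative).
    rmClient.registerApplicationMaster("appmaster-host", 0, "");

    // Ask for one 1024 MB / 1 vcore container anywhere in the cluster.
    Resource capability = Resource.newInstance(1024, 1);
    Priority priority = Priority.newInstance(0);
    rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));

    // Heartbeat to the RM; allocated containers arrive in allocate() responses
    // (usually over several heartbeats, not necessarily the first one).
    rmClient.allocate(0.0f).getAllocatedContainers()
        .forEach(c -> System.out.println("Allocated: " + c.getId()));

    rmClient.stop();
  }
}

A real AM would keep heartbeating, launch work on the allocated containers via the NodeManagers, and unregister when done; the sketch only shows the request path.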
EARLY FEATURES

Handling faults on workers
[Diagram: the ResourceManager detects a failed NodeManager, and the affected ApplicationMaster's containers are relaunched on other nodes]

Fault tolerance – RM recovery
[Diagram: the ResourceManager persists application state to an RM store so it can be rebuilt after a restart]

Fault tolerance – High Availability (Active/Standby)
[Diagram: active and standby ResourceManagers share an RM store and use a ZooKeeper-based elector; when the active RM fails, the standby is elected active and clients, NodeManagers, and ApplicationMasters fail over to it]

Scheduler
• Lives inside the ResourceManager
• Decides who gets to run, when, and where
• Uses "queues" to describe organizational needs
• Applications are submitted to a queue
• Two schedulers out of the box: Fair Scheduler and Capacity Scheduler

Fair Scheduler Hierarchical Queues
[Diagram: the root queue (memory capacity: 12 GB, CPU capacity: 24 cores) is divided among Marketing, R&D, and Sales (fair share: 4 GB / 8 cores each); team queues such as Jim's Team and Bob's Team (fair share: 2 GB / 4 cores each) subdivide their parent's share]

Fair Scheduler Queue Placement Policies
<queuePlacementPolicy>
  <rule name="specified" />
  <rule name="primaryGroup" create="false" />
  <rule name="default" />
</queuePlacementPolicy>

Multi-Resource Scheduling
● Node capacities expressed in both memory and CPU
● Memory in MB, CPU in terms of vcores
● The scheduler uses the dominant resource when making decisions

Multi-Resource Scheduling (example)
● Queue 1 usage: 12 GB (33% of capacity), 3 cores (25% of capacity) – memory is its dominant resource
● Queue 2 usage: 10 GB (28% of capacity), 6 cores (50% of capacity) – CPU is its dominant resource
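To make "dominant resource" concrete, here is a small, self-contained sketch (not from the talk) that computes each queue's dominant share for the usage numbers above; the cluster capacities of 36 GB and 12 vcores are assumptions back-derived from the slide's percentages.

// Minimal dominant-resource-fairness illustration. Assumed cluster capacity:
// 36 GB of memory and 12 vcores (back-derived from the slide's percentages).
public class DominantShare {
  static double dominantShare(double memUsedGb, double vcoresUsed,
                              double memCapGb, double vcoreCap) {
    double memShare = memUsedGb / memCapGb;
    double cpuShare = vcoresUsed / vcoreCap;
    // The dominant resource is whichever share is larger.
    return Math.max(memShare, cpuShare);
  }

  public static void main(String[] args) {
    double memCap = 36.0, cpuCap = 12.0;               // assumed capacities
    double q1 = dominantShare(12, 3, memCap, cpuCap);  // memory-dominant: 0.33
    double q2 = dominantShare(10, 6, memCap, cpuCap);  // CPU-dominant: 0.50
    System.out.printf("Queue 1 dominant share: %.2f%n", q1);
    System.out.printf("Queue 2 dominant share: %.2f%n", q2);
  }
}

With equal queue weights, fair scheduling by dominant share would offer the next container to Queue 1, since its dominant share (33%) is lower than Queue 2's (50%).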
Multi-Resource Enforcement
● YARN kills containers that use too much memory
● CGroups are used for limiting CPU

RECENTLY ADDED FEATURES

RM recovery without losing work
• Preserves running containers across an RM restart
• NM no longer kills containers on resync
• AM re-registers with the RM on resync
[Diagram: after a restart, the ResourceManager rebuilds application and cluster state from the RM store while containers keep running on the NodeManagers]

NM recovery without losing work
• NM stores each container and its associated state in a local store
• On restart, the NM reconstructs its state from the store
• Default implementation uses LevelDB
• Supports rolling restarts with no user impact
[Diagram: the NodeManager writes container state to a local state store and recovers its running containers from it after a restart]

Fair Scheduler Dynamic User Queues
[Diagram: under the root queue (memory capacity: 12 GB, CPU capacity: 24 cores), Marketing, R&D, and Sales each have a fair share of 4 GB / 8 cores; per-user queues such as Moe and Larry are created dynamically, each with a fair share of 2 GB / 4 cores]

ONGOING FEATURES

Long Running Apps on Secure Clusters (YARN-896)
● Update tokens of running applications
● Reset the AM failure count to allow multiple failures over a long time
● Need to access logs while the application is running
● Need a way to show progress

Application Timeline Server (YARN-321, YARN-1530)
● Currently we have a JobHistoryServer for MapReduce history only
● A generic history server
● Gives information even while the job is running

Application Timeline Server
● Stores and serves generic data such as when containers ran and container logs
● Apps post app-specific events (e.g. MapReduce attempt succeeded/failed)
● Pluggable framework-specific UIs
● Pluggable storage backend; default is LevelDB

Disk scheduling (YARN-2139)
● Disk as a resource in addition to CPU and memory
● Expressed as virtual disks, similar to vcores for CPU
● Dominant resource fairness can handle this on the scheduling side
● Uses the CGroups blkio controller for enforcement

Reservation-based Scheduling (YARN-1051)
[Diagram slides illustrating reservation-based scheduling]

FUTURE FEATURES

Container Resizing (YARN-1197)
● Change a running container's resource allocation
● Very useful for frameworks like Spark that schedule multiple tasks within a container
● Follows the same paths as acquiring and releasing containers

Admin labels (YARN-796)
● Admins tag nodes with labels (e.g. GPU)
● Applications can include labels in container requests (a hedged request sketch follows the Container Delegation slides below)
[Diagram: an ApplicationMaster asking "I want a GPU" is given a container on the NodeManager labeled [GPU, beefy] rather than the one labeled [Windows]]

Container Delegation (YARN-1488)
● Problem: a single process wants to run work on behalf of multiple users
● Want to count the resources used against the users that use them
● E.g. Impala or HDFS caching

Container Delegation (YARN-1488)
● Solution: let apps "delegate" their containers to other containers on the same node
● The delegated container never runs
● The framework container gets its resources
● The framework container is responsible for fairness within itself
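As referenced in the Admin labels slide above: node labels eventually shipped in Hadoop releases after this talk. Purely as an illustration, and assuming the post-2.6 AMRMClient.ContainerRequest constructor that accepts a node-label expression, a GPU-constrained request might look roughly like this sketch (class name and resource sizes are illustrative assumptions).

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class GpuRequestSketch {
  // Builds a container request restricted to nodes an admin has labeled "GPU".
  public static ContainerRequest gpuRequest() {
    Resource capability = Resource.newInstance(4096, 2); // illustrative size
    Priority priority = Priority.newInstance(0);
    // The last argument is the node-label expression; node and rack lists are
    // left null when a label expression is used (assumes the post-2.6
    // ContainerRequest constructor that accepts a label expression).
    return new ContainerRequest(capability, null, null, priority, true, "GPU");
  }
}

The request is then handed to AMRMClient.addContainerRequest() exactly as in the earlier architecture sketch; the label simply narrows which nodes are eligible to satisfy it.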
Questions?

Thank You!
Anubhav Dhoot, Software Engineer, Cloudera
[email protected]