University of Nevada Reno Swift Machine Learning Model

  • Slides: 15

Swift Machine Learning Model Serving Scheduling: A Region Based Reinforcement Learning Approach
Heyang Qin*, Syed Zawad*, Yanqi Zhou**, Lei Yang*, Dongfang Zhao*, Feng Yan*
*University of Nevada, Reno; **Google Brain

Machine Learning: Vision, Speech, Natural Language. Infeasible

Machine Learning: Vision, Speech, Natural Language. Cloud Service Providers

Machine Learning as a Service: Building, Training, Serving (image from Amazon Web Services)

Machine Learning as a Service: Clients, Machine Learning Models, Servers. The machine learning model is exposed as a callable API, e.g. returning "Cat" or "Hello", or answering "What's the weather?" with "Rainy". (Image from Azure ML.)

Latency: as client requests arrive, a longer queue means larger latency. Potential solution: more servers, which gives less latency but more cost.
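
To make the "more servers: less latency, more cost" trade-off concrete, here is a minimal sketch that assumes an M/M/c queue (Poisson arrivals, exponential service times, c identical servers). This textbook model is our assumption, not something the slides use; cost is simply taken as proportional to the number of servers, and the 8 requests/s and 5 requests/s rates are hypothetical.

```python
import math


def mmc_mean_response_time(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean response time (queueing + service) of an M/M/c queue, via the Erlang C formula."""
    utilization = arrival_rate / (servers * service_rate)
    if utilization >= 1:
        raise ValueError("unstable queue: arrival rate exceeds total service capacity")
    offered_load = arrival_rate / service_rate  # a = lambda / mu
    # Erlang C: probability that an arriving request has to wait in the queue.
    tail = offered_load ** servers / math.factorial(servers) / (1 - utilization)
    head = sum(offered_load ** k / math.factorial(k) for k in range(servers))
    prob_wait = tail / (head + tail)
    mean_wait = prob_wait / (servers * service_rate - arrival_rate)
    return mean_wait + 1 / service_rate


# Hypothetical workload: 8 requests/s arrive, each server completes 5 requests/s.
for c in range(2, 6):
    latency = mmc_mean_response_time(8.0, 5.0, c)
    print(f"servers={c}  mean latency={latency:.3f}s  relative cost={c}")
```

Under these assumptions latency falls quickly from two to three servers and then flattens, while cost keeps growing linearly, which is why simply adding servers is an expensive fix.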

Latency. Scenario 1: Reduce latency, traded off against computation cost. Scenario 2: Reduce computation cost (computation power) while keeping latency within the Service-Level Objective (SLO) constraint. (Image from Apple.)
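
The two scenarios can be written as optimization problems over serving configurations. The notation is ours, not the talk's: $\mathcal{C}$ is the set of candidate configurations, $L(c)$ is the latency observed under configuration $c$ (the slide does not say whether this is mean or tail latency), $\mathrm{Cost}(c)$ is its computation cost, and $L_{\mathrm{SLO}}$ is the latency target.

```latex
\begin{align*}
\text{Scenario 1:}\quad & c^{*} = \arg\min_{c \in \mathcal{C}} L(c) \\
\text{Scenario 2:}\quad & c^{*} = \arg\min_{c \in \mathcal{C}} \mathrm{Cost}(c)
  \quad \text{s.t.} \quad L(c) \le L_{\mathrm{SLO}}
\end{align*}
```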

Parallelism in MLaaS: request parallelism (multiple requests processed concurrently), inter-op parallelism (multiple operations of one request run concurrently), and intra-op parallelism (one operation split across multiple threads).
[OSDI '16] TensorFlow: A System for Large-Scale Machine Learning
[CoRR '15] MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
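
The three levels nest: concurrent requests each run several operations at once, and each operation may use several threads. The sketch below is a hypothetical configuration object (the class and field names are ours). It assumes that worst-case thread demand is the product of the three knobs, which is a simplification and not a claim about how TensorFlow or MXNet size their pools. TensorFlow does expose the last two knobs directly, via `tf.config.threading.set_inter_op_parallelism_threads` and `tf.config.threading.set_intra_op_parallelism_threads`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelismConfig:
    """Hypothetical knobs for the three parallelism levels on the slide."""
    request_parallelism: int  # requests served concurrently
    inter_op_threads: int     # independent operations of one request run concurrently
    intra_op_threads: int     # threads cooperating inside a single operation

    def peak_threads(self) -> int:
        # Simplifying assumption: the levels nest, so the worst case multiplies them.
        return self.request_parallelism * self.inter_op_threads * self.intra_op_threads

    def oversubscribes(self, cores: int) -> bool:
        # More runnable threads than cores means threads compete for CPU time.
        return self.peak_threads() > cores


cores = os.cpu_count() or 1
config = ParallelismConfig(request_parallelism=4, inter_op_threads=2, intra_op_threads=4)
print(config.peak_threads(), "peak threads on", cores, "cores; oversubscribed:",
      config.oversubscribes(cores))
```

Even this crude model shows why the knobs interact: raising any one of them can push the product past the core count and hurt latency instead of helping.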

Admission Policy in MLaaS: incoming requests are grouped into batches up to a batch size, batches are processed by parallel batch threads, and a batch timeout bounds how long a partially filled batch waits before it is dispatched.
[OSDI '16] TensorFlow: A System for Large-Scale Machine Learning
[CoRR '15] MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
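
A minimal sketch of the batch-size/batch-timeout rule follows, assuming sorted request arrival times in seconds. The function name and parameters are hypothetical, chosen to mirror the slide's terms (TensorFlow Serving's batching options `max_batch_size` and `batch_timeout_micros` express the same idea). The parallel batch threads that would consume these batches are not modeled.

```python
from typing import List


def form_batches(arrivals: List[float], max_batch_size: int,
                 batch_timeout: float) -> List[List[float]]:
    """Group sorted request arrival times (seconds) into batches.

    A batch is dispatched when it reaches max_batch_size, or once batch_timeout
    seconds have passed since its first request arrived, whichever comes first.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batches: List[List[float]] = []
    current: List[float] = []
    for t in arrivals:
        # The open batch timed out before this request arrived: dispatch it first.
        if current and t - current[0] > batch_timeout:
            batches.append(current)
            current = []
        current.append(t)
        # The batch is full: dispatch it immediately.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # the last partial batch leaves at its timeout
    return batches


# Hypothetical arrivals: a burst, a lone request, then a pair.
print(form_batches([0.0, 0.001, 0.002, 0.010, 0.030, 0.031],
                   max_batch_size=3, batch_timeout=0.005))
```

Larger batches amortize per-inference overhead, while the timeout caps the extra queueing delay that a lone request (like the one at 0.010 s) pays while waiting for companions.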

Our problem: How do we find the best admission policy and parallelism configuration?
Scenario 1: Reduce latency.
Scenario 2: Reduce cost within the SLO constraint.
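
To see why the question is hard, here is a brute-force sketch. Every knob range and every latency/cost number below is hypothetical, chosen only to show how quickly the combined admission-policy and parallelism space grows and how each scenario would pick a winner if every configuration could be measured.

```python
import itertools
from typing import Dict, Optional, Tuple

# Hypothetical knob values: (batch size, batch timeout ms, batch threads, inter-op, intra-op).
BATCH_SIZES = [1, 4, 16, 64]
BATCH_TIMEOUTS_MS = [1, 5, 10]
BATCH_THREADS = [1, 2, 4]
INTER_OP = [1, 2, 4]
INTRA_OP = [1, 2, 4, 8]

Config = Tuple[int, ...]
Measurement = Tuple[float, float]  # (latency in ms, cost in arbitrary units)

space = list(itertools.product(BATCH_SIZES, BATCH_TIMEOUTS_MS, BATCH_THREADS,
                               INTER_OP, INTRA_OP))
print(len(space), "configurations, each of which would need measuring under load")


def best_latency(measured: Dict[Config, Measurement]) -> Config:
    """Scenario 1: the configuration with the lowest measured latency."""
    return min(measured, key=lambda c: measured[c][0])


def cheapest_within_slo(measured: Dict[Config, Measurement],
                        slo_ms: float) -> Optional[Config]:
    """Scenario 2: the cheapest configuration meeting the SLO, or None if none does."""
    feasible = [c for c, (latency, _) in measured.items() if latency <= slo_ms]
    return min(feasible, key=lambda c: measured[c][1]) if feasible else None


# Hypothetical measurements for three of the configurations.
measured = {
    (1, 1, 1, 1, 1): (50.0, 1.0),
    (4, 5, 2, 2, 2): (20.0, 3.0),
    (16, 10, 4, 4, 8): (12.0, 8.0),
}
print(best_latency(measured), cheapest_within_slo(measured, slo_ms=25.0))
```

Exhaustive measurement does not scale, and the best entry also shifts as the workload changes, which motivates a scheduler that searches the space online.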

Scheduling System: the scheduler receives runtime information from the MLaaS server and sends back a configuration; clients send their requests to the MLaaS server.

Related Work: Model Based
Most approaches make assumptions about the workloads/applications.
• [SC '16] SERF: Efficient Scheduling for Fast Deep Neural Network Serving via Judicious Parallelism
• [INFOCOM '17] Adaptive Scheduling of Parallel Jobs in Spark Streaming
Limitation: too complicated for a closed-form solution.

Related Work: Model Free
Most are heuristic or learning based.
• [NSDI '17] CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics
• [ASPLOS '15] Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services
Limitations: too many free parameters; slow learning speed.

Related Work: Reinforcement Learning
• [SC '17] CAPES: Unsupervised Storage Performance Tuning Using Neural Network-Based Deep Reinforcement Learning
• [HotNets '16] Resource Management with Deep Reinforcement Learning
The agent takes an action on the environment and receives a reward and the next state.
Limitation: slow learning speed.
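
The agent/environment loop can be sketched with tabular Q-learning, one standard RL algorithm. It is not necessarily what CAPES, the HotNets work, or this talk uses. The toy environment is entirely hypothetical: the state is a load level that simply alternates between "low" and "high", the action is a batch size, and the reward is the negative of a made-up latency.

```python
import random
from typing import Dict, Tuple

# Hypothetical toy environment: state = load level, action = batch size,
# reward = negative latency in ms (all numbers invented for illustration).
STATES = ["low", "high"]
ACTIONS = [1, 8, 32]
LATENCY_MS = {
    ("low", 1): 5.0, ("low", 8): 9.0, ("low", 32): 20.0,
    ("high", 1): 40.0, ("high", 8): 15.0, ("high", 32): 25.0,
}


def step(state: str, action: int) -> Tuple[float, str]:
    """Environment: return (reward, next_state). Load simply alternates here."""
    reward = -LATENCY_MS[(state, action)]
    next_state = "high" if state == "low" else "low"
    return reward, next_state


def q_learning(steps: int = 2000, alpha: float = 0.5, gamma: float = 0.9,
               epsilon: float = 0.2, seed: int = 0) -> Dict[str, Dict[int, float]]:
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    state = "low"
    for _ in range(steps):
        # Agent chooses an action: explore with probability epsilon, else act greedily.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])
        reward, next_state = step(state, action)
        # Q-learning update toward reward + discounted best value of the next state.
        target = reward + gamma * max(q[next_state].values())
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q_table = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q_table[s][a]) for s in STATES}
print(policy)
```

Even this two-state toy runs thousands of trial actions. In a real server, each trial would be a live configuration change that affects the latency clients see, which is roughly the cost behind "slow learning speed".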

Reinforcement Learning Scheduling System: the RL-based scheduling system sits between the clients and the MLaaS server. How can we speed up the learning?