MobiCom 2018

Motivation
• Today, mobile vision systems are ubiquitous: autonomous vehicles, smartphones, drones, AR/VR headsets.
• The emergence of AI chipsets enables on-device deep learning on mobile vision systems: Apple A11 Bionic Neural Engine, Intel Movidius VPU, Qualcomm XR1 VR chip.

Motivation
To enable on-device deep learning, one common technique is to compress the model. The compressed model is predetermined based on a static resource budget and stays fixed after the application is deployed.

Challenge
• Design a scheme that provides flexible resource-accuracy trade-offs.
• Select which resource-accuracy trade-off to use for each of the concurrently running deep learning models.

Architecture

Model Pruning

Model Pruning
• A filter is important if it extracts feature maps that help differentiate images belonging to different classes.
• Choose three images, anc, pos, and neg: anc and pos are in the same class; anc and neg are in different classes.
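
The triplet idea above can be sketched as a per-filter importance score. This is a minimal sketch, not the authors' implementation: the squared-distance form of the score, the averaging over triplets, and the names `sq_dist`, `triplet_score`, and `rank_filters` are all assumptions made for illustration. Feature maps are represented as flat lists of floats.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Triplet = Tuple[Vector, Vector, Vector]  # (anc, pos, neg) feature maps of one filter

def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two flattened feature maps."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_score(anc: Vector, pos: Vector, neg: Vector) -> float:
    """Assumed score: large when anc is far from neg and close to pos."""
    return sq_dist(anc, neg) - sq_dist(anc, pos)

def rank_filters(responses: List[List[Triplet]]) -> Tuple[List[int], List[float]]:
    """responses[f] holds the triplets of feature maps produced by filter f.

    Returns (filter indices ordered least important first, mean score per filter).
    """
    scores = [
        sum(triplet_score(a, p, n) for a, p, n in triplets) / len(triplets)
        for triplets in responses
    ]
    order = sorted(range(len(scores)), key=lambda f: scores[f])
    return order, scores

# Hypothetical responses: filter 0 separates the classes, filter 1 does not.
responses = [
    [([1.0, 1.0], [1.0, 1.0], [0.0, 0.0])],
    [([1.0, 1.0], [0.0, 0.0], [1.0, 1.0])],
]
order, scores = rank_filters(responses)
```

Filters at the front of `order` would be the first candidates for pruning under this score.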

Model Pruning
At conv13, TRR achieves accuracies of 89.72% and 87.40% when 50% and 90% of the filters are pruned, respectively; L1-norm achieves accuracies of 75.45% and 42.65%.

Model Recovery

Multi-Capacity Model
• One compact model with multiple capabilities.
• Optimized resource-accuracy trade-offs.
• Efficient model switching.

Resource-Aware Scheduler
• Trade-off between accuracy and latency.
v is a vision application
V is the set of vision applications
A_min(v) is the minimum inference accuracy goal
L_max(v) is the maximum processing latency goal
L(m_v) is the processing latency of m_v when 100% of the computing resources are allocated to v
α is the penalty parameter
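
The cost equation itself did not survive on this slide, so the following is only a sketch of one plausible form built from the definitions above: the accuracy shortfall relative to A_min(v), plus α times any latency overshoot beyond L_max(v), where the latency at a resource share u is taken to be L(m_v) / u. Both that formula and the names `cost`, `acc`, `latency_full`, and `u` are assumptions, not something the slide confirms.

```python
def cost(acc: float, latency_full: float, u: float,
         a_min: float, l_max: float, alpha: float) -> float:
    """Assumed cost of running a model with accuracy `acc` at resource share `u`.

    latency_full is L(m_v): the latency when 100% of resources are allocated.
    u is the fraction of computing resources given to the application (0 < u <= 1).
    """
    accuracy_gap = a_min - acc                 # shortfall versus the accuracy goal
    latency_at_share = latency_full / u        # assumed inverse scaling with share
    overshoot = max(0.0, latency_at_share - l_max)
    return accuracy_gap + alpha * overshoot

# Hypothetical numbers: a 50 ms model given half of the resources,
# with a 0.9 accuracy goal and an 80 ms latency goal.
example = cost(acc=0.8, latency_full=50.0, u=0.5, a_min=0.9, l_max=80.0, alpha=0.01)
```

Under this form, a larger α makes missing the latency goal more expensive relative to missing the accuracy goal.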

Resource-Aware Scheduler
• MinTotalCost
For MinTotalCost, we allocate ∆u to the descendant model m_v of application v such that C(m_v, ∆u, v) has the smallest cost increase among the concurrent applications.
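
One way to read the MinTotalCost rule is as a greedy loop that hands out resource slices of size ∆u one at a time, each time to the application whose grant leaves the lowest total cost. The sketch below follows that reading and should not be taken as the authors' algorithm. The cost form (accuracy shortfall plus α times latency overshoot), the starting point of one slice per application, and all names (`Model`, `App`, `best_cost`, `min_total_cost`) are assumptions.

```python
from typing import Dict, List, Tuple

Model = Tuple[float, float]             # (accuracy, latency at 100% share); hypothetical profile
App = Tuple[List[Model], float, float]  # (descendant models, A_min, L_max)

def cost(model: Model, u: float, a_min: float, l_max: float, alpha: float) -> float:
    # Assumed cost form: accuracy shortfall plus penalized latency overshoot.
    acc, lat_full = model
    return (a_min - acc) + alpha * max(0.0, lat_full / u - l_max)

def best_cost(app: App, u: float, alpha: float) -> float:
    """Lowest cost over the application's descendant models at share u."""
    models, a_min, l_max = app
    return min(cost(m, u, a_min, l_max, alpha) for m in models)

def min_total_cost(apps: Dict[str, App], n_slices: int, alpha: float) -> Dict[str, int]:
    """Greedily distribute n_slices slices of size 1/n_slices; returns slices per app."""
    delta_u = 1.0 / n_slices
    slices = {v: 1 for v in apps}  # assumption: every application starts with one slice
    for _ in range(n_slices - len(apps)):
        def total_if_granted(candidate: str) -> float:
            # Total cost over all applications if `candidate` receives the next slice.
            return sum(
                best_cost(app, (slices[v] + (1 if v == candidate else 0)) * delta_u, alpha)
                for v, app in apps.items()
            )
        winner = min(apps, key=total_if_granted)
        slices[winner] += 1
    return slices

# Hypothetical workload: a light and a heavy application, eight slices in total.
apps = {
    "light": ([(0.9, 10.0)], 0.9, 100.0),
    "heavy": ([(0.9, 50.0)], 0.9, 100.0),
}
allocation = min_total_cost(apps, n_slices=8, alpha=0.01)
```

A one-slice lookahead like this can stall when no single slice changes the total cost, so it is best read as an illustration of the selection rule rather than a tuned scheduler.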

Resource-Aware Scheduler
• MinMaxCost
For MinMaxCost, we select the application v with the highest cost C(m_v, u_v, v), allocate ∆u to v, and choose the optimal descendant model m_v = argmin C(m_v, u_v, v) for v.
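
The MinMaxCost rule maps fairly directly onto a greedy loop: find the application with the highest current cost, give it the next slice ∆u, and re-pick its best descendant model. The sketch below illustrates that loop under stated assumptions: the cost form (accuracy shortfall plus α times latency overshoot), the starting point of one slice per application, and the names `Model`, `App`, `best_model`, and `min_max_cost` are hypothetical.

```python
from typing import Dict, List, Tuple

Model = Tuple[float, float]             # (accuracy, latency at 100% share); hypothetical profile
App = Tuple[List[Model], float, float]  # (descendant models, A_min, L_max)

def cost(model: Model, u: float, a_min: float, l_max: float, alpha: float) -> float:
    # Assumed cost form: accuracy shortfall plus penalized latency overshoot.
    acc, lat_full = model
    return (a_min - acc) + alpha * max(0.0, lat_full / u - l_max)

def best_model(app: App, u: float, alpha: float) -> Tuple[float, Model]:
    """argmin over descendant models at share u; returns (cost, model)."""
    models, a_min, l_max = app
    chosen = min(models, key=lambda m: cost(m, u, a_min, l_max, alpha))
    return cost(chosen, u, a_min, l_max, alpha), chosen

def min_max_cost(apps: Dict[str, App], n_slices: int,
                 alpha: float) -> Dict[str, Tuple[int, Model]]:
    """Returns, per application, (number of slices, chosen descendant model)."""
    delta_u = 1.0 / n_slices
    slices = {v: 1 for v in apps}  # assumption: every application starts with one slice

    def current_cost(v: str) -> float:
        return best_model(apps[v], slices[v] * delta_u, alpha)[0]

    for _ in range(n_slices - len(apps)):
        worst = max(apps, key=current_cost)  # application with the highest cost
        slices[worst] += 1                   # it receives the next slice
    return {v: (slices[v], best_model(apps[v], slices[v] * delta_u, alpha)[1])
            for v in apps}

# Hypothetical workload: the heavy application has two descendant models.
apps = {
    "light": ([(0.9, 10.0)], 0.9, 100.0),
    "heavy": ([(0.8, 20.0), (0.9, 50.0)], 0.9, 100.0),
}
plan = min_max_cost(apps, n_slices=8, alpha=0.01)
```

In this workload the heavy application switches from its smaller to its larger descendant model once it holds enough slices to meet the latency goal.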

Evaluation

Evaluation
Baseline: model pruning with L1-norm
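
For readers unfamiliar with the baseline, L1-norm pruning is commonly understood as ranking each filter by the sum of the absolute values of its weights and removing the lowest-ranked ones. The sketch below illustrates that general idea only. How the baseline was configured in this evaluation is not stated on the slide, and the names `l1_norm` and `prune_by_l1` are hypothetical.

```python
from typing import List, Sequence

def l1_norm(filter_weights: Sequence[float]) -> float:
    """Sum of absolute weights of one (flattened) filter."""
    return sum(abs(w) for w in filter_weights)

def prune_by_l1(filters: List[Sequence[float]], prune_ratio: float) -> List[int]:
    """Return the indices of the filters kept after dropping the
    `prune_ratio` fraction with the smallest L1 norm."""
    n_prune = int(len(filters) * prune_ratio)
    order = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    return sorted(order[n_prune:])

# Hypothetical layer with four flattened filters; half of them are pruned.
filters = [[0.1, -0.1], [1.0, -2.0], [0.5, 0.5], [0.0, 0.05]]
kept = prune_by_l1(filters, prune_ratio=0.5)
```

Unlike a triplet-based score, this criterion looks only at weight magnitudes and never at how well a filter separates classes.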

Evaluation