
Edge Assisted Real-time Object Detection for Mobile Augmented Reality
Luyang Liu, Hongyu Li, Marco Gruteser
WINLAB, Rutgers University
05/10/2019

Mobile Augmented Reality
• Good understanding of the 3D geometry of the surroundings.
• Lacks the ability to detect and classify complex objects in the real world.

Impossible to run a high quality object detection model on a mobile phone at a high frame rate

Model               SoC                  RAM   MobileNet (224 × 224) (ms)  ResNet (512 × 512) (ms)
Huawei P20 Pro      HiSilicon Kirin 970  6 GB  144                         2634
OnePlus 6           Snapdragon 845/DSP   8 GB  24                          1365
Samsung Galaxy S9+  Exynos 9810 Octa     6 GB  148                         1572
Google Pixel 2      Snapdragon 835       4 GB  143                         1953

Source: http://ai-benchmark.com/ranking

Opportunity: Offload Vision Tasks to Edge Cloud
Challenge: Long latency (around 80 ms) significantly decreases detection accuracy.
[Figure: AR headset timeline from T1 to T4 covering Frame Uploading, Object Detection and Rendering]

Our Contribution
• Low latency offloading techniques:
  – Dynamic RoI Encoding.
  – Parallel Streaming and Inference.
  – Model quantization.
  Latency: 80 ms -> 15 ms.
• On-device fast object tracking technique:
  – Motion Vector based Object Tracking.
  Latency: 15 ms -> 2.24 ms.
• Adaptive Offloading technique.
• We achieve 60 fps high quality object detection on mobile devices on 1280 × 720 resolution video frames.
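To put the latency figures on this slide next to the 60 fps target, the short sketch below converts each latency into a number of frame intervals. The latencies and frame rate are taken from the slide; the function names are illustrative only.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / fps


def frames_elapsed(latency_ms: float, fps: float) -> float:
    """Number of frame intervals that pass while one result is produced."""
    return latency_ms / frame_interval_ms(fps)


if __name__ == "__main__":
    fps = 60.0  # target frame rate from the slide
    # Latency values quoted on the slide.
    stages = [
        ("offloading before optimization", 80.0),
        ("offloading after optimization", 15.0),
        ("motion vector based tracking", 2.24),
    ]
    print(f"frame interval at {fps:.0f} fps: {frame_interval_ms(fps):.2f} ms")
    for label, latency in stages:
        print(f"{label}: {latency} ms = {frames_elapsed(latency, fps):.2f} frame intervals")
```

At 60 fps a frame lasts about 16.7 ms, so an 80 ms result is nearly five frames old when it arrives, while 15 ms and 2.24 ms both fit within a single frame interval.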

Dynamic RoI Encoding
• Reducing frame resolution or encoding quality => decreased detection accuracy.
[Figure panels: Original Frame, Low Resolution Frame, Lossy Frame]

Dynamic RoI Encoding
RoI: Regions of Interest
• Detect RoIs on the last offloaded frame.
• Mark macroblocks that overlap with RoIs.
• Change encoding quality on the current frame.
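The macroblock marking step can be sketched as building a per-macroblock quality map. This is a minimal illustration, not the authors' encoder integration: the 16 × 16 macroblock size and the two quantization parameter (QP) values are assumptions, and `roi_qp_map` is a hypothetical helper name. In H.264-style encoders a lower QP means higher quality.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; the max edges are exclusive.
Box = Tuple[int, int, int, int]


def roi_qp_map(width: int, height: int, rois: List[Box],
               mb_size: int = 16,        # assumed macroblock size
               qp_roi: int = 20,         # assumed QP inside RoIs (higher quality)
               qp_background: int = 40   # assumed QP elsewhere (lower quality)
               ) -> List[List[int]]:
    """Return a rows x cols grid with one QP value per macroblock."""
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    qp = [[qp_background] * cols for _ in range(rows)]
    for x0, y0, x1, y1 in rois:
        # Clamp the RoI to the frame and skip it if nothing remains.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        if x0 >= x1 or y0 >= y1:
            continue
        # Mark every macroblock that the RoI overlaps.
        for r in range(y0 // mb_size, (y1 - 1) // mb_size + 1):
            for c in range(x0 // mb_size, (x1 - 1) // mb_size + 1):
                qp[r][c] = qp_roi
    return qp


if __name__ == "__main__":
    # RoIs would come from the detection result of the last offloaded frame.
    grid = roi_qp_map(1280, 720, [(100, 50, 300, 200)])
    marked = sum(row.count(20) for row in grid)
    print(f"{marked} of {len(grid) * len(grid[0])} macroblocks kept at high quality")
```

How such a map is handed to a real encoder depends on the encoder's API and is outside this sketch.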

Parallel Streaming and Inference
[Figure: a frame's bitstream passing through four sequential stages: Encoding, Transmission, Decoding, Inference]

Parallel Streaming and Inference
[Figure: Encoding, Transmission, Decoding and Inference stages overlapped in a staggered pipeline; label: Latency Reduction]
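The latency reduction from overlapping the four stages can be estimated with a small pipeline model. This is a sketch under stated assumptions: the frame is split into equal slices, each stage's work divides evenly across slices with no overhead, and the stage durations below are made-up placeholders rather than measurements from the talk.

```python
from typing import List


def sequential_latency(stage_ms: List[float]) -> float:
    """Latency when each stage waits for the whole frame from the previous stage."""
    return sum(stage_ms)


def pipelined_latency(stage_ms: List[float], n_slices: int) -> float:
    """Latency when the frame is cut into n_slices and the stages overlap.

    A stage starts a slice once it has finished its previous slice and the
    previous stage has delivered the current slice.
    """
    per_slice = [t / n_slices for t in stage_ms]
    finish = [0.0] * len(per_slice)  # finish[s]: when stage s finished its latest slice
    for _ in range(n_slices):
        upstream_done = 0.0  # the first stage can always start immediately
        for s, t in enumerate(per_slice):
            start = max(finish[s], upstream_done)
            finish[s] = start + t
            upstream_done = finish[s]
    return finish[-1]


if __name__ == "__main__":
    # Hypothetical durations for encoding, transmission, decoding, inference (ms).
    stages = [4.0, 4.0, 4.0, 4.0]
    print("sequential:", sequential_latency(stages), "ms")
    print("pipelined, 4 slices:", pipelined_latency(stages, 4), "ms")
```

With one slice the two functions agree; more slices shorten the total time until the slowest stage dominates.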

Dependency Aware Inference
[Figure: Faster R-CNN object detection model; axis: depth of the convolutional neural network; frame divided into Slice 1, Slice 2, Slice 3 and Slice 4]

TensorRT INT8 Inference
• Dynamic range and minimum precision.
• Improves inference latency significantly.

Inference latency (ms), frame resolution 1280 × 720
Caffe          42.1
TensorRT FP32  29.8
TensorRT INT8  11.6
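The idea of mapping a tensor's dynamic range onto 8-bit integers can be illustrated with generic symmetric linear quantization. This sketch is not TensorRT's calibration procedure and makes no TensorRT API calls; the function names and the sample values are assumptions chosen for illustration.

```python
from typing import List, Tuple


def quantize_int8(values: List[float], dynamic_range: float) -> Tuple[List[int], float]:
    """Map floats in [-dynamic_range, dynamic_range] to integers in [-127, 127]."""
    scale = dynamic_range / 127.0  # size of one integer step in float units
    quantized = []
    for v in values:
        # Values outside the dynamic range saturate at the range limits.
        clipped = max(-dynamic_range, min(dynamic_range, v))
        quantized.append(int(round(clipped / scale)))
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    activations = [0.0, 0.3, -1.2, 2.0, 5.0]  # hypothetical activations
    q, scale = quantize_int8(activations, dynamic_range=2.0)
    print("int8 values:", q)
    print("reconstructed:", [round(x, 4) for x in dequantize(q, scale)])
```

A smaller dynamic range gives finer steps for in-range values but saturates more outliers, which is the trade-off a calibration step has to settle.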

Our Contribution
• Low latency offloading techniques:
  – Dynamic RoI Encoding.
  – Parallel Streaming and Inference.
  – Model quantization.
  Latency: 80 ms -> 15 ms.
• On-device fast object tracking technique:
  – Motion Vector based Object Tracking.
  Latency: 15 ms -> 2.24 ms.
• Adaptive Offloading technique.

Motion Vector based Object Tracking
• Traditional approach: SIFT and optical flow…
• Our solution: use the motion vectors embedded in the encoded video frames.
[Figure: sequence of I and P frames from the Offloaded Frame to the Current Frame, with Motion Vectors]

Motion Vector based Object Tracking
• Cached detection result of the last offloaded frame.
• Motion vectors extracted from the current encoded frame.
• Shift the bounding box based on the motion vectors.

Tracking latency (ms) by object tracking method
Motion Vector based  2.24
Lucas Kanade         8.53
Horn Schunck         79.01
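The box shifting step can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: motion vectors are given as (block centre x, block centre y, dx, dy) with the displacement pointing from the last offloaded frame to the current frame, and the box is moved by the mean displacement of the vectors inside it. Real codecs often store vectors pointing back to the reference frame, in which case the sign has to be flipped first.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]           # (x_min, y_min, x_max, y_max)
MotionVector = Tuple[float, float, float, float]  # (center_x, center_y, dx, dy)


def shift_box(box: Box, motion_vectors: List[MotionVector]) -> Box:
    """Move a cached bounding box by the mean motion of the blocks inside it."""
    x0, y0, x1, y1 = box
    inside = [(dx, dy) for (x, y, dx, dy) in motion_vectors
              if x0 <= x < x1 and y0 <= y < y1]
    if not inside:
        return box  # no motion information for this box; keep the cached position
    mean_dx = sum(dx for dx, _ in inside) / len(inside)
    mean_dy = sum(dy for _, dy in inside) / len(inside)
    return (x0 + mean_dx, y0 + mean_dy, x1 + mean_dx, y1 + mean_dy)


if __name__ == "__main__":
    cached_box = (0.0, 0.0, 32.0, 32.0)  # hypothetical cached detection
    vectors = [(8.0, 8.0, 4.0, 2.0), (24.0, 24.0, 6.0, 2.0), (100.0, 100.0, 50.0, 50.0)]
    print(shift_box(cached_box, vectors))
```

Because the vectors are already present in the encoded frame, the client only performs this averaging, with no feature extraction or optical flow computation.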

Adaptive Offloading
Case 1: Don't offload. [Figure: Last Offloaded Frame and Current Frame]
Case 2: Offload. [Figure: Last Offloaded Frame and Current Frame]

Our System
[Figure: system overview with an Offloading Pipeline and a Rendering Pipeline; labelled components: Motion Vector based Object Tracking, Cached Detection Result]

Experiment Setup
Client: Nvidia Jetson TX2, Magic Leap One
Server: Desktop PC with Nvidia Titan Xp GPU
Network: TP-Link AC1900, WiFi 2.4 GHz and WiFi 5 GHz

Experiment Setup
• Ten videos in the Xiph video dataset.
• Two object detection tasks:
  – Object Detection
  – Human Keypoint Detection

Evaluation Result
[Figure: Mean detection accuracy (IoU/OKS) of two models with two WiFi connections]
[Figure: CDF of detection accuracy (IoU/OKS) for the object detection and keypoint detection tasks]
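For reference, the IoU (intersection over union) score used for the object detection task compares a predicted box with a ground truth box. The sketch below shows the standard definition for axis-aligned boxes; the function name and the box layout are illustrative choices, and OKS for keypoints is not covered here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in [0, 1]."""
    # Corners of the overlapping rectangle, if any.
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    intersection = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(f"IoU = {iou(predicted, ground_truth):.3f}")
```

A box that lags behind a moving object because of offloading latency overlaps less with the ground truth, which is how latency shows up as lower IoU.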

Evaluation Result
[Figure: comparison of Baseline and Our System]

Conclusion
• Contribution:
  – We design a system that enables high accuracy object detection for AR/MR systems running at 60 fps.
  – Low latency offloading techniques.
  – On-device fast object tracking.
• Future work:
  – Improve network robustness.
  – Improve camera capture and screen display latency.
  – Better server-client collaboration.

Thank you!
Luyang Liu
luyang@winlab.rutgers.edu
http://www.winlab.rutgers.edu/~luyang/

Backup Slides

Opportunity: Offload Vision Tasks to Edge Cloud
[Figure: offloading loop with labels AR Device, Sensing, Processing, Uplink Transmission, Object Detection, Downlink Transmission]