Three Big Stages in Machine Learning Pipeline

• Raw Data Processing: Collection, Pre-Processing, Sampling, Store, Training Samples
• AI Training: Model Optimize, Feature Extract, Model Evaluate, Feature Selection, Model Tuning
• Model Deploy: Business Requirement; Cloud Deploy, Edge Deploy, On-device Deploy; Inference Output; AI Applications
11/28/2020

What's Adlik
• Adlik [ædlik] is a toolkit for accelerating deep learning inference on specific hardware.
  • Supports several kinds of hardware.
  • Works with existing inference solutions through a unified entry point.
  • Automatically decides optimal engineering parameters (backend, batch size, etc.).
• An open source project of LF AI, with code hosted on GitHub: https://github.com/Adlik
• (Figure: Adlik in the LF AI landscape)
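
The slide does not say how Adlik chooses engineering parameters. As a rough illustration of the idea only, the sketch below picks the highest-throughput (backend, batch size) pair that stays within a latency budget. The `Measurement` type, the `pick_config` helper, the backend labels and all numbers are hypothetical and are not part of Adlik.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Measurement:
    backend: str        # hypothetical backend label
    batch_size: int
    latency_ms: float   # measured latency per batch
    throughput: float   # measured images per second


def pick_config(measurements: Iterable[Measurement],
                max_latency_ms: float) -> Optional[Measurement]:
    """Return the highest-throughput measurement within the latency budget."""
    feasible = [m for m in measurements if m.latency_ms <= max_latency_ms]
    return max(feasible, key=lambda m: m.throughput, default=None)


# Made-up numbers for illustration only; these are not Adlik benchmark results.
candidates = [
    Measurement("backend_a", 1, 12.0, 83.0),
    Measurement("backend_a", 16, 95.0, 168.0),
    Measurement("backend_b", 1, 6.0, 166.0),
    Measurement("backend_b", 16, 40.0, 400.0),
    Measurement("backend_b", 64, 130.0, 492.0),
]
best = pick_config(candidates, max_latency_ms=50.0)
print(best)
```

A real tuner would also have to account for resource constraints such as memory, which this sketch leaves out.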

Why Use Adlik
• Running inference directly with the training framework and model is inefficient.
• There are many inference frameworks for different devices, and it is hard to select the right one.
• A lot of tuning work is needed to meet performance requirements (latency, throughput, resource constraints).
• Migrating from one hardware platform to another adds further effort.

Throughput (pictures/s):
Model       | TensorFlow CPU | TensorFlow GPU | Adlik optimized (OpenVINO, CPU) | Adlik optimized (TensorRT, GPU) | TF Serving CPU | TF Serving GPU
ResNet50    | 2.03           | 246.38         | 8.07                            | 1041.41                         | 2.39           | 525.49
VGG16       | 0.95           | 220.14         | 3.42                            | 470.69                          | 1.21           | 359.67
InceptionV3 | 1.85           | 171.55         | /                               | /                               | 1.63           | 308.39

Test bed: batch size: 64; CPU: 2 × 6138 @ 2.0 GHz; memory: 8 G; GPU: 1 × P100

Adlik Architecture
• Model Optimizer & Compiler: boost computing efficiency, reduce power consumption and latency.
  • Flow: Model Training → Model Export → Graph Optimizer (Pruning, Quantization, Structural Compression) → Model Compiler (Graph Conversion, Layer Fusion)
  • Input: big FP32 model; output: small 8-bit (INT8) model
• Adlik Engine: supports three kinds of deployment environment.
  • Cloud Deployment: image-based Engine + Model; Service Portal, Kubernetes, Docker; GPU Cluster, Storage Cluster, Mgt Nodes
  • Edge Deployment: file-based Model with Adlik Inference Engine; Micro Services, Kubernetes, Docker; x86, GPU; Storage Node, Mgt Node
  • On-device Deployment: binary-file Engine; AI-based programs load the Inference Runtime statically or dynamically; Operating System & Device Drivers; ARM, GPU, FPGA, Mali

Adlik Features
• Model representation: h5, ONNX, checkpoint
• Model Optimizer: Pruning, Structural Compression, Quantization
• Model Compiler optimization: Graph Conversion, Parallel Execution Optimization, Computing Layer Fusion, OP Combination, Memory Optimization
• Runtime: OpenVINO, TensorRT, TF Serving, TF Lite, FPGA Runtime

• Supports mainstream deep learning frameworks and model formats, such as TensorFlow, Keras and PyTorch.
• Supports a range of hardware such as CPU (x86/ARM), GPU and FPGA, with ASIC support planned.
• Supports several packaging formats to meet cloud, edge and on-device deployment requirements.

Model Optimizer: Pruning
(Figure: 3-channel input, 4 filters, 4 feature maps; pruned items are marked ×)
• Supports multi-node and multi-GPU pruning and tuning.
• Supports channel pruning and filter pruning, reducing the number of parameters and FLOPs.

ResNet-50 | Top-1  | Parameters | Size  | MACs      | Inference speed
baseline  | 76.19% | 25.61 M    | 99 MB | 5.10×10^7 | 7.2 pcs/s
pruned    | 75.50% | 17.43 M    | 67 MB | 3.47×10^7 | 9.57 pcs/s
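
The slide names filter pruning and channel pruning but not the ranking criterion. The following minimal sketch assumes filters are ranked by L1 norm (a common choice, not confirmed as Adlik's) and uses plain nested lists with 1×1 kernels to stay short. `prune_filters` and `prune_input_channels` are hypothetical helper names.

```python
def l1_norm(values) -> float:
    """Sum of absolute values over an arbitrarily nested list of numbers."""
    if isinstance(values, (int, float)):
        return abs(values)
    return sum(l1_norm(v) for v in values)


def prune_filters(filters, keep_ratio):
    """Keep the filters with the largest L1 norm (assumed criterion).

    filters is indexed as [filter][in_channel][row][col].
    Returns the kept filters and their original indices, in order.
    """
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]), reverse=True)
    keep = sorted(ranked[:n_keep])
    return [filters[i] for i in keep], keep


def prune_input_channels(next_filters, keep):
    """Drop the next layer's input channels that were fed by removed filters."""
    return [[f[c] for c in keep] for f in next_filters]


# 4 filters over a 3-channel input, as in the slide's figure (1x1 kernels).
conv1 = [
    [[[0.9]], [[-0.8]], [[0.7]]],     # L1 norm 2.4
    [[[0.01]], [[0.02]], [[-0.01]]],  # L1 norm 0.04
    [[[0.5]], [[0.4]], [[-0.6]]],     # L1 norm 1.5
    [[[0.0]], [[0.1]], [[0.0]]],      # L1 norm 0.1
]
# The next layer consumes the 4 feature maps produced by conv1.
conv2 = [
    [[[1.0]], [[2.0]], [[3.0]], [[4.0]]],
    [[[5.0]], [[6.0]], [[7.0]], [[8.0]]],
]
conv1_pruned, kept = prune_filters(conv1, keep_ratio=0.5)
conv2_pruned = prune_input_channels(conv2, kept)
print(kept, len(conv1_pruned), len(conv2_pruned[0]))
```

Removing a filter removes one output feature map, so the matching input channel of the next layer must be removed too; this is why both parameter count and MACs drop in the table above. In practice the pruned network is fine-tuned afterwards to recover accuracy.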

Model Optimizer: Quantizing
(Figure: Pre-trained FP32 Model → Quantize with Calibration (Quantizer) → INT8 Inference)
• Supports 8-bit calibration quantization.
• The quantization process needs only a small batch of data and a few minutes.

ResNet-50                  | Top-1  | Parameters | MACs      | Size
baseline                   | 76.19% | 25.61 M    | 5.10×10^7 | 99 MB
pruned                     | 75.50% | 17.43 M    | 3.47×10^7 | 67 MB
pruned+quantized (TF-Lite) | 75.3%  | 17.43 M    | 3.47×10^7 | 18 MB
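
To make "quantize with calibration" concrete, here is a minimal sketch assuming symmetric per-tensor quantization with a max-absolute-value calibration rule. The slide does not state which scheme Adlik or TF-Lite applies, so the scheme and the helper names are assumptions.

```python
def calibrate_scale(calibration_values, num_bits=8):
    """Derive a scale from a small calibration batch (max-abs rule, assumed)."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Map floats to signed integers, clamping values outside the range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integers back to approximate float values."""
    return [q * scale for q in q_values]


calibration_batch = [-1.27, 0.5, 0.3, 1.0]  # stand-in for observed values
scale = calibrate_scale(calibration_batch)
q = quantize([0.5, -1.27, 2.0], scale)      # 2.0 lies outside the calibrated range
restored = dequantize(q, scale)
print(scale, q, restored)
```

Storing 8-bit integers instead of 32-bit floats cuts weight storage by about a factor of four, which is roughly in line with the 67 MB to 18 MB reduction in the table.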

Model Optimizer & Compiler
(Figure: source formats *.h5, *.ckpt, *.pb, *.pth, *.onnx; intermediate formats graph, frozen pb; target formats OpenVINO IR model, TensorRT Plan model, TensorFlow SavedModel, TF Lite model)
• Supports several original trained model formats and target runtime formats with a unified compiling request.
• Supports preset model transformation route planning.
• Easy for customers to extend.
• Could support automatic model transformation route planning by policy (such as performance priority).
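
Route planning between model formats can be viewed as a path search over a graph of available converters. The sketch below uses breadth-first search to find the shortest chain. The edges in `CONVERSIONS` are illustrative guesses based on the format names on the slide and are not Adlik's actual converter set.

```python
from collections import deque

# Hypothetical conversion edges: source format -> formats it can be converted to.
CONVERSIONS = {
    "h5": ["frozen_pb"],
    "ckpt": ["frozen_pb"],
    "pth": ["onnx"],
    "frozen_pb": ["saved_model", "openvino_ir", "tflite", "onnx"],
    "onnx": ["openvino_ir", "tensorrt_plan"],
}


def plan_route(source, target):
    """Breadth-first search for the shortest conversion chain, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in CONVERSIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


route = plan_route("h5", "tensorrt_plan")
print(route)
```

A policy such as "performance priority" could be modeled by attaching a cost to each edge and switching to a weighted shortest-path search; that extension is not shown here.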

Adlik Inference Engine
(Figure: Serving Engine with httpd, RPC server, Model Manager, Scheduler and Model Store (model1: V2, V3; model2: V1, V2, V3); runtimes TensorFlow, OpenVINO, TensorRT, TFLite and ML on CPU/GPU)
• Model upload, upgrade, versioning, inference and monitoring.
• Unified inference interface.
• Unified management and scheduling of multiple runtimes, models and instances.
• Supports custom-defined runtimes.
• Supports an ML runtime.
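
The idea of a versioned model store behind a single inference entry point can be sketched as follows. This is a toy in-memory model, not Adlik's implementation: `ModelStore`, `predict`, and the use of plain callables as "models" are all hypothetical.

```python
class ModelStore:
    """Toy in-memory store: model name -> {version number: callable model}."""

    def __init__(self):
        self._models = {}

    def upload(self, name, version, model):
        """Add a new model or a new version of an existing model."""
        self._models.setdefault(name, {})[version] = model

    def versions(self, name):
        """List the stored versions of a model in ascending order."""
        return sorted(self._models.get(name, {}))

    def get(self, name, version=None):
        """Return the requested version, or the latest one if none is given."""
        stored = self._models[name]
        if version is None:
            version = max(stored)
        return stored[version]


def predict(store, name, inputs, version=None):
    """Single entry point: look up the model, then run it on the inputs."""
    return store.get(name, version)(inputs)


store = ModelStore()
store.upload("model1", 2, lambda xs: [x * 2 for x in xs])
store.upload("model1", 3, lambda xs: [x * 3 for x in xs])
latest = predict(store, "model1", [1, 2])
pinned = predict(store, "model1", [1, 2], version=2)
print(store.versions("model1"), latest, pinned)
```

Upgrading a model is then just another `upload` with a higher version number, while clients that pin a version keep getting the old behavior.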

Adlik Serving SDK
(Figure: a Custom AI Function built on the Serving SDK C++ API; Adlik-defined modules: Model Manager, Model Store, Scheduler and the TensorFlow, OpenVINO, TensorRT, TFLite and ML runtimes on CPU/GPU; custom-defined modules: Custom Runtime)
• C++ API.
• Supports custom-defined ops.
• Supports model orchestration.
• Supports custom-defined runtimes.
• Easy for users to add their own runtime.

Using Adlik to Deploy Models in Cloud/Edge/Device
• Cloud: Trained Model → optimize → Optimized Model → compile → Runtime Image with compiled model → deploy → Cloud Native
• Edge: Trained Model → optimize → Optimized Model → compile → Compiled model only → deploy → Runtime Container on Edge Server
• Device: Trained Model → optimize → Optimized Model → compile → Runtime Library with model → run → Specific Hardware
(Figure annotations: multi-model scheduling, model orchestration)

Use Case: Adlik Used in Automatic Testing
• A containerized solution that automatically executes all test steps.
• Supports all compilers and runtimes integrated in Adlik.
• Usage scenarios: DevOps, benchmark tests, etc.
• Test steps: Prepare Test Env → Build runtime → Compile model → Start engine → Execute test.
• Inputs: model file, client script, code, runtime type (TF, TRT, ...).
• Outputs: inference result; performance (tail latency, image processing efficiency).
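
The two performance outputs named above can be computed from per-batch latency samples. The sketch below assumes the nearest-rank percentile definition for tail latency and images per second for processing efficiency. The helper names and the sample latencies are made up for illustration and do not come from Adlik's test framework.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a list of numbers (pct in 0..100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_s, batch_size):
    """Summarize per-batch latencies (in seconds) from a sequential test run."""
    total_images = batch_size * len(latencies_s)
    return {
        "p50_s": percentile(latencies_s, 50),
        "p99_s": percentile(latencies_s, 99),   # tail latency
        "images_per_s": total_images / sum(latencies_s),
    }


# Made-up measurements: nine fast batches and one slow outlier.
latencies = [0.10] * 9 + [0.50]
report = summarize(latencies, batch_size=64)
print(report)
```

The single slow batch barely moves the median but dominates the 99th percentile, which is why tail latency is reported separately from average throughput.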

Use Case: Adlik Used in an Embedded Device
• Deploy the Adlik inference engine on a Jetson Nano.
• Use the Adlik optimizer to quantize ResNet-50 and compile it to the TFLite model format.
• On the Jetson Nano, read test images locally and run the inference test by calling the Adlik inference interface.

Use Case: ML Algorithm in Adlik
(Figure: httpd, RPC server, Model Manager, Scheduler and Model Store with versioned models; an Algorithm Factory with classification, regression, ... and self-developed algorithms on top of a Linear Algebra Library)
• Supports injection of custom-defined algorithms through the C++ API.
• Supports multiple task types:
  • Inference
  • Incremental training
• Supports multiple data transfer methods:
  • Messages
  • Files
  • Shared memory
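
The slide describes injecting algorithms through a C++ API. Purely to illustrate the pattern of an algorithm factory that serves both inference and incremental training, here is a small Python sketch. The registry, the `register`/`create` helpers and the `OnlineLinearRegression` class are hypothetical and do not mirror Adlik's C++ interface.

```python
ALGORITHMS = {}  # hypothetical registry: algorithm name -> class


def register(name):
    """Class decorator that adds an algorithm to the factory."""
    def decorator(cls):
        ALGORITHMS[name] = cls
        return cls
    return decorator


def create(name, **kwargs):
    """Instantiate a registered algorithm by name."""
    return ALGORITHMS[name](**kwargs)


@register("regression")
class OnlineLinearRegression:
    """1-D linear model y = w*x + b, updated by stochastic gradient descent."""

    def __init__(self, lr=0.05):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def partial_fit(self, xs, ys):
        """Incremental training: update the weights from one batch."""
        for x, y in zip(xs, ys):
            err = (self.w * x + self.b) - y
            self.w -= self.lr * err * x
            self.b -= self.lr * err

    def predict(self, xs):
        """Inference with the current weights."""
        return [self.w * x + self.b for x in xs]


model = create("regression", lr=0.05)
# Two batches drawn from y = 2x + 1, fed repeatedly as if arriving over time.
for _ in range(500):
    model.partial_fit([0.0, 0.5, 1.0], [1.0, 2.0, 3.0])
    model.partial_fit([1.5, 2.0], [4.0, 5.0])
prediction = model.predict([3.0])[0]
print(round(prediction, 3))
```

Because the model keeps its weights between `partial_fit` calls, new data can be folded in without retraining from scratch, which is the point of the incremental training task type.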

Use Case: Adlik for O-RAN

Adlik Development Status
• Released version 0.1.0 (Antelope):
  • Model Optimizer
    • Multi-node, multi-GPU training and pruning.
    • Configurable implementation of filter pruning.
    • Small-batch dataset quantization for TF-Lite and TF-TRT.
  • Model Compiler
    • New framework.
    • Compilation of models trained with Keras, TensorFlow and PyTorch.
  • Inference Engine
    • Management of multiple models and multiple versions.
    • HTTP/gRPC interfaces for the inference service.
    • Runtime scheduler that supports scheduling of multiple model instances.
    • Integration of multiple DL inference runtimes.
    • Support for ARM CPUs.
  • Benchmark Test Framework

Adlik Development Status
• Community activity:
  • Routine TSC meetings.
  • Stable cooperation with CMCC, Unicom and AIIA.
  • Submitted a CR in the O-RAN community to introduce Adlik into the O-RAN framework.
• GitHub activity:
  • 150+ commits
  • 12 contributors
  • First release version v0.1.0

Adlik Development Status
• 2019.9, initial commit: supports compiling h5 and ckpt models to TensorFlow Serving, TensorRT and OpenVINO.
• 2019.12: supports custom-defined runtimes, a custom-defined service core, and multi-model management and scheduling.
• Now, release 0.1.0: supports ARM CPUs, the Optimizer and ML; auto test framework.
• 2020.12 release: support more frameworks; introduce AutoML into Adlik.
• 2021.6 release: Adlik Cloud Service; FPGA support; automatic engineering parameter optimization.

Thank You