EVA²: Exploiting Temporal Redundancy In Live Computer Vision
Mark Buckler, Philip Bedoukian, Suren Jayasuriya, Adrian Sampson
International Symposium on Computer Architecture (ISCA)
Tuesday June 5, 2018

Convolutional Neural Networks (CNNs)

Embedded Vision Accelerators
• FPGA Research: Suda et al., Zhang et al., Qiu et al., Farabet et al., many more…
• ASIC Research: ShiDianNao, EIE, Eyeriss, SCNN, many more…
• Industry Adoption

Temporal Redundancy
• Frames: Frame 0, Frame 1, Frame 2, Frame 3
• Input change: High, Low, Low
• Cost to process: High, Low

Talk Overview
• Background
• Algorithm
• Hardware
• Evaluation
• Conclusion

Common Structure in CNNs
• Image Classification
• Object Detection
• Semantic Segmentation
• Image Captioning

Common Structure in CNNs
• Intermediate activations sit between a CNN prefix (high energy) and a CNN suffix (low energy)
• "Key Frame" (Frame 0): CNN prefix, then CNN suffix
• "Predicted Frame" (Frame 1): motion replaces the CNN prefix, then CNN suffix (low energy)
#MakeRyanGoslingTheNewLenna

Activation Motion Compensation (AMC)
• Key frame (time t): input frame → CNN prefix → stored activations → CNN suffix → vision result (~10^11 MACs)
• Predicted frame (time t+k): input frame → motion estimation (~10^7 adds) → motion vector field → motion compensation of the stored activations → predicted activations → CNN suffix → vision result

AMC Design Decisions
• How to perform motion estimation?
• How to perform motion compensation?
• Which frames are key frames?

Motion Estimation
• We need to estimate the motion of activations by using pixels…
• Motion estimation: performed on pixels
• Motion compensation: performed on activations

Pixels to Activations: Receptive Fields
[Diagram: input image → 3 x 3 conv → intermediate activations; labels C=3, C=64, C=64, w=h=8, 64; a 5 x 5 "receptive field" is highlighted]
• Estimate motion of activations by estimating motion of receptive fields
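
The receptive field size can be worked out layer by layer. The sketch below is an illustration, not code from the talk: the function name `receptive_field` and the layer list are assumptions. It shows that two stacked 3 x 3, stride-1 convolutions give a 5 x 5 receptive field, which would match the figure on this slide if the diagram shows two such layers (the transcript does not say).

```python
def receptive_field(layers):
    """Return (size, step) of one output activation's receptive field.

    layers: list of (kernel_size, stride) tuples, ordered input to output.
    size:   width/height of the input patch that feeds one activation.
    step:   distance in input pixels between neighbouring receptive fields.
    """
    size, step = 1, 1
    for kernel, stride in layers:
        size += (kernel - 1) * step  # each layer widens the patch
        step *= stride               # strides compound across layers
    return size, step


# Assumed configuration: two 3 x 3 convolutions with stride 1.
print(receptive_field([(3, 1), (3, 1)]))  # (5, 1)
```

The `step` value matters for motion estimation: it is the spacing of the pixel blocks that correspond to neighbouring activations.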

Receptive Field Block Motion Estimation (RFBME)
[Diagram: key frame and predicted frame; labels 0, 1, 2, 3]
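
A minimal sketch of block matching over receptive-field-sized blocks, to make the idea concrete. This is not the RFBME implementation from the talk: it uses a plain exhaustive search with a sum-of-absolute-differences cost and does not include the tile re-use mentioned in the backup slides. The names `block_sad`, `estimate_motion`, `size`, `stride`, and `radius` are all assumptions. The vector convention follows the motion compensation slide: content at `(x, y)` in the new frame is looked up at `(x - dx, y - dy)` in the key frame.

```python
def block_sad(key, new, kx, ky, nx, ny, size):
    """Sum of absolute differences between two size x size pixel blocks."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(key[ky + j][kx + i] - new[ny + j][nx + i])
    return total


def estimate_motion(key, new, size, stride, radius):
    """Return (vector field, total matching error) for the new frame.

    One (dx, dy) vector is produced per block; blocks are size x size
    pixels and are spaced `stride` pixels apart, like receptive fields.
    """
    h, w = len(new), len(new[0])
    field, total_error = [], 0
    for ny in range(0, h - size + 1, stride):
        row = []
        for nx in range(0, w - size + 1, stride):
            best = None
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    kx, ky = nx - dx, ny - dy  # subtract the vector
                    if kx < 0 or ky < 0 or kx + size > w or ky + size > h:
                        continue  # candidate block falls outside the key frame
                    err = block_sad(key, new, kx, ky, nx, ny, size)
                    if best is None or err < best[0]:
                        best = (err, dx, dy)
            row.append((best[1], best[2]))
            total_error += best[0]
        field.append(row)
    return field, total_error


# Toy frames: the new frame is the key frame shifted right by one pixel.
key = [[(7 * x + 13 * y) % 31 for x in range(8)] for y in range(8)]
new = [[key[y][x - 1] if x > 0 else 0 for x in range(8)] for y in range(8)]
field, error = estimate_motion(key, new, size=3, stride=1, radius=1)
print(field[2][3])  # (1, 0)
```

The returned total error is the kind of quantity a key-frame decision could be based on.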

Motion Compensation
[Diagram: stored activations (C=64) and predicted activations; vector X = 2.5, Y = 2.5]
• Subtract the vector to index into the stored activations
• Interpolate when necessary
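
The two bullets above can be sketched for a single activation channel as follows. This is an illustration under assumptions, not the EVA² datapath: bilinear interpolation is one reasonable reading of "interpolate when necessary", reads outside the stored map return zero, and the names `bilinear` and `compensate` are hypothetical. A 64-channel layer would apply the same vector field to every channel.

```python
import math


def bilinear(act, x, y):
    """Sample a 2D activation map at a fractional (x, y) position."""
    h, w = len(act), len(act[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def at(ix, iy):
        # Assumption: positions outside the stored map read as zero.
        return act[iy][ix] if 0 <= ix < w and 0 <= iy < h else 0.0

    top = (1 - fx) * at(x0, y0) + fx * at(x0 + 1, y0)
    bottom = (1 - fx) * at(x0, y0 + 1) + fx * at(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


def compensate(stored, field):
    """Predict activations by subtracting each vector and sampling.

    field[y][x] is the (vx, vy) vector for activation (x, y), expressed
    in activation coordinates.
    """
    return [
        [bilinear(stored, x - vx, y - vy) for x, (vx, vy) in enumerate(row)]
        for y, row in enumerate(field)
    ]


stored = [[x + 10 * y for x in range(4)] for y in range(4)]
field = [[(0.5, 0.5)] * 4 for _ in range(4)]
predicted = compensate(stored, field)
print(predicted[2][2])  # 16.5
```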

When to Compute Key Frame?
• System needs a new key frame when motion estimation fails:
  • De-occlusion
  • New objects
  • Rotation/scaling
  • Lighting changes
• So, compute key frame when RFBME error exceeds set threshold
[Flowchart: input frame → motion estimation against the key frame → Error > Thresh? Yes: CNN prefix (new key frame), then CNN suffix. No: motion compensation, then CNN suffix. Both paths end in the vision result]
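
The flowchart can be read as the control loop below. This is a sketch with stand-in callables, not the EVA² controller: `process_stream` and its parameters (`prefix`, `suffix`, `estimate`, `compensate`, `threshold`) are hypothetical names, and the toy lambdas at the bottom only exercise the control flow.

```python
def process_stream(frames, prefix, suffix, estimate, compensate, threshold):
    """Run the adaptive key-frame loop and return (kind, result) pairs."""
    key_frame, stored = None, None
    results = []
    for frame in frames:
        needs_key = key_frame is None  # the first frame is always a key frame
        if not needs_key:
            field, error = estimate(key_frame, frame)
            needs_key = error > threshold  # motion estimation failed
        if needs_key:
            key_frame = frame
            stored = prefix(frame)  # expensive path: run the CNN prefix
            activations, kind = stored, "key"
        else:
            activations = compensate(stored, field)  # cheap path
            kind = "predicted"
        results.append((kind, suffix(activations)))
    return results


# Toy stand-ins: frames are numbers and "motion" is their difference.
results = process_stream(
    frames=[0, 1, 2, 10, 11],
    prefix=lambda f: f * 10,
    suffix=lambda a: a,
    estimate=lambda k, f: (f - k, abs(f - k)),
    compensate=lambda stored, v: stored + v * 10,
    threshold=3,
)
print([kind for kind, _ in results])
# ['key', 'predicted', 'predicted', 'key', 'predicted']
```

Raising `threshold` trades accuracy for fewer key frames, and lowering it does the opposite.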

Embedded Vision Accelerator
• Global buffer
• Eyeriss (conv): CNN prefix
• EIE (fully connected): CNN suffix

Embedded Vision Accelerator (EVA²)
• Global buffer
• EVA²: motion estimation, motion compensation
• Eyeriss (conv): CNN prefix
• EIE (fully connected): CNN suffix

Y.-H. Chen, T. Krishna, J. S. Emer, and V. Sze, "Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks"
S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, "EIE: Efficient inference engine on compressed deep neural network"

Embedded Vision Accelerator (EVA²)
• Frame 0: key frame
• Frame 1: predicted frame (motion estimation, then motion compensation)
• EVA² leverages sparse techniques to save 80-87% storage and computation
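
The backup slide comparing AMC with Deep Feature Flow lists activation storage as "Sparse (run length)". As a rough illustration of why run-length coding suits activations with many zeros, here is a minimal sketch. The encoding layout (pairs of zero-run length and non-zero value) and the names `rle_encode` and `rle_decode` are assumptions, not the format used in the hardware.

```python
def rle_encode(values):
    """Encode a sequence as (zeros_before, value) pairs plus trailing zeros."""
    pairs, zeros = [], 0
    for v in values:
        if v == 0:
            zeros += 1  # extend the current run of zeros
        else:
            pairs.append((zeros, v))
            zeros = 0
    return pairs, zeros


def rle_decode(pairs, trailing):
    """Rebuild the original dense sequence from its run-length form."""
    out = []
    for zeros, v in pairs:
        out.extend([0] * zeros)
        out.append(v)
    out.extend([0] * trailing)
    return out


row = [0, 0, 3, 0, 5, 0, 0, 0]
pairs, trailing = rle_encode(row)
print(pairs, trailing)  # [(2, 3), (1, 5)] 3
```

Only the non-zero activations are stored, so the denser the zeros, the larger the saving.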

Evaluation Details
• Train/Validation Datasets: YouTube Bounding Box (object detection & classification)
• Evaluated Networks: AlexNet, Faster R-CNN with VGGM and VGG16
• Hardware Baseline: Eyeriss & EIE performance scaled from papers
• EVA² Implementation: written in RTL, synthesized with 65 nm TSMC

EVA² Area Overhead
• Total 65 nm area: 74 mm²
• EVA² takes up only 3.3%

EVA² Energy Savings
[Bar chart: normalized energy (0 to 1) for AlexNet, Faster16, and FasterM, split into Eyeriss, EIE, and EVA² components, with orig, pred, and avg bars]

High Level EVA² Results

Network            | Vision Task    | Keyframe % | Accuracy Degradation | Average Latency Savings | Average Energy Savings
AlexNet            | Classification | 11%        | 0.8% top-1           | 86.9%                   | 87.5%
Faster R-CNN VGG16 | Detection      | 36%        | 0.7% mAP             | 61.7%                   | 61.9%
Faster R-CNN VGGM  | Detection      | 37%        | 0.6% mAP             | 54.1%                   | 54.7%

• EVA² enables 54-87% savings while incurring <1% accuracy degradation
• Adaptive key frame choice metric can be adjusted
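
A back-of-the-envelope reading of the results, not taken from the talk: if a key frame is assumed to cost the same as a baseline frame (normalized to 1), the average cost is `key_fraction + (1 - key_fraction) * predicted_cost`. Solving for the predicted-frame cost gives a rough implied figure per network. The model and the function names below are assumptions; the real key-frame cost also includes EVA² overheads that this ignores.

```python
def average_cost(key_fraction, predicted_cost):
    """Average normalized cost per frame; a key frame is assumed to cost 1.0."""
    return key_fraction + (1.0 - key_fraction) * predicted_cost


def implied_predicted_cost(key_fraction, savings):
    """Solve average_cost(...) == 1 - savings for the predicted-frame cost."""
    return ((1.0 - savings) - key_fraction) / (1.0 - key_fraction)


# Keyframe % and average energy savings as listed on the results slide.
rows = {
    "AlexNet": (0.11, 0.875),
    "Faster R-CNN VGG16": (0.36, 0.619),
    "Faster R-CNN VGGM": (0.37, 0.547),
}
for name, (key_fraction, savings) in rows.items():
    cost = implied_predicted_cost(key_fraction, savings)
    print(f"{name}: predicted frame ~{cost:.3f} of a baseline frame")
```

Under this simplified model most of the remaining energy goes to key frames, which is why the key frame percentage tracks the savings so closely.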

Conclusion
• Temporal redundancy is an entirely new dimension for optimization
• AMC & EVA² improve efficiency and are highly general
• Applicable to many different…
  • CNN applications (classification, detection, segmentation, etc.)
  • Hardware architectures (CPU, GPU, ASIC, etc.)
  • Motion estimation/compensation algorithms

Backup Slides

Why not use vectors from video codec/ISP?
• We've demonstrated that the ISP can be skipped (Buckler et al. 2017)
• No need to compress video which is instantly thrown away
• Can save energy by power gating the ISP
• Opportunity to set own key frame schedule
• However, great idea for pre-stored video!

Why Not Simply Subsample?
• If lower frame rate needed, simply apply AMC at that frame rate
• Warping
• Adaptive key frame choice

Different Motion Estimation Methods
[Charts: Faster16 and FasterM]

Difference from Deep Feature Flow?
• Deep Feature Flow does also exploit temporal redundancy, but…

                              | AMC and EVA²                | Deep Feature Flow
Adaptive key frame rate?      | Yes                         | No
On chip activation cache?     | Yes                         | No
Learned motion estimation?    | No                          | Yes
Motion estimation granularity | Per receptive field         | Per pixel (excess granularity)
Motion compensation           | Sparse (four-way zero skip) | Dense
Activation storage            | Sparse (run length)         | Dense

Difference from Euphrates?
• Euphrates has a strong focus on SoC integration
• Motion estimation from ISP
  • May want to skip the ISP to save energy & create more optimal key schedule
• Motion compensation on bounding boxes
  • Skips entire network, but is only applicable to object detection

Re-use Tiles in RFBME

Changing Error Threshold

Different Adaptive Key Frame Metrics

Normalized Latency & Energy

How about Re-Training?

Where to cut the network?

#MakeRyanGoslingTheNewLenna
• Lenna dates back to 1973
• We need a new test image for image processing!