Efficient Video Classification Using Fewer Frames
Shweta Bhardwaj, Mukundhan Srinivasan, Mitesh M. Khapra
Indian Institute of Technology Madras, NVIDIA Bangalore

Motivation
• Build compact video classification models with a small memory footprint.
• Most compact models still require a large number of FLOPs.
2020/11/28

Motivation
• ECCV 2018 workshop
• (i) Recurrent neural networks (LSTM)
• (ii) Cluster-and-aggregate (NetVLAD)
• (iii) C3D-based models

Motivation
• ECCV 2018 workshop
• (i) Recurrent neural networks (LSTM)
• (ii) Cluster-and-aggregate (NetVLAD)
• (iii) C3D-based models
• Distillation
• Expensive teacher network that processes every frame
• Inexpensive student network that processes fewer frames
• Same model for teacher and student
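A minimal sketch of how the student's reduced input could be built, assuming uniform subsampling (every j-th frame). The slide only says "fewer frames", so the sampling rule, the function name `select_every_jth_frame`, and the array sizes are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def select_every_jth_frame(frames, j):
    """Return frames 0, j, 2j, ... along the time axis (assumed sampling rule)."""
    if j < 1:
        raise ValueError("j must be a positive integer")
    return frames[::j]

# Hypothetical video: 300 frames, each a 4-dimensional feature vector.
video = np.arange(300 * 4, dtype=np.float32).reshape(300, 4)

teacher_input = video                                   # teacher processes every frame
student_input = select_every_jth_frame(video, j=10)     # student processes 1 in 10 frames
```

With these made-up sizes the student processes 30 frames instead of 300, which is where the savings in computation would come from when both networks share the same model.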

Framework

Teacher Network

Student Network

Experiment
• Dataset: YouTube-8M (2017 version)
• Models: H-RNN, NetVLAD, NeXtVLAD
• Skyline: the original teacher model
• Baseline: the student model trained without the teacher model

Evaluation

Best baseline and best loss combination

The student's performance is close to the skyline

Better performance on limited training data

The parallel student takes more epochs to match the serial student

Computational cost and time are lower than the skyline's, with the same performance

Training the student to match the teacher's intermediate representations gives the same performance
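The idea of matching representations can be sketched as a squared L2 distance between the teacher's video representation (computed from all frames) and the student's (computed from fewer frames). This is only an illustration under stated assumptions: the slides do not give the loss formula, the mean-pooling `encode` function is a toy stand-in for a learned encoder, and all sizes are made up.

```python
import numpy as np

def encode(frames):
    """Toy stand-in encoder: mean-pool frame features into one video vector."""
    return frames.mean(axis=0)

def representation_loss(teacher_rep, student_rep):
    """Squared L2 distance between two video representations (assumed loss form)."""
    diff = teacher_rep - student_rep
    return float(np.dot(diff, diff))

rng = np.random.default_rng(0)
video = rng.normal(size=(300, 8))       # hypothetical: 300 frames, 8 features each

teacher_rep = encode(video)             # teacher sees every frame
student_rep = encode(video[::10])       # student sees every 10th frame (assumption)
loss = representation_loss(teacher_rep, student_rep)
```

In actual training the encoders would be learned networks, and this term would be minimized with respect to the student's parameters so that its representation moves toward the teacher's.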

Other models' performance

Conclusion
• Our model outperforms the baseline and significantly reduces computational time and cost compared to the skyline.
• Future work: train a reinforcement learning agent to first select the most favorable k frames.

Comments
++ Saves a lot of time and memory while achieving the same results.
++ Good idea.
-- The performance of the best baseline is close to that of the proposed model.