Wildlife Action Recognition using Deep Learning
Weining Li, Sirnam Swetha, Dr. Mubarak Shah
University of Central Florida

Introduction
- Problem: Given a video of an animal, recognize the action being performed in the video using deep learning
- Most existing systems are human-centric or too specific
- Requires creating a dataset and a learning system for general animal action recognition

Figure 1 - cheetah chasing a deer

Dataset
- 106 categories
- 100 videos per category
- 32 animals, covering airborne, aquatic, and land animals
- 3-4 actions per animal
- Video lengths range from 0-5 minutes
- Downloaded from YouTube [1]

Figure 2 - example frames from the dataset

Approach

I3D
- A CNN for video classification [2]

Fusion: I3D + VGG
- I3D classification [2]
- Scene semantic features (VGG) [3]

Figure 3 - example structure of the fusion network

Hierarchy of Networks
- The first layer of the network groups the dataset by action
- The second layer separates the dataset further into individual animals

Figure 4 - example structure of the hierarchy of networks

Results
Panels (a) to (d), each showing Accuracy and Loss; panel (b) is marked "Placeholder For Fusion Network Results".

Figure 5 - accuracy and loss graphs for experiments with (a) I3D, (b) fusion network, (c) first layer, and (d) one of the networks in the second layer of the hierarchy

Figure 6:

References
[1] www.youtube.com
[2] Joao Carreira and Andrew Zisserman. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset.
[3] Z. Wu, Y. Fu, Y. Jiang and L. Sigal. "Harnessing Object and Scene Semantics for Large-Scale Video Understanding."
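
Sketch: fusing motion and scene features

The poster names the two inputs to the fusion network (I3D classification and VGG scene semantic features) but does not state how they are combined. The sketch below assumes the simplest option, concatenating the two feature vectors and applying a single linear layer followed by softmax. The function name `fuse_and_classify`, the feature values, and the weights are all hypothetical stand-ins, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(motion_feat, scene_feat, weights, biases):
    """Concatenate two feature vectors, then apply one linear layer and softmax.

    motion_feat: stand-in for an I3D clip embedding (assumed, toy values).
    scene_feat:  stand-in for a VGG scene embedding (assumed, toy values).
    weights:     one row per class; each row spans the concatenated vector.
    biases:      one bias per class.
    """
    fused = list(motion_feat) + list(scene_feat)
    for row in weights:
        if len(row) != len(fused):
            raise ValueError("weight row length must match fused feature length")
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Toy example: 2-D motion feature, 2-D scene feature, 3 classes.
motion = [1.0, 0.0]
scene = [0.0, 2.0]
W = [
    [1.0, 0.0, 0.0, 0.0],  # class 0 uses the motion feature only
    [0.0, 0.0, 0.0, 1.0],  # class 1 uses the scene feature only
    [0.5, 0.0, 0.0, 0.5],  # class 2 uses both
]
b = [0.0, 0.0, 0.0]

probs = fuse_and_classify(motion, scene, W, b)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

In a real system the two vectors would come from pretrained I3D and VGG networks and the linear layer would be trained on the dataset; other fusion schemes (for example averaging per-stream class scores) are equally plausible readings of the poster.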
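
Sketch: two-level hierarchy of networks

The hierarchy described in the poster (a first layer that groups by action, then a second layer that separates individual animals) can be illustrated as a routing step: the first classifier picks an action, and that action selects which second-level classifier identifies the animal. The classifiers below are toy threshold functions standing in for trained networks, and names such as `predict_hierarchical` and the labels other than cheetah and deer are assumptions made for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

Classifier = Callable[[Sequence[float]], str]


def make_threshold_classifier(index: int, threshold: float,
                              low_label: str, high_label: str) -> Classifier:
    """Toy stand-in for a trained network: thresholds a single feature."""
    def classify(features: Sequence[float]) -> str:
        return high_label if features[index] >= threshold else low_label
    return classify


def predict_hierarchical(features: Sequence[float],
                         action_net: Classifier,
                         animal_nets: Dict[str, Classifier]) -> Tuple[str, str]:
    """Route a sample through the first layer (action), then the matching
    second-layer network (animal). Returns (animal, action)."""
    action = action_net(features)
    animal = animal_nets[action](features)
    return animal, action


# First layer: decides the action group (hypothetical labels).
action_net = make_threshold_classifier(0, 0.5, "swimming", "running")

# Second layer: one network per action, each separating animals.
animal_nets = {
    "running": make_threshold_classifier(1, 0.5, "deer", "cheetah"),
    "swimming": make_threshold_classifier(1, 0.5, "dolphin", "whale"),
}

label = predict_hierarchical([0.9, 0.8], action_net, animal_nets)
other = predict_hierarchical([0.1, 0.2], action_net, animal_nets)
```

One consequence of this design is that a mistake in the first layer cannot be corrected by the second, so first-layer accuracy bounds the accuracy of the whole hierarchy.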