OBJECT DETECTION KAI-CHEN WANG

OUTLINE 1. Background – R-CNN – Fast-RCNN – Faster-RCNN 2. Paper research – Fully Convolutional Instance-aware Semantic Segmentation – Mask R-CNN 3. Reference

R-CNN [Rich feature hierarchies for accurate object detection and semantic segmentation, Tech report (v5)] • Localization : Selective search • Feature extraction : CNN • Classification : SVM

SELECTIVE SEARCH 1. Color similarity 2. Texture similarity 3. Size similarity 4. Fill similarity
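The four similarity measures can be sketched as below. This is a minimal illustration following the definitions in the selective search paper [1]: histogram intersection for color and texture, and area-based terms for size and fill. The region dictionary layout, the equal weighting of the four terms, and the end-exclusive box convention are assumptions made for the example.

```python
import numpy as np

def hist_intersection(h_i, h_j):
    """Histogram intersection of two L1-normalised histograms (1.0 = identical)."""
    return float(np.minimum(h_i, h_j).sum())

def box_area(box):
    x0, y0, x1, y1 = box  # end-exclusive corners (assumed convention)
    return (x1 - x0) * (y1 - y0)

def region_similarity(r_i, r_j, image_size):
    """Sum of the four measures; regions are dicts with a hypothetical layout."""
    s_color = hist_intersection(r_i["color_hist"], r_j["color_hist"])
    s_texture = hist_intersection(r_i["texture_hist"], r_j["texture_hist"])
    # Size: small regions score high, so they are merged early.
    s_size = 1.0 - (r_i["size"] + r_j["size"]) / image_size
    # Fill: high when the joint bounding box contains little empty space.
    bi, bj = r_i["bbox"], r_j["bbox"]
    joint = (min(bi[0], bj[0]), min(bi[1], bj[1]), max(bi[2], bj[2]), max(bi[3], bj[3]))
    s_fill = 1.0 - (box_area(joint) - r_i["size"] - r_j["size"]) / image_size
    return s_color + s_texture + s_size + s_fill

# Two adjacent 10x10 regions in a 100x100 image (toy values).
a = {"size": 100, "bbox": (0, 0, 10, 10),
     "color_hist": np.array([0.5, 0.5]), "texture_hist": np.array([1.0, 0.0])}
b = {"size": 100, "bbox": (10, 0, 20, 10),
     "color_hist": np.array([0.5, 0.5]), "texture_hist": np.array([0.0, 1.0])}
sim = region_similarity(a, b, image_size=100 * 100)
```

Selective search greedily merges the most similar neighbouring pair and repeats; every intermediate region becomes a proposal.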

R-CNN FLOW • Region proposal → Feature extraction → SVM 1. Region proposal : Region warping 2. Feature extraction : 4096-d feature extraction 3. SVM : Classifying regions

FAST R-CNN • Weight sharing • ROI pooling • Multi-task

FAST R-CNN [Fast R-CNN]

WEIGHT SHARING [Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 8, 1 Feb 2016]

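ROI POOLING • An ROI of size h*w is divided into an H*W grid of sub-windows • Sub-window size : approximately h/H * w/W • Rounding : sub-window boundaries are rounded to integer coordinates • Max-pooling : each sub-window outputs its maximum value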
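A minimal NumPy sketch of ROI max-pooling on a single-channel feature map may make the rounding step concrete. The function name, the `(y0, x0, y1, x1)` end-exclusive ROI convention, and the floor/ceil choice for bin boundaries are assumptions of this example, not taken from the slides.

```python
import numpy as np

def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool an integer ROI of a 2-D feature map into an out_h x out_w grid."""
    y0, x0, y1, x1 = roi          # end-exclusive, feature-map coordinates
    h, w = y1 - y0, x1 - x0
    out = np.empty((out_h, out_w), dtype=feature.dtype)
    for i in range(out_h):
        # Bin boundaries are rounded outward so every bin holds >= 1 cell.
        ys = y0 + int(np.floor(i * h / out_h))
        ye = y0 + int(np.ceil((i + 1) * h / out_h))
        for j in range(out_w):
            xs = x0 + int(np.floor(j * w / out_w))
            xe = x0 + int(np.ceil((j + 1) * w / out_w))
            out[i, j] = feature[ys:ye, xs:xe].max()
    return out

feature = np.arange(16).reshape(4, 4)
pooled = roi_max_pool(feature, roi=(0, 0, 4, 4), out_h=2, out_w=2)
# pooled == [[5, 7], [13, 15]]
```

Whatever the ROI size, the output is always `out_h x out_w`, which is what lets ROIs of different shapes feed the same fully connected layers.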

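MULTI-TASK LOSS • $L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \ge 1] L_{loc}(t^u, v)$, where $L_{cls}(p, u) = -\log p_u$ and $L_{loc}$ is a smooth L1 loss over the box offsets [Fast R-CNN]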
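The multi-task loss in [Fast R-CNN] adds a classification log loss and, for non-background ROIs only, a smooth L1 localization loss. The sketch below computes it for a single ROI; the function names, argument layout, and toy values are illustrative assumptions.

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def fast_rcnn_loss(class_probs, true_class, box_pred, box_target, lam=1.0):
    """Loss for one ROI.

    class_probs: (K+1,) softmax output, index 0 is background.
    box_pred:    (K+1, 4) predicted box offsets, one row per class.
    box_target:  (4,) regression target for the true class.
    """
    l_cls = -np.log(class_probs[true_class])
    if true_class >= 1:   # the indicator [u >= 1]: background has no box loss
        l_loc = smooth_l1(box_pred[true_class] - box_target).sum()
    else:
        l_loc = 0.0
    return float(l_cls + lam * l_loc)

probs = np.array([0.5, 0.5])
pred = np.array([[0.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 2.0]])
target = np.zeros(4)
loss_bg = fast_rcnn_loss(probs, 0, pred, target)   # classification term only
loss_fg = fast_rcnn_loss(probs, 1, pred, target)   # adds 0.125 + 1.5 box loss
```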

MULTI-TASK TRAINING

FASTER-RCNN • Motivation : The region proposal step dominates the running time. • Solution : Region proposal network (RPN)

Image → Conv layers → RPN → ROI pooling → Fast R-CNN : FC layers → Bbox Reg + Softmax

RPN • Pros : Sharing computation with a Fast R-CNN object detection network. [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]

ANCHOR [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]
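As a rough illustration of anchors, the sketch below generates the boxes centred on one sliding-window position. The defaults of 3 scales and 3 aspect ratios (9 anchors) follow the Faster R-CNN paper; the function name and the `(x0, y0, x1, y1)` output layout are assumptions of this example.

```python
import numpy as np

def make_anchors(center_x, center_y, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (len(scales) * len(ratios), 4) anchors as (x0, y0, x1, y1).

    Each anchor has area scale**2 and aspect ratio height / width = ratio.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / np.sqrt(r)
            h = s * np.sqrt(r)
            anchors.append([center_x - w / 2, center_y - h / 2,
                            center_x + w / 2, center_y + h / 2])
    return np.array(anchors)

anchors = make_anchors(0.0, 0.0)
areas = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
```

The RPN predicts, for each of these reference boxes, an objectness score and four box offsets, so the same set is reused at every position of the feature map.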

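LOSS FUNCTION • $L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$, where $p_i^*$ is 1 for a positive anchor and 0 otherwise [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]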
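The RPN loss in the Faster R-CNN paper combines an objectness log loss over all sampled anchors with a smooth L1 regression loss that counts only positive anchors, and the regression operates on offsets relative to each anchor. A minimal sketch follows; the function names, the `(cx, cy, w, h)` box layout, and the toy inputs are assumptions.

```python
import numpy as np

def encode(box, anchor):
    """Offsets (t_x, t_y, t_w, t_h) of a box relative to an anchor, both (cx, cy, w, h)."""
    return np.array([(box[0] - anchor[0]) / anchor[2],
                     (box[1] - anchor[1]) / anchor[3],
                     np.log(box[2] / anchor[2]),
                     np.log(box[3] / anchor[3])])

def decode(t, anchor):
    """Inverse of encode: recover (cx, cy, w, h) from offsets."""
    return np.array([anchor[0] + t[0] * anchor[2],
                     anchor[1] + t[1] * anchor[3],
                     anchor[2] * np.exp(t[2]),
                     anchor[3] * np.exp(t[3])])

def rpn_loss(p, p_star, t, t_star, n_cls, n_reg, lam=10.0):
    """p: (N,) predicted objectness; p_star: (N,) labels in {0, 1}; t, t_star: (N, 4)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    l_cls = -(p_star * np.log(p) + (1 - p_star) * np.log(1 - p)).sum() / n_cls
    d = np.abs(t - t_star)
    smooth = np.where(d < 1, 0.5 * d ** 2, d - 0.5).sum(axis=1)
    l_reg = (p_star * smooth).sum() / n_reg   # negatives contribute nothing
    return float(l_cls + lam * l_reg)

anchor = np.array([50.0, 50.0, 100.0, 100.0])
box = np.array([60.0, 40.0, 200.0, 50.0])
roundtrip = decode(encode(box, anchor), anchor)

loss = rpn_loss(p=np.array([0.5, 0.5]), p_star=np.array([1.0, 0.0]),
                t=np.array([[0.5, 0, 0, 0], [3.0, 3, 3, 3]]),
                t_star=np.zeros((2, 4)), n_cls=2, n_reg=1)
```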

MULTI-SCALE ANCHORS • Key component for sharing features : – Multiple scales are covered by the anchors, without image resizing [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]

4-STEP ALTERNATING TRAINING 1. Train RPN, initialized from an ImageNet-pre-trained model; it produces the proposals. 2. Train Fast R-CNN, initialized from an ImageNet-pre-trained model, on those proposals. 3. Initialize RPN 2 with the Fast R-CNN parameters; fix the shared conv. layers and fine-tune the layers unique to RPN. 4. Fix the shared conv. layers and fine-tune the layers unique to Fast R-CNN. Result : RPN + Fast R-CNN (unified network)

HISTORY OF EVOLUTION
R-CNN : 1. SS for region proposal 2. CNN feature extraction 3. SVM classification 4. Bounding box regression
Fast R-CNN : 1. SS for region proposal 2. CNN feature extraction 3. Weight sharing 4. Softmax classification 5. Multitask loss function
Faster R-CNN : 1. RPN for region proposal 2. CNN feature extraction 3. Softmax classification 4. Multitask loss function

FULLY CONVOLUTIONAL INSTANCE-AWARE SEMANTIC SEGMENTATION 1. Translation-variant property : Score maps 2. Joint mask prediction & classification : end-to-end training [Fully Convolutional Instance-aware Semantic Segmentation]

POSITION-SENSITIVE SCORE MAP
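A small sketch of how an ROI map is assembled from k*k position-sensitive score maps: the ROI is split into k*k cells, and each cell copies its pixels from the score map assigned to that cell. The function name, the integer end-exclusive ROI convention, and the constant-valued toy maps are assumptions for illustration.

```python
import numpy as np

def assemble_roi_map(score_maps, roi, k):
    """Assemble one ROI map from k*k position-sensitive score maps.

    score_maps: (k*k, H, W), one map per relative position (row-major cells).
    roi:        (y0, x0, y1, x1), integer, end-exclusive.
    """
    y0, x0, y1, x1 = roi
    h, w = y1 - y0, x1 - x0
    out = np.empty((h, w), dtype=score_maps.dtype)
    for y in range(h):
        i = y * k // h                      # cell row of this pixel
        for x in range(w):
            j = x * k // w                  # cell column of this pixel
            out[y, x] = score_maps[i * k + j, y0 + y, x0 + x]
    return out

k = 2
# Toy maps: map m is filled with the value m, so the output shows which map was read.
score_maps = np.stack([np.full((6, 6), m) for m in range(k * k)])
roi_a = assemble_roi_map(score_maps, (1, 1, 5, 5), k)
roi_b = assemble_roi_map(score_maps, (0, 0, 4, 4), k)
```

Image pixel (2, 2) is read from map 0 in `roi_a` but from map 3 in `roi_b`: the same pixel scores differently depending on where it falls inside the ROI, which is the translation-variant property.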

TRAINING •

INFERENCE • ROI classification scores & foreground masks → NMS (IoU > 0.3) → remaining ROIs • Remaining ROIs : – Category : highest classification score – Mask : mask voting • Mask voting – For a specific ROI, find all ROIs with IoU > 0.5 – Foreground mask : per-pixel average, weighted by classification scores
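The NMS and mask-voting steps can be sketched as follows. To keep the example short it assumes every mask has already been placed in image coordinates, so all masks share one shape; the function names and the toy boxes are illustrative.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one (x0, y0, x1, y1) box against an (N, 4) array of boxes."""
    x0 = np.maximum(box[0], boxes[:, 0])
    y0 = np.maximum(box[1], boxes[:, 1])
    x1 = np.minimum(box[2], boxes[:, 2])
    y1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, thresh=0.3):
    """Greedy NMS: keep the best box, drop boxes overlapping it by more than thresh."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= thresh]
    return keep

def mask_voting(boxes, scores, masks, keep, thresh=0.5):
    """For each kept ROI, average the masks of all ROIs with IoU > thresh, weighted by score."""
    voted = {}
    for i in keep:
        near = iou(boxes[i], boxes) > thresh      # includes ROI i itself
        wts = scores[near]
        voted[i] = np.tensordot(wts, masks[near], axes=1) / wts.sum()
    return voted

boxes = np.array([[0, 0, 10, 10], [1, 0, 11, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.6, 0.8])
masks = np.stack([np.ones((2, 2)), np.zeros((2, 2)), np.full((2, 2), 0.5)])
keep = nms(boxes, scores)
voted = mask_voting(boxes, scores, masks, keep)
```

ROI 1 is suppressed by NMS, yet it still votes on the mask of ROI 0 because voting looks at all ROIs, not only the survivors.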

OVERALL STRUCTURE 1. Mask prediction : – Pixel-wise softmax 2. Mask classification : – Pixel-wise max, then average over the ROI 3. For every ROI we get : – (C+1)-dim mask prediction – (C+1)-dim mask classification (score)
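A sketch of how the two outputs are derived from one pair of score maps per category, following the description in the FCIS paper: each category has an "inside" and an "outside" map for the ROI, a per-pixel softmax between them gives the mask, and a per-pixel max followed by averaging and a softmax across categories gives the classification. The function names and array shapes are assumptions.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fcis_roi_outputs(inside, outside):
    """inside, outside: (C+1, h, w) assembled ROI score maps.

    Returns mask_prob (C+1, h, w), the foreground probability per category,
    and class_prob (C+1,), the ROI classification scores.
    """
    # Mask prediction: pixel-wise softmax between inside and outside scores.
    mask_prob = softmax(np.stack([inside, outside]), axis=0)[0]
    # Classification: pixel-wise max, average over the ROI, softmax over categories.
    detection = np.maximum(inside, outside).mean(axis=(1, 2))
    class_prob = softmax(detection, axis=0)
    return mask_prob, class_prob

inside = np.zeros((2, 3, 3))
outside = np.zeros((2, 3, 3))
inside[1] = 2.0                      # category 1 responds strongly "inside"
mask_prob, class_prob = fcis_roi_outputs(inside, outside)
```

Both outputs come from the same score maps, which is why mask prediction and classification are trained jointly.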

PERFORMANCE

MASK R-CNN • Architecture : Faster R-CNN + mask prediction branch • Improvements : – RoIAlign : m*m floating point matrix – Decoupling : Decouples mask and class prediction – Binary masks for each class w/o competition among classes [Mask R-CNN]

ROIALIGN • ROI-Pooling : m*m integer matrix with rounding – Misalignments between the ROI and the extracted features • RoIAlign : – m*m floating point matrix – Bilinear interpolation computes the exact values at 4 regularly sampled locations in each bin
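The difference from ROI pooling can be sketched in NumPy: ROI coordinates stay floating point, and each bin averages bilinearly interpolated samples (2 x 2 = 4 per bin here). The function names, the convention that integer coordinates sit on feature cell centres, and the use of averaging rather than max are assumptions of this example.

```python
import numpy as np

def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at a real-valued (y, x)."""
    H, W = feature.shape
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    dy, dx = y - y0, x - x0
    return (feature[y0, x0] * (1 - dy) * (1 - dx) + feature[y0, x1] * (1 - dy) * dx
            + feature[y1, x0] * dy * (1 - dx) + feature[y1, x1] * dy * dx)

def roi_align(feature, roi, m, samples=2):
    """Pool a float ROI (y0, x0, y1, x1) into m x m bins without any rounding."""
    y0, x0, y1, x1 = roi
    bin_h, bin_w = (y1 - y0) / m, (x1 - x0) / m
    out = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            vals = []
            for a in range(samples):          # regularly spaced sample points
                for b in range(samples):
                    y = y0 + (i + (a + 0.5) / samples) * bin_h
                    x = x0 + (j + (b + 0.5) / samples) * bin_w
                    vals.append(bilinear(feature, y, x))
            out[i, j] = np.mean(vals)
    return out

# Horizontal ramp: feature[y, x] == x, so interpolated values equal the x coordinate.
feature = np.tile(np.arange(5.0), (5, 1))
aligned = roi_align(feature, roi=(0.5, 0.5, 2.5, 2.5), m=2)
```

With the ramp input, each output equals the x coordinate of its bin centre (1.0 and 2.0), showing that the half-pixel ROI offset is preserved instead of being rounded away.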

BRANCHES •

LOSS FUNCTION • $L = L_{cls} + L_{box} + L_{mask}$ [Mask R-CNN]
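In the Mask R-CNN paper the mask term is an average per-pixel binary cross-entropy over sigmoid outputs, computed only on the mask of the ground-truth class, so masks of different classes do not compete. A minimal sketch for one positive ROI, with illustrative names and shapes:

```python
import numpy as np

def mask_loss(mask_logits, gt_mask, gt_class):
    """Average binary cross-entropy on the ground-truth class's mask only.

    mask_logits: (K, m, m) raw outputs, one mask per class.
    gt_mask:     (m, m) binary target.
    gt_class:    index of the ground-truth class in [0, K).
    """
    z = mask_logits[gt_class]
    # Numerically stable sigmoid + BCE: max(z, 0) - z*y + log(1 + exp(-|z|)).
    bce = np.maximum(z, 0) - z * gt_mask + np.log1p(np.exp(-np.abs(z)))
    return float(bce.mean())

def total_loss(l_cls, l_box, l_mask):
    """Sum of the classification, box, and mask terms."""
    return l_cls + l_box + l_mask

logits = np.zeros((3, 2, 2))
gt = np.array([[1.0, 0.0], [0.0, 1.0]])
base = mask_loss(logits, gt, gt_class=1)

logits_other = logits.copy()
logits_other[0] = 100.0              # change a different class's mask
same = mask_loss(logits_other, gt, gt_class=1)
```

`same` equals `base`: the other K-1 masks do not affect the loss, which is the decoupling of mask and class prediction.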

TRAINING • Image → Conv layers → RPN (+ROI) → ROIAlign → two branches : – FC layers → Bbox Reg + Softmax – Mask branch : FCN → K masks of size m*m

TRAINING • Example : positive ROI (+ROI) whose ground truth (G.T.) is class 2

INFERENCE • NMS → 100 highest-scoring detection boxes → K masks for each ROI • The classification branch selects the k-th mask. • The m*m floating-point mask is resized to the ROI size.
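The last two steps can be sketched as below: pick the mask of the predicted class and resize it to the ROI size with bilinear interpolation. The final binarization at 0.5 follows the Mask R-CNN paper and is not stated on the slide; the function names and the interpolation convention are assumptions.

```python
import numpy as np

def resize_bilinear(mask, out_h, out_w):
    """Resize a 2-D float array with bilinear interpolation (cell-centre aligned)."""
    m_h, m_w = mask.shape
    ys = np.clip((np.arange(out_h) + 0.5) * m_h / out_h - 0.5, 0, m_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * m_w / out_w - 0.5, 0, m_w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, m_h - 1), np.minimum(x0 + 1, m_w - 1)
    dy, dx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = mask[y0][:, x0] * (1 - dx) + mask[y0][:, x1] * dx
    bot = mask[y1][:, x0] * (1 - dx) + mask[y1][:, x1] * dx
    return top * (1 - dy) + bot * dy

def select_and_resize(masks, class_id, roi_h, roi_w, threshold=0.5):
    """Take the k-th of K masks (k = predicted class), resize to the ROI, binarize."""
    prob = resize_bilinear(masks[class_id], roi_h, roi_w)
    return prob >= threshold       # threshold is an assumption taken from the paper

masks = np.zeros((2, 2, 2))                      # K = 2 masks of size m = 2
masks[1] = np.array([[1.0, 1.0], [0.0, 0.0]])    # class 1: top half is foreground
binary = select_and_resize(masks, class_id=1, roi_h=4, roi_w=4)
```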

[Mask R-CNN]

REFERENCE
[1] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, "Selective search for object recognition," IJCV, 2013.
[2] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[3] R. Girshick, "Fast R-CNN," in IEEE International Conference on Computer Vision (ICCV), 2015.
[4] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in NIPS, 2015.
[5] J. Dai, K. He, Y. Li, S. Ren, and J. Sun, "Instance-sensitive fully convolutional networks," in ECCV, 2016.
[6] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN."