OBJECT DETECTION KAI-CHEN WANG
OUTLINE 1. Background – R-CNN – Fast-RCNN – Faster-RCNN 2. Paper research – Fully Convolutional Instance-aware Semantic Segmentation – Mask R-CNN 3. Reference
R-CNN • Localization : Selective search • Feature extraction : CNN • Classification : SVM [Rich feature hierarchies for accurate object detection and semantic segmentation, Tech report (v5)]
SELECTIVE SEARCH 1. Color similarity 2. Texture similarity 3. Size similarity 4. Fill similarity
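The four similarities listed on this slide can be sketched as follows. This is a minimal illustration based on the definitions in the selective search paper [1], not the original implementation: the dictionary keys `color`, `texture`, and `size` are hypothetical names, the histograms are assumed to be L1-normalised, and the four measures are simply summed with equal weight.

```python
def histogram_intersection(hist_a, hist_b):
    """Overlap of two L1-normalised histograms (used for color and texture)."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def size_similarity(size_a, size_b, image_size):
    """Near 1 when both regions are small, so small regions are merged early."""
    return 1.0 - (size_a + size_b) / image_size


def fill_similarity(size_a, size_b, joint_bbox_size, image_size):
    """Near 1 when the box around both regions contains little empty space."""
    return 1.0 - (joint_bbox_size - size_a - size_b) / image_size


def region_similarity(region_a, region_b, joint_bbox_size, image_size):
    """Equal-weight sum of the four measures (the weighting is an assumption)."""
    return (
        histogram_intersection(region_a["color"], region_b["color"])
        + histogram_intersection(region_a["texture"], region_b["texture"])
        + size_similarity(region_a["size"], region_b["size"], image_size)
        + fill_similarity(region_a["size"], region_b["size"],
                          joint_bbox_size, image_size)
    )
```

Selective search repeatedly merges the pair of neighbouring regions with the highest similarity and records each merged region as a proposal.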
R-CNN FLOW • Region proposal → Feature extraction → SVM 1. Region proposal : Region warping 2. Feature extraction : 4096-d feature vector per region 3. SVM : Classifying each region
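The region warping step can be illustrated with a small sketch. It assumes a nearest-neighbour resize of the cropped proposal to a fixed square input (227 x 227 in the R-CNN paper) and leaves out the context padding the paper applies around each box; `warp_region` is a hypothetical helper name.

```python
def warp_region(image, box, out_size=227):
    """Crop `box` from `image` and stretch it to out_size x out_size.

    image: 2-D list of pixel values (rows of columns).
    box:   (x0, y0, x1, y1) with x1 and y1 exclusive.
    Nearest-neighbour sampling is an assumption made for brevity.
    """
    x0, y0, x1, y1 = box
    height, width = y1 - y0, x1 - x0
    warped = []
    for r in range(out_size):
        src_row = y0 + r * height // out_size   # source row for output row r
        warped.append([
            image[src_row][x0 + c * width // out_size]
            for c in range(out_size)
        ])
    return warped
```

Because every proposal is stretched to the same size regardless of its aspect ratio, the CNN can produce a fixed-length feature vector for each region.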
FAST R-CNN • Weight sharing • ROI pooling • Multi-task
FAST R-CNN [Fast R-CNN]
WEIGHT SHARING
[Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 8, 1 Feb 2016]
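ROI POOLING • An h*w ROI is divided into an H*W grid of sub-windows, each of approximate size h/H * w/W • Rounding : sub-window boundaries are rounded to integer cells • Max-pooling within each sub-window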
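A minimal sketch of ROI pooling on a single-channel feature map follows. The rounding convention (floor for the start and ceil for the end of each sub-window) is an assumption; implementations differ in this detail.

```python
import math


def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool an ROI into a fixed out_h x out_w grid.

    feature: 2-D list (one channel of the conv feature map).
    roi:     (x0, y0, x1, y1) in feature-map cells, x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Sub-window rows, rounded outwards to whole cells.
        r_start = y0 + math.floor(i * h / out_h)
        r_end = y0 + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            c_start = x0 + math.floor(j * w / out_w)
            c_end = x0 + math.ceil((j + 1) * w / out_w)
            row.append(max(
                feature[r][c]
                for r in range(r_start, r_end)
                for c in range(c_start, c_end)
            ))
        pooled.append(row)
    return pooled
```

The output size depends only on `out_h` and `out_w`, which is what lets ROIs of any shape feed the same fully connected layers.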
MULTI-TASK LOSS
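As a sketch of the multi-task loss defined in the Fast R-CNN paper [3], the code below adds a log loss over the K+1 classes to a smooth L1 box-regression loss that is applied only to non-background ROIs. The function and parameter names are hypothetical, and class index 0 is assumed to be background.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear elsewhere (less sensitive to outliers)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multi_task_loss(class_probs, true_class, pred_box, target_box, lam=1.0):
    """Classification loss plus box loss for one ROI.

    class_probs: softmax output over K+1 classes (index 0 = background).
    pred_box:    predicted box offsets for the true class.
    target_box:  ground-truth regression targets.
    """
    loss_cls = -math.log(class_probs[true_class])
    if true_class == 0:
        return loss_cls  # background ROIs have no box target
    loss_loc = sum(smooth_l1(p - t) for p, t in zip(pred_box, target_box))
    return loss_cls + lam * loss_loc
```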
MULTI-TASK TRAINING
FASTER R-CNN • Motivation : The region proposal step takes up much of the running time. • Solution : Region proposal network (RPN)
Image → Conv layers → RPN → ROI pooling → Fast R-CNN (FC layers) → Bbox Reg + Softmax
RPN • Pros : Sharing computation with a Fast R-CNN object detection network. [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]
ANCHOR [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]
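A minimal sketch of anchor generation at one sliding-window position follows. It assumes the defaults reported in the Faster R-CNN paper (3 scales and 3 aspect ratios, giving 9 anchors per position); `make_anchors` is a hypothetical name, and a ratio is taken here to mean height divided by width.

```python
import math


def make_anchors(center_x, center_y,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x0, y0, x1, y1) anchors centred on one position.

    Each anchor keeps an area of scale*scale while its height/width
    ratio varies.
    """
    anchors = []
    for scale in scales:
        area = scale * scale
        for ratio in ratios:
            width = math.sqrt(area / ratio)
            height = width * ratio
            anchors.append((center_x - width / 2, center_y - height / 2,
                            center_x + width / 2, center_y + height / 2))
    return anchors
```

The RPN predicts an objectness score and four box offsets for each of these anchors at every position of the shared feature map.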
LOSS FUNCTION
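As a sketch of the RPN loss defined in the Faster R-CNN paper [4], the code below combines a log loss on the objectness of each anchor with a smooth L1 regression loss that counts only for positive anchors. The names are hypothetical. The classification term is normalised by the number of sampled anchors, the regression term by `n_reg`, and the default `lam=10.0` follows the value reported in the paper.

```python
import math


def smooth_l1(x):
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def rpn_loss(obj_probs, labels, pred_offsets, target_offsets, n_reg, lam=10.0):
    """Loss over a mini-batch of anchors.

    obj_probs:      predicted probability that each anchor is an object.
    labels:         1 for positive anchors, 0 for negative anchors.
    pred_offsets:   predicted 4-d box offsets per anchor.
    target_offsets: ground-truth 4-d box offsets per anchor.
    """
    n_cls = len(obj_probs)
    cls_term = sum(
        -math.log(p if y == 1 else 1.0 - p)
        for p, y in zip(obj_probs, labels)
    ) / n_cls
    # The label acts as a switch: negative anchors add no regression loss.
    reg_term = sum(
        y * sum(smooth_l1(a - b) for a, b in zip(t, t_star))
        for y, t, t_star in zip(labels, pred_offsets, target_offsets)
    ) / n_reg
    return cls_term + lam * reg_term
```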
MULTI-SCALE ANCHORS • Key component for sharing features without extra cost for handling multiple scales : – Avoids image resizing (image pyramids) [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]
4-STEP ALTERNATING TRAINING 1. Train the RPN, initialized from an ImageNet-pre-trained model. 2. Train Fast R-CNN, initialized from an ImageNet-pre-trained model, on the proposals from the RPN. 3. Initialize RPN 2 with the Fast R-CNN parameters : fix the shared conv layers, fine-tune the layers unique to RPN. 4. Fix the shared conv layers, fine-tune the layers unique to Fast R-CNN : RPN + Fast R-CNN (unified network)
HISTORY OF EVOLUTION : implementation • R-CNN : 1. SS for region proposal 2. CNN feature extraction 3. SVM classification 4. Bounding box regression • Fast R-CNN : 1. SS for region proposal 2. CNN feature extraction 3. Weight sharing 4. Softmax classification 5. Multi-task loss function • Faster R-CNN : 1. RPN for region proposal 2. CNN feature extraction 3. Softmax classification 4. Multi-task loss function
FULLY CONVOLUTIONAL INSTANCE-AWARE SEMANTIC SEGMENTATION 1. Translation-variant property : Score maps 2. Joint mask prediction & classification : end-to-end training [Fully Convolutional Instance-aware Semantic Segmentation]
POSITION-SENSITIVE SCORE MAP
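How position-sensitive score maps give a translation-variant response can be illustrated with a small sketch. It assumes k*k score maps for one category, an ROI split into a k*k grid, and that each grid cell copies its pixels from the score map with the matching index; `assemble_roi_scores` is a hypothetical name, not the paper's code.

```python
def assemble_roi_scores(score_maps, roi, k):
    """Build the ROI score map by copying each cell from its own score map.

    score_maps: list of k*k 2-D maps, all the size of the feature map.
    roi:        (x0, y0, x1, y1) in feature-map cells, x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    assembled = []
    for y in range(y0, y1):
        cell_row = (y - y0) * k // h          # grid row this pixel falls in
        row = []
        for x in range(x0, x1):
            cell_col = (x - x0) * k // w      # grid column this pixel falls in
            row.append(score_maps[cell_row * k + cell_col][y][x])
        assembled.append(row)
    return assembled
```

The same pixel reads from different score maps depending on where it lies inside each ROI, so two overlapping ROIs can assign it different scores.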
TRAINING
INFERENCE • ROI classification scores & foreground mask → NMS (IoU > 0.3) → Remaining ROI • Remaining ROI : – Highest classification score – Mask voting • Mask voting : – For a specific ROI, find all ROIs with IoU > 0.5 – Foreground mask : per-pixel average weighted by classification scores
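The NMS and mask voting steps can be sketched as below. As a simplifying assumption, every ROI mask is already expressed on a common pixel grid so masks can be averaged directly; the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.3):
    """Keep the highest-scoring boxes, dropping any that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


def mask_voting(boxes, scores, masks, nms_threshold=0.3, vote_threshold=0.5):
    """For each ROI kept by NMS, average the masks of all ROIs that overlap
    it with IoU > vote_threshold, weighted by classification score."""
    voted = {}
    for i in nms(boxes, scores, nms_threshold):
        voters = [j for j in range(len(boxes))
                  if iou(boxes[i], boxes[j]) > vote_threshold]
        total = sum(scores[j] for j in voters)
        voted[i] = [
            [sum(scores[j] * masks[j][r][c] for j in voters) / total
             for c in range(len(masks[i][0]))]
            for r in range(len(masks[i]))
        ]
    return voted
```

The kept ROI always votes for itself (its IoU with itself is 1), so `voters` is never empty.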
OVERALL STRUCTURE 1. Mask prediction : – Pixel-wise softmax 2. Mask classification : – Pixel-wise max, then average over the ROI 3. For every ROI we get : – (C+1)-dim mask prediction – (C+1)-dim mask classification (score)
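The two outputs per ROI can be sketched from a pair of "inside" and "outside" score maps for each category, which is how the FCIS paper describes them. The code is an illustrative sketch with hypothetical names: the mask comes from a pixel-wise softmax over the pair, and the category score comes from a pixel-wise max followed by an average over the ROI and a softmax across categories.

```python
import math


def mask_and_score(inside, outside):
    """inside / outside: lists of C+1 ROI score maps (2-D lists), one pair
    per category. Returns (masks, class_probs)."""
    masks, votes = [], []
    for ins, out in zip(inside, outside):
        # Pixel-wise softmax over (inside, outside) = foreground probability.
        masks.append([
            [1.0 / (1.0 + math.exp(o - i)) for i, o in zip(row_i, row_o)]
            for row_i, row_o in zip(ins, out)
        ])
        # Pixel-wise max, then average over all pixels of the ROI.
        maxes = [max(i, o)
                 for row_i, row_o in zip(ins, out)
                 for i, o in zip(row_i, row_o)]
        votes.append(sum(maxes) / len(maxes))
    # Softmax across the C+1 categories gives the classification scores.
    top = max(votes)
    exps = [math.exp(v - top) for v in votes]
    norm = sum(exps)
    return masks, [e / norm for e in exps]
```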
PERFORMANCE
MASK R-CNN • Architecture : Faster R-CNN + mask prediction branch • Improvements : – RoIAlign : m*m floating point matrix – Decoupling : Decouples mask and class prediction – Binary masks for each class w/o competition among classes [Mask R-CNN]
ROIALIGN • ROI pooling : m*m integer matrix with rounding – Misalignment between the ROI and the extracted features • RoIAlign : – m*m floating point matrix – Bilinear interpolation to compute exact values at 4 regularly sampled locations in each bin
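A minimal RoIAlign sketch on a single-channel feature map follows. It makes several assumptions: integer coordinates sit on cell centres, each bin is sampled at a regular 2 x 2 grid (the 4 locations on the slide), and the samples are averaged. The Mask R-CNN paper allows either max or average for the last step.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate `feature` at the real-valued point (y, x)."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, m, samples=2):
    """Pool a real-valued ROI (x0, y0, x1, y1) into an m x m grid
    without rounding any coordinate."""
    x0, y0, x1, y1 = roi
    bin_h, bin_w = (y1 - y0) / m, (x1 - x0) / m
    out = []
    for i in range(m):
        row = []
        for j in range(m):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sample points inside bin (i, j).
                    y = y0 + (i + (sy + 0.5) / samples) * bin_h
                    x = x0 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feature, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out
```

Unlike ROI pooling, neither the ROI boundary nor the bin boundaries are snapped to integer cells, which removes the misalignment noted on the slide.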
BRANCHES
LOSS FUNCTION
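In the Mask R-CNN paper [6] the total loss per ROI is the sum of the classification, box, and mask losses. The sketch below covers only the mask term: a per-pixel sigmoid with average binary cross-entropy, computed only on the mask of the ground-truth class, so that classes do not compete. The names are hypothetical, and the numerically stable form of the cross-entropy is a choice made here rather than something stated on the slide.

```python
import math


def mask_loss(mask_logits, gt_class, gt_mask):
    """Average binary cross-entropy on the ground-truth class mask only.

    mask_logits: K maps of m*m logits, one per class.
    gt_class:    index of the ground-truth class for this ROI.
    gt_mask:     m*m map of 0/1 ground-truth labels.
    """
    logits = mask_logits[gt_class]   # masks of other classes are ignored
    total, count = 0.0, 0
    for row_z, row_g in zip(logits, gt_mask):
        for z, g in zip(row_z, row_g):
            # Stable form of -[g*log(sigmoid(z)) + (1-g)*log(1-sigmoid(z))].
            total += max(z, 0.0) - z * g + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count
```

Because only `mask_logits[gt_class]` enters the loss, changing the logits of any other class leaves the value unchanged.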
TRAINING • Image → Conv layers → RPN → ROIAlign • Box branch : FC layers → Bbox Reg + Softmax • Mask branch (+ROI) : FCN → K masks with m*m
TRAINING • +ROI • Class 2 G.T.
INFERENCE • NMS → 100 highest-scoring detection boxes → K masks with m*m for each ROI • Classification branch selects the k-th mask. • The m*m floating-point mask is resized to the ROI size.
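The last two inference steps can be sketched as follows: pick the mask of the class chosen by the classification branch, then resize the m*m floating-point mask to the ROI size. The bilinear resize and the helper names are assumptions made for illustration; the final threshold of 0.5 follows the Mask R-CNN paper and does not appear on the slide.

```python
def resize_bilinear(mask, out_h, out_w):
    """Resize a 2-D list of floats to out_h x out_w with bilinear sampling."""
    in_h, in_w = len(mask), len(mask[0])
    out = []
    for r in range(out_h):
        # Map the output pixel centre back into input coordinates.
        y = min(max((r + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for c in range(out_w):
            x = min(max((c + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            top = mask[y0][x0] * (1 - dx) + mask[y0][x1] * dx
            bottom = mask[y1][x0] * (1 - dx) + mask[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def predict_instance_mask(class_masks, predicted_class, roi_h, roi_w,
                          threshold=0.5):
    """Select the k-th mask, resize it to the ROI, and binarize it."""
    soft = resize_bilinear(class_masks[predicted_class], roi_h, roi_w)
    return [[1 if v >= threshold else 0 for v in row] for row in soft]
```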
[Mask R-CNN]
REFERENCE [1] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. IJCV, 2013. [2] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014. [3] R. Girshick. Fast R-CNN. In ICCV, 2015. [4] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015. [5] J. Dai, K. He, Y. Li, S. Ren, and J. Sun. Instance-sensitive fully convolutional networks. In ECCV, 2016. [6] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN.