
Ongoing Research on Object Detection
*Some modifications after the seminar
Tackgeun YOU

Contents
• Baseline Algorithm – Fast R-CNN
• Observations & Proposals
• Fast R-CNN in Microsoft COCO

Object Detection
• Definition
  – Predict the location and label of each object in the scene
• Traditional Pipeline
  – Approximate the search space by
    • Sliding Window or Object Proposals
  – Evaluate (score) the approximated regions
  – Apply non-maximum suppression to keep the best-scoring, non-redundant regions
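The last pipeline step can be illustrated with a minimal sketch of IoU and greedy non-maximum suppression. The (x1, y1, x2, y2) box format, the function names, and the 0.3 suppression threshold are assumptions made for illustration; they are not taken from the slides.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap any already-kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical example: two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of the surviving boxes
```

Because suppression depends only on overlap and score, two distinct objects that sit close together can suppress each other, which is one reason contiguous objects are hard for this pipeline.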

R-CNN (CVPR 14)
• Object Proposals
  – Approximate the search space
• Fine-tuned CNN Feature + SVM
  – Score each region
• Bounding Box Regression
  – Refine each region
• Non-maximum Suppression

Training Pipeline of R-CNN
• Supervised Pre-training
  – Image-level annotation in ILSVRC 2012
• Domain-specific Fine-tuning
  – Mini-batch with 128 samples
  – 32 positive samples: region proposals with IoU ≥ 0.5
  – 96 negative samples: the rest
• Object Category Classifier (SVM)
  – Positive: only GT boxes
  – Negative: IoU < 0.3
  – Hard negative mining
• Bounding Box Regression
  – Uses nearby samples: each proposal is paired with its maximum-overlap GT box, only if IoU > 0.6
  – Ridge regression (regularization is important)
  – Iterating the regression does not improve the result
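The bounding box regression step can be sketched with the target parameterization from the R-CNN paper: center offsets scaled by the proposal size, and log-scale width and height ratios. The function names and the example boxes are hypothetical; the ridge regression that predicts these targets from CNN features is omitted.

```python
import math


def to_center(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h


def regression_targets(proposal, gt):
    """Targets (t_x, t_y, t_w, t_h) that map a proposal onto its GT box."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(gt)
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def apply_deltas(proposal, t):
    """Inverse transform: apply predicted deltas to a proposal."""
    px, py, pw, ph = to_center(proposal)
    cx, cy = px + t[0] * pw, py + t[1] * ph
    w, h = pw * math.exp(t[2]), ph * math.exp(t[3])
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


# Hypothetical proposal and ground-truth box.
proposal = (10.0, 10.0, 50.0, 90.0)
gt = (12.0, 8.0, 56.0, 88.0)
targets = regression_targets(proposal, gt)
recovered = apply_deltas(proposal, targets)  # should reproduce gt
```

Restricting training pairs to nearby samples keeps the targets small, which is where a linear regressor on these normalized offsets is a reasonable model.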

Fast R-CNN (arXiv 15)

Exploring VOC with Fast R-CNN
• Observation
  – Fails to localize contiguous objects
• Hypothesis
  – A region covering multiple objects gets higher confidence than a region covering a single object
• Experiment
  – Check whether the maximum confidence lies on the tight object box
    • MCMC iterations starting from the ground truth

Red – ratio(IoU > 0.5)
Blue – mean(IoU)
Magenta – ratio(IoU > 0.5)
Black – ratio(IoU < 0.3)
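The plotted quantities can be computed as in the sketch below. The slides do not say which boxes the IoU values belong to; the sketch assumes one IoU per sampled box against its ground truth, and the function name and input values are hypothetical.

```python
def localization_stats(ious):
    """Summary statistics over a non-empty list of IoU values."""
    n = len(ious)
    return {
        "ratio_iou_gt_0.5": sum(v > 0.5 for v in ious) / n,
        "mean_iou": sum(ious) / n,
        "ratio_iou_lt_0.3": sum(v < 0.3 for v in ious) / n,
    }


# Hypothetical IoU values of five sampled boxes against their ground truth.
ious = [0.9, 0.7, 0.55, 0.4, 0.2]
stats = localization_stats(ious)
```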

Desired Conditions
• Tailor confidence for precise localization
  – Whole body of a single object (Highest)
  – Partial body of a single object (Positive)
  – Overlapped multiple objects ( ? )
  – Other classes (Lowest)

Detailed Plans
• Dealing with multiple-object regions
  – How to define a multiple-object region?
  – Using Fast R-CNN
    • Fine-tune with multiple-object regions as negative samples
    • Negative samples in the batch
  – Possible failure: overall performance may decrease even though confidence on multiple-object regions is reduced
  – Adopting a proper loss function
    • Ranking
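One way the ranking idea could look is a pairwise hinge loss over region scores. This is only an illustrative sketch, not the loss used in this work: the function name, the margin, and the integer ranks are assumptions. Multiple-object regions are left out of the example because their place in the ordering is still an open question here.

```python
def pairwise_ranking_loss(scores, ranks, margin=1.0):
    """Mean hinge loss over all pairs (i, j) with ranks[i] > ranks[j]:
    the higher-ranked region should out-score the other by `margin`."""
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if ranks[i] > ranks[j]:
                total += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical ranks: whole object = 2, partial object = 1, other class = 0.
ranks = [2, 1, 0]
good = pairwise_ranking_loss([3.0, 1.5, 0.0], ranks)  # ordering respected
bad = pairwise_ranking_loss([0.5, 1.5, 0.0], ranks)   # partial beats whole
```

A loss of this shape penalizes only the relative order of the confidences, so it does not require multiple-object regions to be treated as plain negatives.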

Microsoft COCO
• 80 classes
• Train (82783), Validation (40504)
• Test (81434)

Split            #imgs   Submission   Score Reported
Test-Dev         ~20K    Unlimited    Immediately
Test-Standard    ~20K    Limited      Immediately
Test-Challenge   ~20K    Limited      Workshop
Test-Reserve     ~20K    Limited      Never

ref. Microsoft COCO: Common Objects in Context

ref. What makes for effective detection proposals?

Fast R-CNN with 1k MCG proposals, 240k iterations (5.8 epochs on train)

Fast R-CNN with 1k MCG proposals, 240k iterations + 130k iterations (6.4 epochs on val)
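The epoch counts can be reproduced from the iteration counts and the COCO split sizes, assuming 2 images per iteration (the Fast R-CNN default; the slides do not state the batch setting). The function name is hypothetical.

```python
IMAGES_PER_ITER = 2  # assumption: Fast R-CNN default of 2 images per mini-batch


def epochs(iterations, num_images, images_per_iter=IMAGES_PER_ITER):
    """Number of passes over a dataset after the given number of iterations."""
    return iterations * images_per_iter / num_images


train_epochs = epochs(240_000, 82_783)  # 240k iterations on COCO train
val_epochs = epochs(130_000, 40_504)    # 130k further iterations on COCO val
```

Under this assumption the results round to the 5.8 and 6.4 epochs quoted above.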

Processing Time of Fast R-CNN
• Testing Speed
  – With MCG @ 1k proposals: 1.872 s/image
  – ~21.06 hours on the validation set
  – ~10 hours on the test-dev set
  – ~42.35 hours on the test set
• Training Speed
  – 0.564 s/iteration
  – ~6.48 hours per epoch on the training set
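These estimates follow from the per-image and per-iteration timings and the split sizes. The sketch below redoes the arithmetic; the training figure assumes 2 images per iteration, and test-dev is taken as roughly 20K images. The function name is hypothetical.

```python
def hours(seconds_per_unit, units):
    """Total wall-clock hours for `units` items at a fixed cost per item."""
    return seconds_per_unit * units / 3600.0


TEST_SEC_PER_IMAGE = 1.872
TRAIN_SEC_PER_ITER = 0.564

val_hours = hours(TEST_SEC_PER_IMAGE, 40_504)       # validation set
test_dev_hours = hours(TEST_SEC_PER_IMAGE, 20_000)  # test-dev, ~20K images
test_hours = hours(TEST_SEC_PER_IMAGE, 81_434)      # full test set
# Assumption: 2 images per iteration, so one epoch is 82783 / 2 iterations.
epoch_hours = hours(TRAIN_SEC_PER_ITER, 82_783 / 2)
```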

End

Samples
• http://mscoco.org/explore/?id=407286
• http://mscoco.org/explore/?id=161602
• http://mscoco.org/explore/?id=123835
• http://mscoco.org/explore/?id=242673

Label Difference in Fine-tuning & SVM
• Domain-specific Fine-tuning
  – Mini-batch with 128 samples
  – 32 positive samples: region proposals with IoU ≥ 0.5
  – 96 negative samples: the rest
• Object Category Classifier (SVM)
  – Positive: only GT boxes
  – Negative: IoU < t, with t searched over {0.0, 0.1, 0.2, 0.3, 0.4, 0.5}
• Threshold selected by mAP on the validation set
  – t = 0.0: mAP -4%, t = 0.5: mAP -5%
  – Hard negative mining (the full training set does not fit in memory)
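The two labeling rules can be contrasted in a short sketch. The function names, the use of 0 for background, and the `None` return value for ignored proposals are assumptions for illustration; the 0.3 default is the SVM threshold used in R-CNN.

```python
def finetune_label(max_iou, gt_class):
    """Fine-tuning label: the GT class if IoU >= 0.5, else background (0)."""
    return gt_class if max_iou >= 0.5 else 0


def svm_label(max_iou, is_ground_truth, neg_threshold=0.3):
    """SVM label for one class: +1 for GT boxes, -1 below the negative
    threshold, and None (ignored) for proposals in the gray zone."""
    if is_ground_truth:
        return 1
    if max_iou < neg_threshold:
        return -1
    return None


# Hypothetical proposals of class 7 with different overlaps.
loose_positive = (finetune_label(0.6, 7), svm_label(0.6, False))
gray_zone = (finetune_label(0.4, 7), svm_label(0.4, False))
clear_negative = (finetune_label(0.1, 7), svm_label(0.1, False))
```

A proposal with IoU 0.6 is a positive for fine-tuning but is ignored by the SVM, which is the difference this slide highlights.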

• Conjecture
  – The definition of positive examples used in fine-tuning does not emphasize precise localization.
  – The softmax classifier was trained on randomly sampled negative examples rather than on the subset of “hard negatives” used for SVM training.

Fast R-CNN (arXiv 15)
• Training is single-stage (cf. R-CNN's multi-stage pipeline)
• Fine-tuning with a multi-task loss
  – Bounding box regression + classification
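A minimal sketch of the multi-task loss for a single region, following the form in the Fast R-CNN paper: log loss on the true class plus a smooth L1 localization term that is switched off for background. The function names, the argument layout, and the choice of class 0 as background are assumptions for illustration.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multitask_loss(probs, label, pred_deltas, target_deltas, lam=1.0):
    """Classification log loss plus, for non-background regions only,
    the smooth L1 loss on the box deltas predicted for the true class."""
    cls_loss = -math.log(probs[label])
    if label == 0:  # background: no localization term
        return cls_loss
    loc_loss = sum(smooth_l1(p - t)
                   for p, t in zip(pred_deltas, target_deltas))
    return cls_loss + lam * loc_loss


# Hypothetical two-class example (0 = background, 1 = object).
zeros = (0.0, 0.0, 0.0, 0.0)
bg_loss = multitask_loss([0.5, 0.5], 0, zeros, zeros)
fg_loss = multitask_loss([0.2, 0.8], 1, (0.5, 0.0, 0.0, 2.0), zeros)
```

Because both terms are optimized together, the classifier and the box regressor share one fine-tuning pass, which is what makes the training single-stage.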