Object Detection Creation from Scratch
Samsung R&D Institute Ukraine, Vitaliy Bulygin
Problem formulation and dataset: Udacity dataset, about 22,000 images (21,000 for training, 1,000 for testing). Problem: find bounding boxes of cars.
Naive solution: sliding window. Rectangles with different aspect ratios and sizes are passed to a binary classifier (convolution layer, max pooling layer, fully connected layer): is it a car?
Naive solution: sliding window. Running the binary classifier (is it a car?) on every rectangle of every aspect ratio and size is very slow!
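To see why this is slow, the sketch below only enumerates the windows the classifier would have to score. It is a minimal illustration, assuming boxes as (x1, y1, x2, y2) pixel tuples; `sliding_windows` and its `sizes`, `aspect_ratios` and `stride` parameters are hypothetical and not part of the original code.

```python
import math


def sliding_windows(img_h, img_w, sizes, aspect_ratios, stride):
    """Enumerate (x1, y1, x2, y2) windows of several sizes and aspect ratios.

    sizes: base side lengths in pixels (window area is roughly size ** 2)
    aspect_ratios: width / height ratios, e.g. 1.0, 2.0, 0.5
    stride: step in pixels between neighbouring window positions
    """
    windows = []
    for size in sizes:
        for ar in aspect_ratios:
            w = int(round(size * math.sqrt(ar)))
            h = int(round(size / math.sqrt(ar)))
            if w > img_w or h > img_h:
                continue  # this window shape does not fit into the image
            for y in range(0, img_h - h + 1, stride):
                for x in range(0, img_w - w + 1, stride):
                    windows.append((x, y, x + w, y + h))
    return windows
```

Each window would then be cropped, resized and passed through the classifier separately, so the cost grows with the number of sizes times aspect ratios times positions.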
A few words about two-stage detectors: the first stage generates proposals and the second stage is a classifier. Two-stage detectors are slower but more accurate than single-stage ones; however, the difference in accuracy became smaller in 2018.
Naive solution: location as output. We do not know the number of objects!
Output in the form of a grid: predict a rectangle and a class inside each cell (rectangle center coordinates, rectangle width and height).
Output in the form of a grid (calculate it!). GitHub: data_generator.py -> convert_GT_to_YOLO(...)
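A minimal sketch of such an encoding, assuming one box per cell and a target with channels [confidence, x_off, y_off, w, h], where the center is relative to its cell and the size is relative to the image. `encode_to_grid` is a hypothetical stand-in, not the repository's convert_GT_to_YOLO(...), whose exact output layout may differ.

```python
import numpy as np


def encode_to_grid(boxes, img_h, img_w, grid_h, grid_w):
    """Encode GT boxes (x1, y1, x2, y2) in pixels into a grid target.

    Output shape: (grid_h, grid_w, 5) with channels
    [confidence, x_off, y_off, w, h]. If two box centers fall into the
    same cell, the later box overwrites the earlier one (one box per cell).
    """
    target = np.zeros((grid_h, grid_w, 5), dtype=np.float32)
    cell_w = img_w / grid_w
    cell_h = img_h / grid_h
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        col = min(int(cx // cell_w), grid_w - 1)
        row = min(int(cy // cell_h), grid_h - 1)
        target[row, col] = [
            1.0,                  # an object center falls into this cell
            cx / cell_w - col,    # center x relative to the cell, in [0, 1)
            cy / cell_h - row,    # center y relative to the cell, in [0, 1)
            (x2 - x1) / img_w,    # width relative to the image
            (y2 - y1) / img_h,    # height relative to the image
        ]
    return target
```

For example, a car box (4, 2, 12, 16) in a 90 × 90 image with a 9 × 9 grid lands in cell (0, 0) with offsets (0.8, 0.9).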
Output in the form of a grid (papers): predict a rectangle and a class inside each cell. Recent papers with similar output:
• RFB Net: Songtao Liu et al., 2018
• RefineDet: Shifeng Zhang et al., 2018
• YOLOv3: Joseph Redmon et al., 2018
• PeleeNet: Robert J. Wang et al., 2018
• FSSD: Zuo-Xin Li et al., 2018
• DSOD: Zhiqiang Shen et al., 2018
• ...
Output in the form of a grid (general case): the feature extractor predicts several boxes in each cell, with aspect ratios 1:1, 2:1, 1:2, 3:1, ...
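One common way to give each cell several boxes is to attach a reference ("default" or "anchor") shape to each aspect ratio, as SSD-style detectors do. The sketch assumes relative (cx, cy, w, h) coordinates, a single `scale`, and aspect ratio = w / h; `default_boxes` is a hypothetical helper, not taken from the original code.

```python
import numpy as np


def default_boxes(grid_h, grid_w, scale, aspect_ratios):
    """Reference boxes for every grid cell, in relative image coordinates.

    Returns an array (grid_h, grid_w, len(aspect_ratios), 4) holding
    (cx, cy, w, h) in [0, 1] units; every box has area scale ** 2 and
    w / h equal to its aspect ratio.
    """
    boxes = np.zeros((grid_h, grid_w, len(aspect_ratios), 4), dtype=np.float32)
    for row in range(grid_h):
        for col in range(grid_w):
            cx = (col + 0.5) / grid_w  # cell center
            cy = (row + 0.5) / grid_h
            for k, ar in enumerate(aspect_ratios):
                boxes[row, col, k] = [cx, cy, scale * np.sqrt(ar), scale / np.sqrt(ar)]
    return boxes
```

The network output then grows from 5 values per cell to 5 values per reference box, each predicting offsets and a confidence relative to its own shape.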
Output in the form of a grid (general case): the grid size is smaller.
Output in the form of a grid (general case)
Single-stage object detector components: data_preprocessing.py. We have an image dataset and GT rectangles. What do we need to transform the data into the model input? data_generator.py
Single-stage object detector components: model.py
Single-stage object detector components: train.ipynb
Single-stage object detector components: data_postprocessing.py
Single-stage object detector components: evaluator.py (precision, recall)
Possible augmentations:
• horizontal flip
• vertical flip
• zoom in/out
• width and height shift
• rotation within some range
• shear
• image brightness shift
• channel shift
• hue change
• saturation change
• contrast change
• gamma correction
• histogram equalization
data_preprocessing.py: using only horizontal flip and width and height shift (original vs. augmented images) gives a gain of more than 10% in accuracy (mAP).
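Both augmentations must also move the GT rectangles. A minimal numpy sketch, assuming (N, 4) arrays of (x1, y1, x2, y2) pixel boxes and zero padding for the area shifted into view; `hflip` and `shift` are hypothetical helpers, not the functions of data_preprocessing.py.

```python
import numpy as np


def hflip(image, boxes):
    """Mirror the image left-right and mirror the box x coordinates."""
    w = image.shape[1]
    flipped = image[:, ::-1].copy()
    boxes = np.asarray(boxes, dtype=np.float32).copy()
    x1 = boxes[:, 0].copy()
    boxes[:, 0] = w - boxes[:, 2]  # new left edge comes from the old right edge
    boxes[:, 2] = w - x1
    return flipped, boxes


def shift(image, boxes, dx, dy):
    """Translate the image by (dx, dy) pixels with zero padding; clip boxes."""
    h, w = image.shape[:2]
    shifted = np.zeros_like(image)
    # source and destination slices of the region that stays inside the image
    src_x = slice(max(0, -dx), max(0, min(w, w - dx)))
    dst_x = slice(max(0, dx), max(0, min(w, w + dx)))
    src_y = slice(max(0, -dy), max(0, min(h, h - dy)))
    dst_y = slice(max(0, dy), max(0, min(h, h + dy)))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    boxes = np.asarray(boxes, dtype=np.float32) + np.array([dx, dy, dx, dy], dtype=np.float32)
    boxes[:, [0, 2]] = np.clip(boxes[:, [0, 2]], 0, w)
    boxes[:, [1, 3]] = np.clip(boxes[:, [1, 3]], 0, h)
    # drop boxes that were shifted completely out of the image
    keep = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
    return shifted, boxes[keep]
```

Vertical flip, rotation and shear would need the same kind of box update, which is one reason to start with the two cheapest transforms.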
data_preprocessing.py: augmented and normalized images
data_preprocessing.py: images and GT labels -> __getitem__() -> ...
data_preprocessing.py: images and labels -> __getitem__() -> augment, normalize ... -> grid output
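A minimal sketch of such a `__getitem__()`, assuming normalization to [0, 1] by dividing by 255 and pluggable `augment` and `encode` callables. `DetectionGenerator` only mimics the interface of a Keras `Sequence` (`__len__` and `__getitem__`) and is hypothetical; the original class and normalization may differ.

```python
import numpy as np


class DetectionGenerator:
    """Batch generator sketch in the style of a Keras Sequence (hypothetical).

    images: list of HxWx3 uint8 arrays; boxes: list of (N_i, 4) arrays.
    augment(image, boxes) -> (image, boxes) and encode(boxes) -> grid target
    are passed in, so any augmentation or grid encoding can be plugged in.
    """

    def __init__(self, images, boxes, batch_size, augment, encode):
        self.images, self.boxes = images, boxes
        self.batch_size = batch_size
        self.augment, self.encode = augment, encode

    def __len__(self):
        # number of batches per epoch, last partial batch included
        return int(np.ceil(len(self.images) / self.batch_size))

    def __getitem__(self, idx):
        start = idx * self.batch_size
        end = min(start + self.batch_size, len(self.images))
        x_batch, y_batch = [], []
        for i in range(start, end):
            img, bxs = self.augment(self.images[i], self.boxes[i])
            x_batch.append(img.astype(np.float32) / 255.0)  # normalize to [0, 1]
            y_batch.append(self.encode(bxs))                # grid-shaped target
        return np.stack(x_batch), np.stack(y_batch)
```

Augmenting inside `__getitem__()` means every epoch sees freshly transformed images, while the GT labels are converted to the grid only after their boxes have been moved.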
model.py: it is not an optimal feature extractor! Why such an architecture? Why 9 × 9? Encoded bounding boxes.
Effective receptive field is the area of the original image that can possibly influence the activation of a neuron (illustrated for conv and pool layers).
Effective receptive field is the area of the input image that the chosen feature is looking at (pool, conv).
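For a plain stack of conv and pool layers, the size of the region that can possibly influence one output neuron follows a simple recurrence. The sketch assumes square, undilated kernels and layers given as hypothetical `(kernel_size, stride)` pairs; it computes the theoretical receptive field from the definition above.

```python
def receptive_field(layers):
    """Receptive field size (pixels) of one output neuron for a layer stack.

    layers: list of (kernel_size, stride) for conv / pool layers, input first.
    Uses r_out = r_in + (k - 1) * j_in and j_out = j_in * s, where j is the
    distance in input pixels between neighbouring neurons of the current layer.
    Returns (receptive field size, jump of the last layer).
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r, j


# conv 3x3 -> max pool 2x2 (stride 2) -> conv 3x3
print(receptive_field([(3, 1), (2, 2), (3, 1)]))
```

Comparing the result with typical car sizes in pixels is one way to check that the receptive field contains the object with a margin without growing so large that small objects become hard to localize.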
The receptive field has to contain the object with a margin; otherwise the network cannot recognize the car position.
Very large receptive field: it is hard to localize small objects.
model.py: feature extractor -> head -> output
model.py: feature extractor -> head -> output. Possible improvement for smaller objects.
model.ipynb, function YOLO_loss(y_true, y_pred): objects, ‘object’ cells and ‘no object’ cells
2. Use binary classification with a cross-entropy loss function.
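A minimal numpy sketch of a loss with separate ‘object’ and ‘no object’ terms and binary cross-entropy on the confidence. It assumes targets shaped (batch, grid_h, grid_w, 5) with channels [confidence, x, y, w, h], a sigmoid confidence output, and the weights lambda_coord = 5 and lambda_noobj = 0.5 used in the original YOLO paper; it is an illustration, not the notebook's YOLO_loss(y_true, y_pred).

```python
import numpy as np


def yolo_like_loss(y_true, y_pred, lambda_coord=5.0, lambda_noobj=0.5, eps=1e-7):
    """Grid detection loss sketch (numpy, not the original YOLO_loss).

    y_true, y_pred: arrays (batch, grid_h, grid_w, 5) with channels
    [confidence, x, y, w, h]; y_pred confidence is assumed to be a
    sigmoid output in (0, 1).
    """
    obj = y_true[..., 0]             # 1 for 'object' cells, 0 for 'no object' cells
    noobj = 1.0 - obj
    p = np.clip(y_pred[..., 0], eps, 1.0 - eps)

    # binary cross-entropy on the confidence, split by cell type
    bce = -(obj * np.log(p) + noobj * np.log(1.0 - p))
    conf_loss = np.sum(obj * bce) + lambda_noobj * np.sum(noobj * bce)

    # coordinate regression only in cells that contain an object center
    coord_err = np.sum((y_true[..., 1:5] - y_pred[..., 1:5]) ** 2, axis=-1)
    coord_loss = lambda_coord * np.sum(obj * coord_err)

    return (conf_loss + coord_loss) / y_true.shape[0]  # mean over the batch
```

Most cells are empty, so down-weighting the ‘no object’ term keeps the many easy negatives from dominating the few cells that contain cars.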
Model output: confidence and coordinates relative to the cell. Filtering? Result: scores and rectangles.
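Decoding reverses the grid encoding: cell offsets become absolute pixel coordinates, and a confidence threshold does the first filtering. A minimal sketch assuming the channel layout [confidence, x_off, y_off, w, h] (center relative to its cell, size relative to the image); `decode_grid` and the default `conf_thr=0.5` are hypothetical.

```python
import numpy as np


def decode_grid(pred, img_h, img_w, conf_thr=0.5):
    """Turn a (grid_h, grid_w, 5) prediction into scored pixel rectangles.

    Returns (scores, boxes) with boxes as (x1, y1, x2, y2), keeping only
    cells whose confidence is at least conf_thr.
    """
    grid_h, grid_w = pred.shape[:2]
    rows, cols = np.nonzero(pred[..., 0] >= conf_thr)
    scores, boxes = [], []
    for r, c in zip(rows, cols):
        conf, x_off, y_off, w, h = pred[r, c]
        cx = (c + x_off) * img_w / grid_w   # cell index + offset -> pixels
        cy = (r + y_off) * img_h / grid_h
        bw, bh = w * img_w, h * img_h       # relative size -> pixels
        scores.append(float(conf))
        boxes.append((cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2))
    return np.array(scores), np.array(boxes).reshape(-1, 4)
```

The surviving rectangles can still overlap heavily, which is why a second filtering step based on IoU follows.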
Non-maximum suppression: compare the IoU of the 1st rectangle with the others.
Compare the IoU of the 1st rectangle with the others: a rectangle with IoU = 0 with the chosen rectangle is kept.
Compare the IoU of the 2nd rectangle with the others.
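The IoU comparison above is greedy non-maximum suppression: keep the best remaining rectangle, drop every rectangle that overlaps it by at least a threshold, and repeat with the next one. A minimal numpy sketch, assuming (x1, y1, x2, y2) boxes with positive area; `iou`, `nms` and the default `iou_thr=0.5` are illustrative choices, not the original data_postprocessing.py.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(scores, boxes, iou_thr=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = np.argsort(scores)[::-1]       # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps < iou_thr]   # drop boxes overlapping the chosen one
    return keep
```

A rectangle with IoU = 0 with the chosen one always survives a round, so separate cars are never merged; only near-duplicates of the same car are removed.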
Correspondence between GT and predicted boxes?
GT vs. predicted: evaluator.py -> sort_ious(gt_boxes, pred_boxes, iou_thr)
GT vs. predicted: a pair is matched only if both its GT box and its predicted box appear for the first time (matched GT, matched pred). evaluator.py -> get_single_image_results(gt_boxes, pred_boxes, iou_thr)
True predicted: matched GT and matched pred. evaluator.py -> get_single_image_results(gt_boxes, pred_boxes, iou_thr)
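A minimal sketch of this greedy one-to-one matching, assuming (x1, y1, x2, y2) boxes: pairs with IoU of at least `iou_thr` are sorted by IoU, and a pair counts only if neither of its boxes has been matched yet. `box_iou` and `match_single_image` are hypothetical names, and the result keys true_pos, false_pos and false_neg are an assumption, not necessarily what get_single_image_results returns.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_single_image(gt_boxes, pred_boxes, iou_thr=0.5):
    """Greedy one-to-one matching of predictions to GT by descending IoU."""
    pairs = [(box_iou(g, p), gi, pi)
             for gi, g in enumerate(gt_boxes)
             for pi, p in enumerate(pred_boxes)]
    pairs = [t for t in pairs if t[0] >= iou_thr]
    pairs.sort(key=lambda t: t[0], reverse=True)  # best overlaps first
    matched_gt, matched_pred = set(), set()
    for _, gi, pi in pairs:
        # a pair counts only if both boxes appear here for the first time
        if gi not in matched_gt and pi not in matched_pred:
            matched_gt.add(gi)
            matched_pred.add(pi)
    tp = len(matched_pred)
    return {"true_pos": tp,
            "false_pos": len(pred_boxes) - tp,
            "false_neg": len(gt_boxes) - tp}
```

Processing the highest IoU first ensures that when two predictions cover the same car, the better-fitting one is counted as true and the other as a false positive.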
getTruePredicted: what part of the predicted boxes is true (precision), and what part of all GT objects is truly predicted (recall).
getTruePredicted, two examples:
• predicted = 4, ground truth = 3, true predicted = 1
• predicted = 1, ground truth = 2, true predicted = 1
evaluator.py -> calc_precision_recall(img_results)
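Summing the counts over all images and then dividing gives precision and recall. A minimal sketch, assuming per-image dictionaries with the hypothetical keys true_pos, false_pos and false_neg, and returning 0.0 when a denominator is zero (a convention choice). For the two examples above the totals are 2 true positives, 3 false positives and 3 false negatives.

```python
def precision_recall(img_results):
    """Precision and recall accumulated over all images."""
    tp = sum(r["true_pos"] for r in img_results)
    fp = sum(r["false_pos"] for r in img_results)
    fn = sum(r["false_neg"] for r in img_results)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0  # part of predicted that is true
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0     # part of GT that is found
    return precision, recall


results = [
    {"true_pos": 1, "false_pos": 3, "false_neg": 2},  # predicted 4, GT 3
    {"true_pos": 1, "false_pos": 0, "false_neg": 1},  # predicted 1, GT 2
]
# precision = 2 / 5 = 0.4, recall = 2 / 5 = 0.4
print(precision_recall(results))
```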
Sort the boxes by confidence. calcPrecisionRecall gives precision and recall for each score threshold, for all images together! A low threshold gives high recall and low precision; a high threshold gives high precision and low recall. get_thr_prec_rec(…)
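A minimal sketch of this threshold sweep, assuming per-image lists of (score, box) predictions: for every distinct score, keep the predictions at or above it in all images, re-run the greedy IoU matching, and pool the counts. `pr_curve` and `count_true_pos` are hypothetical stand-ins for get_thr_prec_rec(…); the original may sweep the thresholds differently.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_true_pos(gt_boxes, pred_boxes, iou_thr):
    """Greedy one-to-one matching by descending IoU; returns #matched pairs."""
    pairs = sorted(((box_iou(g, p), gi, pi)
                    for gi, g in enumerate(gt_boxes)
                    for pi, p in enumerate(pred_boxes)), reverse=True)
    used_gt, used_pred = set(), set()
    for iou, gi, pi in pairs:
        if iou < iou_thr:
            break  # remaining pairs overlap even less
        if gi not in used_gt and pi not in used_pred:
            used_gt.add(gi)
            used_pred.add(pi)
    return len(used_pred)


def pr_curve(gt_per_image, preds_per_image, iou_thr=0.5):
    """(threshold, precision, recall) for every distinct score, all images pooled.

    preds_per_image: one list per image of (score, (x1, y1, x2, y2)) tuples.
    """
    thresholds = sorted({s for preds in preds_per_image for s, _ in preds}, reverse=True)
    n_gt = sum(len(g) for g in gt_per_image)
    curve = []
    for thr in thresholds:
        tp = fp = 0
        for gts, preds in zip(gt_per_image, preds_per_image):
            kept = [box for score, box in preds if score >= thr]
            t = count_true_pos(gts, kept, iou_thr)
            tp += t
            fp += len(kept) - t
        precision = tp / (tp + fp)          # at least one box is kept at every threshold
        recall = tp / n_gt if n_gt else 0.0
        curve.append((thr, precision, recall))
    return curve
```

Pooling the counts across images before dividing is what makes the curve describe the whole test set rather than an average of per-image curves.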
For real data sets there are hundreds of (score, precision, recall) values. Average precision is computed at 11 recall thresholds. Many mAP tutorials get this wrong: the “area under curve” (AUC) is not the same!
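The 11-point interpolated average precision, as used in PASCAL VOC 2007, takes for each recall level r in {0, 0.1, ..., 1.0} the highest precision reached at any recall >= r and averages the 11 values; this differs from integrating the raw precision-recall curve. A minimal sketch; `ap_11_point` is a hypothetical name, and mAP is this value averaged over classes (here there is only one class, cars).

```python
import numpy as np


def ap_11_point(precisions, recalls):
    """11-point interpolated average precision (PASCAL VOC 2007 style).

    For each recall level r in {0, 0.1, ..., 1.0} take the maximum precision
    among curve points whose recall is >= r (0 if there is none), then average.
    """
    precisions = np.asarray(precisions, dtype=float)
    recalls = np.asarray(recalls, dtype=float)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recalls >= r
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / 11.0
```

The interpolation replaces each precision by the best precision at equal or higher recall, so the zig-zags of a real curve with hundreds of points do not lower the score the way they would in a plain area-under-curve computation.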