Object Detection Creation from Scratch
Samsung R&D Institute Ukraine
Vitaliy Bulygin

Problem formulation and dataset
Udacity dataset, about 22,000 images: 21,000 for training, 1,000 for testing.
Problem: find bounding boxes for cars.

Naive solution: sliding window
Slide rectangles with different aspect ratios and sizes over the image.
Each crop goes to a binary classifier (convolution layer, max pooling layer, fully connected layer): is it a car?
Very slow!
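To see why the sliding window is slow, it helps to count the crops the classifier has to process. Below is a minimal sketch; the function name `sliding_windows` and its parameters are hypothetical and not taken from the talk's repository.

```python
def sliding_windows(img_w, img_h, sizes, aspect_ratios, stride):
    """Yield (x, y, w, h) for every window that fits fully inside the image.

    aspect_ratio is width / height; size is the geometric mean of w and h.
    """
    for size in sizes:
        for ratio in aspect_ratios:
            w = int(round(size * ratio ** 0.5))
            h = int(round(size / ratio ** 0.5))
            for y in range(0, img_h - h + 1, stride):
                for x in range(0, img_w - w + 1, stride):
                    yield (x, y, w, h)


if __name__ == "__main__":
    windows = list(sliding_windows(640, 480, sizes=[64, 128],
                                   aspect_ratios=[1.0, 2.0, 0.5], stride=8))
    # Every window is one forward pass of the binary classifier.
    print(len(windows))
```

The number of windows grows with the number of sizes and aspect ratios and shrinks only quadratically with the stride, so even a small image needs thousands of classifier calls.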

Several words about two-stage detectors
The first stage generates proposals; the second stage is a classifier.
Two-stage detectors are slower but more accurate than single-stage ones.
However, the difference in accuracy became smaller in 2018.

Naive solution: location as output
We do not know the number of objects!

Output in the view of the grid
Predict a rectangle and a class inside each cell:
• rectangle center coordinates
• rectangle width and height

Output in the view of the grid (calculate it!)
GitHub: data_generator.py -> convert_GT_to_YOLO(...)
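A minimal sketch of how ground-truth rectangles could be converted to a grid output. This is not the repository's `convert_GT_to_YOLO`; the function name `encode_boxes_to_grid`, the per-cell layout `[confidence, cx, cy, w, h]` and the normalization of width and height by the image size are assumptions made for illustration.

```python
def encode_boxes_to_grid(boxes, img_w, img_h, grid_size=9):
    """Encode (x_min, y_min, x_max, y_max) pixel boxes into a grid target.

    Each cell holds [confidence, cx, cy, w, h]:
      confidence: 1.0 if an object center falls into the cell, else 0.0
      cx, cy:     center offset inside the cell, in [0, 1)
      w, h:       box size as a fraction of the image size (assumed layout)
    """
    cell_w = img_w / grid_size
    cell_h = img_h / grid_size
    target = [[[0.0] * 5 for _ in range(grid_size)] for _ in range(grid_size)]
    for x_min, y_min, x_max, y_max in boxes:
        cx = (x_min + x_max) / 2
        cy = (y_min + y_max) / 2
        # The cell that contains the box center is responsible for the box.
        col = min(int(cx // cell_w), grid_size - 1)
        row = min(int(cy // cell_h), grid_size - 1)
        target[row][col] = [
            1.0,
            cx / cell_w - col,
            cy / cell_h - row,
            (x_max - x_min) / img_w,
            (y_max - y_min) / img_h,
        ]
    return target
```

With one box per cell, two objects whose centers fall into the same cell overwrite each other; this sketch keeps the last one.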

Output in the view of the grid (papers)
Predict a rectangle and a class inside each cell.
Recent papers with similar output:
• RFBNet: Songtao Liu et al., 2018
• RefineDet: Shifeng Zhang et al., 2018
• YOLOv3: Joseph Redmon et al., 2018
• PeleeNet: Robert J. Wang et al., 2018
• FSSD: Zuo-Xin Li et al., 2018
• DSOD: Zhiqiang Shen et al., 2018
• ...

Output in the view of the grid (general case)
Feature extractor
Predict several boxes in a single cell, with aspect ratios 1:1, 2:1, 1:2, 3:1, ...
Grid size is smaller.

Single-stage object detector components
We have an image dataset and GT rectangles. What do we need to transform the data into model input?
• data_preprocessing.py, data_generator.py
• model.py
• train.ipynb
• data_postprocessing.py
• evaluator.py: precision, recall

• horizontal flip
• vertical flip
• zoom in-out
• width and height shift
• rotation at some range
• shear
• image brightness shift
• channel shift
• hue changing
• saturation changing
• contrast changing
• gamma correction
• histogram equalization

data_preprocessing.py
Only horizontal flip and width and height shift are used.
Augmentation adds more than 10% accuracy (mAP).
[Figures: original vs. augmented image; augmented vs. normalized image]
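For detection, a geometric augmentation has to transform the GT rectangles together with the image. A minimal sketch for the horizontal flip, assuming the image is a list of pixel rows and boxes are `(x_min, y_min, x_max, y_max)` edge coordinates in pixels; the function name `horizontal_flip` is hypothetical and not from data_preprocessing.py.

```python
def horizontal_flip(image, boxes):
    """Flip an image left-right and mirror its boxes accordingly.

    image: list of rows, each row a list of pixels
    boxes: list of (x_min, y_min, x_max, y_max), x measured from the left edge
    """
    img_w = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # x coordinates are mirrored; the old right edge becomes the new left edge.
    flipped_boxes = [
        (img_w - x_max, y_min, img_w - x_min, y_max)
        for x_min, y_min, x_max, y_max in boxes
    ]
    return flipped_image, flipped_boxes
```

Applying the flip twice returns the original image and boxes, which is a cheap sanity check for any box-aware augmentation.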

data_preprocessing.py
__getitem__():
images ... -> augment, normalize ...
GT labels ... -> grid output

model.py
It is not an optimal feature extractor!
Why such an architecture? Why 9 x 9?
Output: encoded bounding boxes

Effective receptive field is the area of the original image that can possibly influence the activation of a neuron.
In other words, it is the area of the input image that the chosen feature is looking at.
[Figure: receptive field growing through conv and pool layers]
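The size of this area can be computed layer by layer from kernel sizes and strides with the standard recurrence: each layer adds `(kernel - 1) * jump` pixels, where `jump` is the product of the strides of all previous layers. A minimal sketch; the function name `receptive_field` and the example layer list are illustrative, not the architecture from model.py.

```python
def receptive_field(layers):
    """Return the receptive field size (in input pixels) of one output feature.

    layers: list of (kernel_size, stride) for conv and pool layers, in order.
    """
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance in input pixels between neighboring features
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    # conv 3x3 -> pool 2x2 (stride 2) -> conv 3x3
    print(receptive_field([(3, 1), (2, 2), (3, 1)]))
```

Comparing this number with the typical object size in the dataset is one way to reason about how many layers and poolings the feature extractor needs.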


The receptive field has to contain the object with a margin.
Receptive field too small: cannot recognize the car position.
Very large receptive field: hard to localize small objects.

model.py
Feature extractor -> Head -> Output
Possible improvement: one more Head and Output for smaller objects.

model.ipynb, function YOLO_loss(y_true, y_pred)
Objects: ‘object’ cells and ‘no object’ cells
2. Use binary classification + cross-entropy loss function
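A minimal sketch of the binary cross-entropy applied to the per-cell confidence, where the target is 1 for an ‘object’ cell and 0 for a ‘no object’ cell. This is only the confidence term in plain Python, not the repository's `YOLO_loss`; the function name `confidence_bce` and the clipping constant `eps` are assumptions.

```python
import math


def confidence_bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over grid cells.

    y_true: list of 0/1 targets ('no object' / 'object' cell)
    y_pred: list of predicted confidences in [0, 1]
    """
    total = 0.0
    for target, pred in zip(y_true, y_pred):
        # Clip to avoid log(0) for saturated predictions.
        pred = min(max(pred, eps), 1.0 - eps)
        total += -(target * math.log(pred) + (1 - target) * math.log(1.0 - pred))
    return total / len(y_true)
```

Most cells of the grid contain no object, so in practice the two kinds of cells are often weighted differently; that weighting is left out of this sketch.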

Model output for each cell: confidence and rectangle coordinates relative to the cell.
Filtering?
Result: scores and rectangles.
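A minimal sketch of the decoding step: cells with low confidence are filtered out and the remaining cell-relative coordinates are converted back to pixel rectangles. The per-cell layout `[confidence, cx, cy, w, h]` (center offset inside the cell, size as a fraction of the image), the function name `decode_grid` and the default threshold are assumptions, not the code of data_postprocessing.py.

```python
def decode_grid(pred, img_w, img_h, conf_thr=0.5):
    """Turn a grid prediction into (scores, boxes) in pixel coordinates.

    pred[row][col] = [confidence, cx, cy, w, h] (assumed layout)
    Returns boxes as (x_min, y_min, x_max, y_max).
    """
    grid_size = len(pred)
    cell_w = img_w / grid_size
    cell_h = img_h / grid_size
    scores, boxes = [], []
    for row in range(grid_size):
        for col in range(grid_size):
            conf, cx_rel, cy_rel, w_rel, h_rel = pred[row][col]
            if conf < conf_thr:
                continue  # filtering: drop low-confidence cells
            cx = (col + cx_rel) * cell_w
            cy = (row + cy_rel) * cell_h
            w = w_rel * img_w
            h = h_rel * img_h
            scores.append(conf)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return scores, boxes
```

The confidence threshold alone still leaves several overlapping rectangles per object, which is why a further suppression step follows.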


Compare the IOU of the 1st rectangle with the others.
IOU = 0 with the chosen rectangle.
Compare the IOU of the 2nd rectangle with the others.
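The procedure on these slides is greedy non-maximum suppression: take the rectangle with the highest score, drop the rectangles that overlap it too much, then repeat with the next remaining one. A minimal sketch in plain Python; the function names `iou` and `nms` and the default threshold are illustrative, not taken from the repository.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr=0.5):
    """Return indices of the boxes kept by greedy non-maximum suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest remaining score
        keep.append(best)
        # Rectangles with IOU = 0 (or below the threshold) survive this round.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep
```

For example, `nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7])` keeps the first and the third rectangle, because the second overlaps the first with IOU of about 0.68.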


Correspondence between GT and predicted?
GT, pred: evaluator.py -> sort_ious(gt_boxes, pred_boxes, iou_thr)
A (GT, pred) pair is matched if both rectangles appear for the first time in the sorted list: matched GT, matched pred.
evaluator.py -> get_single_image_results(gt_boxes, pred_boxes, iou_thr)
True predicted: matched GT, matched pred.
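A minimal sketch of one way to build this correspondence for a single image: compute the IOU of every (GT, pred) pair, sort the pairs by IOU, and accept a pair only if neither rectangle was matched before. It is an illustration under these assumptions, not the repository's `sort_ious` or `get_single_image_results`; the names `match_boxes` and the result keys are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(gt_boxes, pred_boxes, iou_thr=0.5):
    """Greedy one-to-one matching of GT and predicted boxes for one image."""
    pairs = []
    for g, gt in enumerate(gt_boxes):
        for p, pred in enumerate(pred_boxes):
            value = iou(gt, pred)
            if value >= iou_thr:
                pairs.append((value, g, p))
    pairs.sort(reverse=True)  # best overlaps first
    matched_gt, matched_pred = set(), set()
    for _, g, p in pairs:
        # A pair counts only if both boxes appear here for the first time.
        if g not in matched_gt and p not in matched_pred:
            matched_gt.add(g)
            matched_pred.add(p)
    return {
        "true_predicted": len(matched_gt),
        "predicted": len(pred_boxes),
        "ground_truth": len(gt_boxes),
    }
```

The one-to-one rule matters: two predictions on the same car count as one true prediction and one false one.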

getTruePredicted
Precision: what part of the predicted rectangles is true.
Recall: what part of all GT objects is truly predicted.
Image 1: • predicted = 4 • ground truth = 3 • true predicted = 1
Image 2: • predicted = 1 • ground truth = 2 • true predicted = 1
evaluator.py -> calc_precision_recall(img_results)
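The two per-image results above can be pooled into one precision and one recall value by summing the counts over all images. A minimal sketch; the function name `precision_recall` and the dictionary keys are hypothetical and not the signature of `calc_precision_recall` from evaluator.py.

```python
def precision_recall(img_results):
    """Pool per-image counts and return (precision, recall).

    img_results: list of dicts with keys
      'predicted', 'ground_truth', 'true_predicted' (assumed layout)
    """
    predicted = sum(r["predicted"] for r in img_results)
    ground_truth = sum(r["ground_truth"] for r in img_results)
    true_predicted = sum(r["true_predicted"] for r in img_results)
    precision = true_predicted / predicted if predicted else 0.0
    recall = true_predicted / ground_truth if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    results = [
        {"predicted": 4, "ground_truth": 3, "true_predicted": 1},
        {"predicted": 1, "ground_truth": 2, "true_predicted": 1},
    ]
    print(precision_recall(results))
```

For the two images on the slide this gives precision = (1 + 1) / (4 + 1) = 0.4 and recall = (1 + 1) / (3 + 2) = 0.4.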

Sorted box confidence
calcPrecisionRecall: score, prec, recall
Low score threshold: high recall, low precision.
High score threshold: high precision, low recall.
For all images together!
get_thr_prec_rec(…)
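A minimal sketch of how the (score, prec, recall) table could be produced. It assumes the detections of all images are pooled into one list and each detection is already marked as true or false; the function name `precision_recall_curve` and this input layout are assumptions, not the repository's implementation.

```python
def precision_recall_curve(detections, num_gt):
    """Return a list of (score, precision, recall), one row per threshold.

    detections: list of (score, is_true) pooled over all images
    num_gt:     total number of GT objects in all images
    """
    ordered = sorted(detections, key=lambda d: d[0], reverse=True)
    curve = []
    true_predicted = 0
    for rank, (score, is_true) in enumerate(ordered, start=1):
        if is_true:
            true_predicted += 1
        # Using `score` as threshold keeps exactly the first `rank` detections.
        curve.append((score, true_predicted / rank, true_predicted / num_gt))
    return curve
```

The first rows (high thresholds) keep only the most confident rectangles, so precision is high and recall is low; the last rows keep everything, so recall is at its maximum.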

Hundreds of values for real data sets: prec, recall, score
11 recall thresholds
A lot of mAP tutorials have a misunderstanding: “area under curve” (AUC) is not the same!
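A minimal sketch of the 11-point interpolated average precision (the PASCAL VOC style metric): for each recall threshold 0.0, 0.1, ..., 1.0, take the maximum precision among the points whose recall is at least that threshold, then average the 11 values. The function name `average_precision_11pt` is hypothetical; whether the talk's evaluator follows exactly this variant is an assumption.

```python
def average_precision_11pt(precisions, recalls):
    """11-point interpolated AP from paired precision and recall values."""
    total = 0.0
    for i in range(11):
        threshold = i / 10
        # Interpolated precision: best precision at recall >= threshold.
        candidates = [p for p, r in zip(precisions, recalls) if r >= threshold]
        total += max(candidates) if candidates else 0.0
    return total / 11


if __name__ == "__main__":
    print(average_precision_11pt([1.0, 0.5, 2 / 3], [0.25, 0.25, 0.5]))
```

In this example the thresholds 0.0 to 0.2 contribute 1.0 each, 0.3 to 0.5 contribute 2/3 each, and the rest contribute 0, so the result is 5/11. It is an average over 11 sampled points, not an integral of the precision-recall curve.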
