Baidu & Xi'an Jiaotong University Big Data Competition 2018: Classification and Detection of Merchant Signboards
School: Southwest Jiaotong University
Team: Aiyayaya
Speaker: Jiaxin Ren


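CONTENT
1 Task overview
2 Solution
3 Modification and Innovations
4 Practicability and Extensions
5 Conclusion
6 References

Slide footer (repeated on every slide): 竢实扬华，自强不息, the Southwest Jiaotong University motto, roughly "Be practical and bring glory to China; strive unceasingly".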

Team introduction
Jiaxin Ren
- Second-year master's student at Southwest Jiaotong University
- State-Province Joint Engineering Laboratory of Spatial Information Technology for High-Speed Railway
- Photogrammetry and remote sensing image processing
- Baidu: Classification and detection of trademarks (independent participation). Preliminary: 3/1139. Quarter-final: 7/1139
- Data Castle: Accurate identification of traffic jammer vehicles (independent participation). Preliminary: 7/823

Task overview
a. Multiple categories
b. Diversification of each category

Round    Categories  Dataset                  Task
Round 1  100         Train: 2725, Test: 1000  Classification
Round 2  60          Train: 9000, Test: 4351  Detection

Challenges
Image panels contrast easy examples with hard examples. Panel labels: One trademark, Fuzzy, Duplication, Reflection, High contrast, Denseness, Occlusion, Different.

Solution
Pipeline: Input -> Preprocessing -> Backbone and Model -> Output
- Input: raw data
- Preprocessing: data cleaning, statistics, sample balance, augmentation, format conversion, ...
- Backbone and Model: ResNet [1], ResNeXt [2], RetinaNet [3], Faster-RCNN [4], Mask-RCNN [5], ...
- Output: result

Augmentation
Diagram: raw data is split into easy examples and hard examples. Horizontal Flip is always applied on both branches. One branch additionally applies 1 of 4 transforms (Contrast, Gaussian Blur, HSV transform, Salt and pepper); the other applies 1 of 2 (Contrast, HSV transform).
Example panels: Raw data, Horizontal Flip, Gaussian Blur, HSV transform, Salt and pepper.
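A minimal sketch of the "always flip, then one of N" pattern on this slide, assuming grayscale images stored as lists of rows and boxes given as [x1, y1, x2, y2] with an exclusive right edge. The function names, the contrast factor and the noise amount are illustrative assumptions, not the team's code. Gaussian Blur and the HSV transform are left out to keep the example dependency-free.

```python
import random


def hflip(image, boxes):
    """Mirror a row-major image and its [x1, y1, x2, y2] boxes left to right."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # After mirroring, the old right edge of a box becomes its new left edge.
    new_boxes = [[width - x2, y1, width - x1, y2] for x1, y1, x2, y2 in boxes]
    return flipped, new_boxes


def contrast(image, factor=1.5):
    """Scale pixel values around mid-gray (128) and clip to 0..255."""
    return [[min(255, max(0, round(128 + factor * (p - 128)))) for p in row]
            for row in image]


def salt_and_pepper(image, amount=0.05, rng=random):
    """Set a random fraction of pixels to pure black or pure white."""
    return [[rng.choice((0, 255)) if rng.random() < amount else p for p in row]
            for row in image]


def augment(image, boxes, optional_transforms, rng=random):
    """Always flip, then apply exactly one randomly chosen optional transform."""
    image, boxes = hflip(image, boxes)
    transform = rng.choice(optional_transforms)
    return transform(image), boxes
```

Passing a two-element list as `optional_transforms` gives the "1 of 2" branch and a four-element list gives the "1 of 4" branch. The pixel-level transforms leave the boxes unchanged, so only the flip has to update them.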

Sample balance
Chart "MAP and Count": per-category MAP (with MAP_Mean) and sample Count (with Count_Mean) for the 60 Round 2 categories: senda, haagen_dazs, dhc, zara, xiabu, anta, vivo, anta_kids, columbia, bosideng, tmj, mido, tebu, zmn, samsung, jwyb, lyf, baleno, happy_lemon, burger_king, oppo, ajidou, zhouheiya, xyx, hla, vans, st_sat, uniqlo, puma, bsk, camel, coco, gujin, wqlm, hm, gong_cha, pierre_cardin, calvin_klein, huawei, innisfree, maybelline, converse, la_chapelle, new_balance, li_ning, peacebird, playboy, youngor, chando, herborist, jack_jones, selected, xbk, mdl, belle, vero_moda, watsons, kfc, nike, adidas.
Two ways:
A. Upsampling according to MAP_Mean (56.8)
B. Upsampling according to Count_Mean (224)
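Option B can be sketched as follows. The slide only names the criterion (Count_Mean), so the repeat-and-truncate rule and the function name `upsample_to_mean` are assumptions made for illustration.

```python
import math


def upsample_to_mean(samples_by_class):
    """Repeat samples of under-represented classes until they reach the mean count."""
    counts = {cls: len(samples) for cls, samples in samples_by_class.items()}
    mean_count = sum(counts.values()) / len(counts)
    target = math.ceil(mean_count)
    balanced = {}
    for cls, samples in samples_by_class.items():
        if samples and len(samples) < mean_count:
            # Cycle through the existing samples, then cut at the target size.
            repeats = math.ceil(target / len(samples))
            balanced[cls] = (list(samples) * repeats)[:target]
        else:
            # Classes at or above the mean are left untouched.
            balanced[cls] = list(samples)
    return balanced
```

Option A would follow the same pattern with per-category MAP in place of the sample count, upsampling categories whose MAP falls below MAP_Mean. How many extra copies to add in that case is not stated on the slide.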

Network Baseline
To achieve a good balance between accuracy and efficiency, we choose Faster-RCNN with ResNeXt-101-32x8d.
Chart "Different Backbone with Faster-RCNN" (AP50): ResNet-50, ResNet-101, ResNeXt-101-64x4d, ResNeXt-101-32x8d, X-152-32x8d-IN5k.
Chart "Model with ResNeXt-101-32x8d" (AP50): Faster-RCNN, Mask-RCNN, RetinaNet.

Part 3: Modification and Innovations

Modification and Innovations
Modification:
1. Small anchors for small target detection
2. Different RPN anchor aspect ratios
Charts (MAP, Org vs. Ours): "DIFFERENT ANCHORS" and "DIFFERENT RATIOS".
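For readers unfamiliar with RPN anchors, the sketch below shows how anchor shapes are commonly derived from a list of scales and aspect ratios. The specific scales and ratios are hypothetical; the slides do not state the values the team used.

```python
import math


def make_anchors(scales, ratios):
    """Return (width, height) anchor shapes.

    Each anchor keeps an area of scale**2 and has height / width == ratio.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append((width, height))
    return anchors


# Hypothetical settings, for illustration only.
default_anchors = make_anchors(scales=[128, 256, 512], ratios=[0.5, 1.0, 2.0])
# Adding smaller scales targets small signboards; adding a flatter ratio
# targets wide, short ones.
modified_anchors = make_anchors(scales=[32, 64, 128, 256, 512],
                                ratios=[0.25, 0.5, 1.0, 2.0])
```

Smaller scales give the RPN candidate boxes that overlap small targets well enough to be labeled positive, and extra ratios do the same for unusually wide or tall signs.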

Modification and Innovations
3. Soft NMS
4. Training and testing use multi-scale jitter
Charts (MAP, Org vs. Ours): "DIFFERENT ANCHORS" and "MULTI-SCALE".
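As a reminder of what item 3 changes, here is a minimal pure-Python sketch of Gaussian Soft-NMS. Instead of discarding boxes that overlap a higher-scoring box, it decays their scores. The `sigma` and `score_threshold` defaults are assumptions; the slides do not say which variant or parameters were used.

```python
import math


def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS; returns (index, decayed_score) pairs in selection order."""
    remaining = list(range(len(boxes)))
    scores = list(scores)  # work on a copy so the caller's list is untouched
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_threshold:
            break  # every box still remaining scores even lower
        kept.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            # Heavily overlapping boxes are down-weighted rather than removed.
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
    return kept
```

With dense or partly occluded signboards, hard NMS can delete a correct neighbouring box outright, while the decayed box here survives with a lower score.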

Modification and Innovations
Innovations:
1. Adaptive RPN Positive and Negative
2. Sample balance strategy
Charts (MAP, Org vs. Ours): "Different RPN ratios" (settings 0.9+0.1, 0.8+0.2, 0.7+0.3) and "Sample balance".

Modification and Innovations
3. Training with the validation set
4. Multi-process augmentations
Charts (Org vs. Ours): "Training valset" (MAP) and "Multi-Process" (Speed, imgs/s). The speed labels 16 and 310 survive in the extracted text, presumably for Org and Ours respectively.

Modification and Innovations
5. Edit Loss function
6. Limit one picture to only one category
7. Limit the number of Bboxes
Charts (MAP, Org vs. Ours): "Edit Loss function", "Limit categories" and "Limit Bboxes". Legend labels also include MAP_Mean and Count_Mean.
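Items 6 and 7 read as post-processing rules on the detector output. One way they might be implemented is sketched below. Choosing the dominant category by summed score, and the `max_boxes` value, are assumptions; the slides give neither the selection rule nor the cap.

```python
from collections import defaultdict


def limit_detections(detections, max_boxes=5):
    """Keep only the dominant category's boxes, capped at max_boxes by score.

    detections: list of (category, score, box) tuples for one image.
    """
    if not detections:
        return []
    total_score = defaultdict(float)
    for category, score, _ in detections:
        total_score[category] += score
    # Assumption: the category with the largest summed score wins the image.
    dominant = max(total_score, key=total_score.get)
    same_category = [d for d in detections if d[0] == dominant]
    same_category.sort(key=lambda d: d[1], reverse=True)
    return same_category[:max_boxes]
```

For example, an image with three `nike` boxes and one stray `adidas` box would keep only the highest-scoring `nike` boxes, which matches the observation that a storefront photo usually shows a single brand.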

Part 4: Practicability and Extensions

Practicability and Extensions
Training and testing on TITAN X (Pascal)

Round    Score  Rank
Round 1  0.999  3rd
Round 2  0.895  7th

Model                       Score   Size   Time / 4351 imgs
Faster-RCNN (X-101-32x8d)   0.8756  844 M  1024 s (4.25 imgs/s)
Faster-RCNN (X-101-64x4d)   0.8747  820 M  1017 s (4.28 imgs/s)
Faster-RCNN (X-101-32x8d+)  0.8950  844 M  13925 s (0.31 imgs/s)
Faster-RCNN (X-101-64x4d+)  0.8941  820 M  13047 s (0.33 imgs/s)
+: Test-time augmentations
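The throughput figures follow directly from the test-set size and the wall-clock times on this slide. The short check below recomputes them and the cost of test-time augmentation; the variable names are illustrative.

```python
NUM_TEST_IMAGES = 4351

# Wall-clock test times in seconds, as reported on the slide.
timings_s = {
    "X-101-32x8d": 1024,
    "X-101-64x4d": 1017,
    "X-101-32x8d+": 13925,
    "X-101-64x4d+": 13047,
}

# Images per second, rounded to two decimals as on the slide.
throughput = {model: round(NUM_TEST_IMAGES / seconds, 2)
              for model, seconds in timings_s.items()}

# Cost and benefit of test-time augmentation for the 32x8d model.
slowdown = timings_s["X-101-32x8d+"] / timings_s["X-101-32x8d"]
score_gain = 0.8950 - 0.8756
```

For the 32x8d model, test-time augmentation raises the score by about 0.019 and makes inference roughly 13.6 times slower.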

Practicability and Extensions

Score threshold (values as extracted; some cells were lost, so the rows are not column-aligned):
Score  0.0001  0.0015  0.005  0.1  0.2
AP     0.5963  0.5965  0.5969  0.5962  0.5949
AP50   0.9095  0.9094  0.9075  0.9050  0.9015
AP75   0.6863  0.6864  0.6869  0.6887  0.6890  0.6869
APs    0.2259  0.2260  0.2262  0.2160  0.2099  0.1861
APm    0.4013  0.4006  0.3996  0.3969
APl    0.6414  0.6416  0.6419  0.6426  0.6419  0.6401
0.0015 has the highest AP50 and fewer boxes; choose score=0.001.

Training iterations:
Iters  119999  139999  159999  179999  199999  219999
AP     0.5770  0.5841  0.5859  0.5969  0.5979  0.5966
AP50   0.8961  0.8971  0.8939  0.9075  0.9056  0.9059
AP75   0.6581  0.6528  0.6575  0.6887  0.6847  0.6831
APs    0.1792  0.1741  0.1537  0.2160  0.2216  0.2210
APm    0.4265  0.4263  0.4397  0.4006  0.4018  0.4001
APl    0.6261  0.6231  0.6259  0.6426  0.6429  0.6415
iters=179999 has the highest AP50 and takes less time; choose iters=179999.
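The checkpoint choice can be reproduced from the per-iteration AP and AP50 values on this slide. The dictionary layout is only an illustration.

```python
# AP50 and AP per training checkpoint, copied from the slide.
ap50_by_iters = {
    119999: 0.8961, 139999: 0.8971, 159999: 0.8939,
    179999: 0.9075, 199999: 0.9056, 219999: 0.9059,
}
ap_by_iters = {
    119999: 0.5770, 139999: 0.5841, 159999: 0.5859,
    179999: 0.5969, 199999: 0.5979, 219999: 0.5966,
}

# Pick the checkpoint with the highest value of each metric.
best_by_ap50 = max(ap50_by_iters, key=ap50_by_iters.get)
best_by_ap = max(ap_by_iters, key=ap_by_iters.get)
```

Selecting on AP50 gives iters=179999, the checkpoint the slide chooses. Selecting on AP would give 199999 instead, by a margin of 0.001.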

Conclusion
Advantages:
- No model fusion; this may be the only single-model entry in the competition.
- No more complex models (such as ResNeXt-152) are used, so the method is fast and accurate.
- Our method can be applied to other CV tasks, such as classification and segmentation.
- No complicated preprocessing and no weird tricks. The approach is reasonable and effective in many other areas.
Modification:
- Small anchors for small target detection.
- Different RPN anchor aspect ratios.
- NMS replaced by Soft NMS.
- Training and testing use multi-scale jitter.

Innovations:
- Adaptive RPN Positive and Negative
- Sample balance strategy
- Training strategy (all data)
- Multi-process augmentations
- Custom Loss function
- Limit one picture to only one category
- Limit the number of Bboxes
To be improved and further research:
- Improve image resolution
- Use a more powerful model
- How to use the feature map more efficiently?

References
[1] Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[2] Aggregated Residual Transformations for Deep Neural Networks. Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[3] Focal Loss for Dense Object Detection. Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. IEEE International Conference on Computer Vision (ICCV), 2017.
[4] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Conference on Neural Information Processing Systems (NIPS), 2015.
[5] Mask R-CNN. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. IEEE International Conference on Computer Vision (ICCV), 2017.
[6] Feature Pyramid Networks for Object Detection. Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

THANK YOU !