Deformable Part Models with CNN Features
Pierre-André Savalle, Stavros Tsogkas, George Papandreou, Iasonas Kokkinos

From HOG to CNN features
- HOG: gradient filters + pooling (8 x 8), giving an H x W x 32-D DPM pyramid.
- Convolutional Neural Networks: learned filters + pooling. Convolutional layers are followed by 4096-D fully connected layers.
- We can feed an arbitrary-size image to the convolutional layers and obtain a CNN feature map [Sermanet et al., 2014; Iandola et al., 2014].

CNN-DPM
- Fast multi-scale sliding window with patchworks [Dubout and Fleuret, 2012; Iandola et al., 2014].
- C-DPM pipeline: image -> convolutional layers -> H x W x 256-D CNN features -> unstitch -> DPM -> S(x, y, s).
- Compare to the RCNN detector [Girshick et al., 2014]: region(i) -> warp -> convolutional + fully connected layers (4096-D) -> SVM -> S(i).

Efficiency concerns
- Problem: 8 x larger features (256-D instead of 32-D).
- Learning: LDA on whitened representation [Girshick and Malik, 2013].
- Learning + Testing: FFT-based convolution [Dubout and Fleuret, 2012].
[Diagram: FFT of the 256-channel image CNN features and of the 256-channel filter CNN features -> per-channel product -> sum over the 256 channels -> inverse FFT -> detection scores]

Detection performance of C-DPM
Common network: RCNN's finetuned network.
Per-class average precision; the last row is mAP.

Class   C-DPM   C-DPM-BB   DPMv5   C-DPM-BB vs DPMv5   RCNN7-BB   RCNN7   RCNN5   C-DPM vs RCNN5
aero    39.7    50.9       33.2    +17.7               68.1       64.2    58.2    -18.5
bike    59.5    64.4       60.3    +4.1                72.8       69.7    63.3    -3.8
bird    35.8    43.4       10.2    +33.2               56.8       50.0    37.9    -2.1
boat    24.8    29.8       16.1    +13.7               43.0       41.9    27.6    -2.8
botl    35.5    40.3       27.3    +13.0               36.8       32.0    26.1    +9.4
bus     53.7    56.9       54.3    +2.6                66.3       62.6    54.1    -0.4
car     48.6    [missing]  58.2    +0.4                74.2       71.0    66.9    -18.3
cat     46.0    46.3       23.0    +23.3               67.6       60.7    51.4    -5.4
chair   29.2    33.3       20.0    +13.3               34.4       32.7    26.7    +2.5
cow     36.8    40.5       24.1    +16.4               63.5       58.5    55.5    -18.7
dtbl    45.5    47.3       26.7    +20.6               54.5       46.5    43.4    +2.1
dog     42.0    43.4       12.7    +30.7               61.2       56.1    43.1    -1.1
hors    57.7    65.2       58.1    +7.1                69.1       60.6    57.7    0.0
mbk     56.0    60.5       48.2    +12.3               68.6       66.8    59.0    -3.0
pers    37.4    42.2       43.2    -1.0                58.7       54.2    45.8    -8.4
plant   30.1    31.4       12.0    +19.4               33.4       31.5    28.1    +2.0
sheep   31.1    35.2       21.1    +14.1               62.9       52.8    50.8    -19.7
sofa    50.4    54.5       36.1    +18.4               51.1       48.9    40.6    +9.8
train   56.1    61.6       46.0    +15.6               62.5       57.9    53.1    +3.0
tv      51.6    58.6       43.5    +15.1               64.8       64.7    56.4    -4.8
mAP     43.4    48.2       33.7    +14.5               58.5       54.2    47.3    -3.9

Table notes: the first column has no header in the source; it is labeled C-DPM because its values reproduce the "C-DPM vs RCNN5" differences. For car, the source gives a single value (58.2) for the C-DPM-BB and DPMv5 columns; it is placed under DPMv5 here and the other cell is left as missing.

Observations:
- HOG-DPM << C-DPM ✔
- C-DPM << RCNN-7 ✔
- C-DPM <= RCNN-5 ?!?!

Conjecture: C-DPM < RCNN-5 is due to aspect ratio clustering.

Way ahead: higher-level CNN features, integrate into pose estimation, deal with aspect ratio.

References
- Dubout, C., Fleuret, F.: Exact acceleration of linear object detectors. ECCV 2012.
- Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014.
- Girshick, R., Malik, J.: Training deformable part models with decorrelated features. ICCV 2013.
- Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat: Integrated recognition, localization and detection using convolutional networks. ICLR 2014.
- Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., Keutzer, K.: DenseNet: Implementing efficient ConvNet descriptor pyramids. arXiv:1404.1869, 2014.
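
The FFT-based scoring step listed under "Efficiency concerns" (transform the 256-channel image features and the filter, multiply per channel, sum over channels, apply one inverse FFT) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it handles a single filter at a single scale and omits the patchwork construction of Dubout and Fleuret. The function names `score_map_direct` and `score_map_fft`, and the array sizes in the demo, are hypothetical and chosen only for illustration.

```python
import numpy as np


def score_map_direct(feats, filt):
    """Reference sliding-window scores: correlate filt with feats at every
    valid position and sum over channels. feats is (H, W, C), filt is (h, w, C)."""
    H, W, _ = feats.shape
    h, w, _ = filt.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(feats[y:y + h, x:x + w, :] * filt)
    return out


def score_map_fft(feats, filt):
    """Same scores computed in the Fourier domain (illustrative sketch only)."""
    H, W, _ = feats.shape
    h, w, _ = filt.shape
    # One 2-D real FFT per channel of the image features.
    F = np.fft.rfft2(feats, s=(H, W), axes=(0, 1))
    # Flip the filter spatially so that convolution acts as correlation,
    # and zero-pad it to the image size before transforming.
    G = np.fft.rfft2(filt[::-1, ::-1, :], s=(H, W), axes=(0, 1))
    # Per-channel product, summed over channels in the frequency domain,
    # so a single inverse FFT is enough regardless of the channel count.
    S = np.fft.irfft2((F * G).sum(axis=2), s=(H, W))
    # Keep only positions where the filter lies fully inside the feature
    # map; these entries are free of circular wrap-around.
    return S[h - 1:, w - 1:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((20, 30, 256))  # assumed H x W x 256 feature map
    filt = rng.standard_normal((6, 6, 256))     # assumed 6 x 6 x 256 filter
    direct = score_map_direct(feats, filt)
    fast = score_map_fft(feats, filt)
    print("max abs difference:", np.abs(direct - fast).max())
```

Summing in the frequency domain is what keeps the cost of the 8 x larger feature dimension manageable in this sketch: the number of forward transforms grows with the channel count, but only one inverse transform is needed per filter.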