End-to-End Text Recognition with Convolutional Neural Networks
Tao Wang*, David J. Wu*, Adam Coates, Andrew Y. Ng
Computer Science Department, Stanford University
* Denotes equal contribution
Tao Wang
Scene Text Recognition Overview
• Text “in the wild” is hard to recognize
• Wide range of variations in backgrounds, textures, fonts, and lighting conditions
Datasets: ICDAR 2003 Dataset (S. Lucas et al., 2003); Street View Text Dataset (K. Wang et al., 2011)
Two-Stage Framework
• Stage 1: Detection/Classification
• Stage 2: High-level inference
• Example output: “HOTEL”
Works | Classification and detection | High-level inference
Weinman et al., 2008 | Appearance + Geometry | Semi-Markov CRF
K. Wang et al., 2011 | HOG + Random Ferns | Pictorial Structure
Mishra et al., 2012 | HOG + SVM with RBF Kernel | CRF + N-gram model
Neumann and Matas, 2012 | MSER + SVM with RBF Kernel | Exhaustive Graph Search
Approach | Classification and detection | High-level inference
Most other approaches | Hand-designed features + off-the-shelf classifier | Graph-based inference models
Our approach | Learnt features + 2-layer CNN | Simple off-the-shelf heuristics
Various Benchmarks
• Detection/Classification: ICDAR 62-way cropped character classification (SOTA)
• End-to-end system after high-level inference:
  - ICDAR and SVT cropped word recognition (SOTA on ICDAR)
  - ICDAR and SVT lexicon-based end-to-end text recognition (SOTA)
Unsupervised Feature Learning
• Contrast normalization + ZCA whitening, then K-means (Coates et al., 2011)
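The preprocessing and filter-learning steps named on this slide can be sketched in NumPy as below. This is a minimal sketch under our own assumptions: patches arrive as flattened grayscale rows, the regularizers `eps_norm` and `eps_zca` are illustrative values, and the K-means step is a spherical (unit-norm, dot-product) variant; the exact variant used in the talk may differ. The function names are hypothetical.

```python
import numpy as np


def preprocess(patches, eps_norm=10.0, eps_zca=0.1):
    """Contrast-normalize each patch, then ZCA-whiten the whole set.

    patches: (n, d) array, one flattened patch per row.
    Returns the whitened patches plus (mean, zca) for transforming new data."""
    # Per-patch contrast normalization: zero mean, regularized unit variance.
    p = patches - patches.mean(axis=1, keepdims=True)
    p = p / np.sqrt(p.var(axis=1, keepdims=True) + eps_norm)
    # ZCA whitening: rotate into the eigenbasis, rescale, rotate back.
    mean = p.mean(axis=0)
    cov = np.cov(p - mean, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    zca = evecs @ np.diag(1.0 / np.sqrt(evals + eps_zca)) @ evecs.T
    return (p - mean) @ zca, mean, zca


def spherical_kmeans(x, k, iters=10, seed=0):
    """Learn k unit-norm filters (centroids) from whitened rows of x."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=(k, x.shape[1]))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    rows = np.arange(x.shape[0])
    for _ in range(iters):
        # Assign each sample to the centroid with the largest |dot product|.
        proj = x @ d.T                                   # (n, k)
        labels = np.argmax(np.abs(proj), axis=1)
        s = np.zeros_like(proj)
        s[rows, labels] = proj[rows, labels]
        # Update; adding the old d keeps empty clusters from collapsing to 0.
        d = s.T @ x + d
        d /= np.linalg.norm(d, axis=1, keepdims=True)
    return d
```

The learned rows of `d` would then serve as first-layer convolution filters.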
• 1st layer: convolution + spatial pooling (96 maps)
• 2nd layer: convolution + spatial pooling (256 maps)
• Output: L2-SVM classifier (text √ vs. non-text ×), trained with backpropagation
• ~10K parameters for detection; ~50K parameters for classification
• Large representation but not enough data. Overfitting?
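A forward pass through the two-layer architecture on this slide might look like the sketch below. Several details are our assumptions, not the slide's: a 32×32 grayscale input, 8×8 first-layer filters, 4×4 second-layer filters, average pooling to a 5×5 and then a 1×1 grid, and a thresholded absolute-value nonlinearity (`rectify`, with a hypothetical threshold `alpha`). The L2-SVM is shown as a linear score plus its squared-hinge training objective. All names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_valid(x, filters):
    """Valid 2-D cross-correlation of an (H, W, C) input with a bank of
    (K, kh, kw, C) filters; returns an (H-kh+1, W-kw+1, K) response map."""
    _, kh, kw, _ = filters.shape
    win = sliding_window_view(x, (kh, kw), axis=(0, 1))  # (H', W', C, kh, kw)
    return np.einsum("ijcab,kabc->ijk", win, filters)


def avg_pool(x, grid):
    """Average-pool an (H, W, K) map into (grid, grid, K) using
    non-overlapping blocks; H and W must be divisible by grid."""
    h, w, k = x.shape
    assert h % grid == 0 and w % grid == 0
    return x.reshape(grid, h // grid, grid, w // grid, k).mean(axis=(1, 3))


def rectify(z, alpha=0.5):
    # Assumed nonlinearity: absolute response minus a threshold, clipped at 0.
    return np.maximum(0.0, np.abs(z) - alpha)


def detector_score(patch, w1, w2, svm_w, svm_b):
    """Score a 32x32 grayscale patch; > 0 means 'text' in this sketch."""
    h1 = avg_pool(rectify(conv_valid(patch[..., None], w1)), grid=5)  # 25x25 -> 5x5
    h2 = avg_pool(rectify(conv_valid(h1, w2)), grid=1)               # 2x2 -> 1x1
    return float(h2.ravel() @ svm_w + svm_b)                         # linear SVM score


def l2svm_loss(scores, labels, w, c=1.0):
    """Squared-hinge (L2-SVM) objective for labels in {-1, +1}."""
    margins = np.maximum(0.0, 1.0 - labels * scores)
    return 0.5 * float(w @ w) + c * float(np.sum(margins ** 2))
```

Backpropagating `l2svm_loss` through the second layer is what the slide labels "Backpropagation"; the gradient code is omitted here.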
Synthetic Data
[Figure panels: real data; unrealistic synthetic data; our synthetic data]
• Java.Font + natural backgrounds
• Color statistics
• Synthetic “hard negatives”
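One way to combine a rendered glyph with a natural background, as this slide describes, is simple alpha blending. The sketch below assumes the glyph has already been rendered to a mask (font rendering is out of scope here), that the glyph color is drawn from color statistics of real text, and that mild Gaussian noise is added; `composite_character`, `blend`, and `noise_std` are hypothetical names and values, not the authors' pipeline.

```python
import numpy as np


def composite_character(glyph_mask, background, fg_color, blend=0.8,
                        noise_std=0.02, rng=None):
    """Alpha-blend a rendered glyph into a natural-image background crop.

    glyph_mask: (H, W) floats in [0, 1], 1 where the glyph has ink.
    background: (H, W, 3) floats in [0, 1].
    fg_color:   RGB glyph color, e.g. sampled from real-text color statistics."""
    if rng is None:
        rng = np.random.default_rng()
    alpha = blend * glyph_mask[..., None]                        # (H, W, 1)
    img = (1.0 - alpha) * background + alpha * np.asarray(fg_color, dtype=float)
    img = img + rng.normal(0.0, noise_std, size=img.shape)      # mild noise
    return np.clip(img, 0.0, 1.0)


def to_gray(img):
    # ITU-R BT.601 luma weights, to match a grayscale detector input.
    return img @ np.array([0.299, 0.587, 0.114])
```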
Detector Performance
[Figure: text line bounding boxes and candidate spaces]
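Candidate spaces within a detected text line could be proposed where the text detector's response dips. The sketch below assumes one detector response per horizontal position along the line and treats a position as a candidate space when its response is below a threshold and is the minimum within a small neighborhood (non-maximum suppression on the negated response). The function name, `threshold`, and `radius` are hypothetical; the talk does not specify this procedure.

```python
def candidate_spaces(responses, threshold=0.0, radius=2):
    """Return positions whose detector response is below `threshold` and is
    the smallest value within +/- `radius` positions (local minimum).

    Flat minima (equal neighbors) may yield adjacent duplicates."""
    spaces = []
    n = len(responses)
    for i, r in enumerate(responses):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        if r < threshold and r == min(responses[lo:hi]):
            spaces.append(i)
    return spaces
```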
Classifier Performance
62-way classification accuracy on ICDAR cropped characters (accuracy in %; higher is better)
83.9% (on ICDAR-Sample characters)
[Figure axes: character class vs. sliding window position]
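Word Recognition (example: SERIES)
Each lexicon word is scored by the maximum sum (max ∑) of character responses over alignments:
Lexicon word | max ∑
… | …
MAKE | -5.45
SERIES | 7.82
ESTATE | -1.74
POKER | -9.02
… | …
The highest-scoring word, SERIES (7.82), is selected.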
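The "max ∑" score above can be computed by dynamic programming over a character-class × window-position response matrix. This is a minimal sketch under a simplifying assumption: each character of a lexicon word is placed at a strictly increasing sliding-window position, and the score is the sum of the classifier responses at those positions. Any spacing constraints or other terms in the authors' scoring are omitted; `word_score` and `recognize` are hypothetical names.

```python
def word_score(word, scores):
    """Max over positions j1 < j2 < ... < jm of sum_i scores[word[i]][j_i].

    scores: dict mapping a character to its list of responses, one per
    sliding-window position (all lists have the same length)."""
    n = len(next(iter(scores.values())))
    if len(word) > n:
        return float("-inf")                # not enough windows to place it
    neg = float("-inf")
    # prev[j]: best score for the characters placed so far using windows < j.
    prev = [0.0] * (n + 1)
    for ch in word:
        row = scores[ch]
        cur = [neg] * (n + 1)
        for j in range(1, n + 1):
            # Either skip window j-1, or place `ch` there after the prefix.
            cur[j] = max(cur[j - 1], prev[j - 1] + row[j - 1])
        prev = cur
    return prev[n]


def recognize(lexicon, scores):
    """Return the lexicon word with the highest alignment score."""
    return max(lexicon, key=lambda w: word_score(w, scores))
```

The cost is O(m · n) per word, so scoring a small lexicon against one cropped word is cheap.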
Cropped Word Recognition Accuracy
[Figure: accuracy (%) on cropped-word benchmarks; higher is better]
Candidate spaces generated by the detector
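Given candidate spaces on a line, one way to decide which spans are words is to try every segmentation at those spaces and keep the one with the best total recognition score. The sketch below uses dynamic programming as a stand-in; it is not necessarily the authors' procedure. It assumes a hypothetical callable `segment_score(a, b)` that recognizes the span between boundaries `a` and `b` (for example with lexicon scoring) and returns `(word, score)`; spans with a non-positive score are left unlabeled.

```python
def segment_line(boundaries, segment_score):
    """boundaries: sorted candidate split positions, including both line ends.
    segment_score(a, b) -> (word, score) for the span between a and b.

    Returns (total, [(a, b, word), ...]) maximizing the summed positive scores."""
    k = len(boundaries)
    best = [0.0] + [float("-inf")] * (k - 1)
    back = [None] * k
    for j in range(1, k):
        for i in range(j):
            word, s = segment_score(boundaries[i], boundaries[j])
            gain = max(s, 0.0)              # a non-word span contributes 0
            if best[i] + gain > best[j]:
                best[j] = best[i] + gain
                back[j] = (i, word if s > 0 else None)
    # Follow back-pointers from the line end to recover the chosen words.
    words, j = [], k - 1
    while j > 0:
        i, word = back[j]
        if word is not None:
            words.append((boundaries[i], boundaries[j], word))
        j = i
    return best[k - 1], words[::-1]
```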
End-to-End Text Recognition Results
[Figure: F-score on end-to-end benchmarks; higher is better]
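The F-score reported here is the harmonic mean of precision and recall. The sketch below assumes the counts are already available; how a predicted word is matched to a ground-truth word (for example, a box-overlap rule plus an exact string match) is a benchmark convention not specified on this slide.

```python
def f_score(num_correct, num_predicted, num_ground_truth):
    """F = 2PR / (P + R), with P = correct / predicted and
    R = correct / ground truth; returns 0.0 when P + R is 0."""
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, 3 correct words out of 4 predicted and 6 in the ground truth give P = 0.75, R = 0.5, and F = 0.6.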
Sample Output Images from SVT
Sample Output Images from ICDAR-FULL
• c: “confidence margin”
• Example: PEOSTEL → Hunspell suggested words: POSE, POST, PEOPLE, PISTOL, … → lexicon → POST
• F-score: ours 0.38; Neumann and Matas, 2010: 0.40
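The sketch below shows two pieces this slide suggests: building a lexicon from spell-checker suggestions, and computing a confidence margin over lexicon scores. Two assumptions apply. First, `suggest` is any callable that returns dictionary words for a raw string; here it stands in for a spell checker such as Hunspell and does not call its real API. Second, the margin c is read as the best word's score minus the runner-up's score, which is one plausible interpretation of the slide. Both function names are hypothetical.

```python
def build_lexicon(raw_strings, suggest):
    """Collect spell-checker suggestions for each raw recognized string,
    keeping first-seen order and dropping duplicates."""
    lexicon = []
    for s in raw_strings:
        for w in suggest(s):
            if w not in lexicon:
                lexicon.append(w)
    return lexicon


def confidence_margin(word_scores):
    """word_scores: dict word -> score. Returns (best_word, margin), where
    margin = best score - second-best score (inf if only one word)."""
    ranked = sorted(word_scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0], float("inf")
    (best, s1), (_, s2) = ranked[0], ranked[1]
    return best, s1 - s2
```

A recognized word would then be accepted only when its margin exceeds a chosen threshold. With the word-recognition scores listed earlier, SERIES leads ESTATE by 7.82 - (-1.74) = 9.56.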
Conclusion
• Learnt features + 2-layer CNN for character detection and classification
• Simple heuristics to build an end-to-end scene text recognition system
• State-of-the-art performance on:
  - ICDAR cropped character classification
  - ICDAR cropped word recognition
  - Lexicon-based end-to-end recognition on ICDAR and SVT
• Extensible to a more general lexicon with an off-the-shelf spell checker