End-to-End Text Recognition with Convolutional Neural Networks
Tao Wang*, David J. Wu*, Adam Coates, Andrew Y. Ng
Computer Science Department, Stanford University
* Denotes equal contribution

Scene Text Recognition Overview
• Text “in the wild” is hard to recognize
• Wide range of variations in backgrounds, textures, fonts, and lighting conditions
• Datasets shown: ICDAR 2003 Dataset (S. Lucas et al., 2003); Street View Text Dataset (K. Wang et al., 2011)

Two-Stage Framework
• Detection/Classification
• High-level Inference
• Example output: “HOTEL”

Works                     Classification and detection    High-level inference
Weinman et al., 2008      Appearance + Geometry           Semi-Markov CRF
K. Wang et al., 2011      HOG + Random Ferns              Pictorial Structure
Mishra et al., 2012       HOG + SVM with RBF Kernel       CRF + N-gram model
Neumann and Matas, 2012   MSER + SVM with RBF Kernel      Exhaustive Graph Search

                        Classification and detection                        High-level inference
Most other approaches   Hand-designed features + off-the-shelf classifier   Graph based inference models
Our approach            Learnt features + 2-layer CNN                       Simple off-the-shelf heuristics

Various Benchmarks
Detection/Classification:
• ICDAR 62-way cropped character classification (SOTA)
End-to-end system after high-level inference:
• ICDAR and SVT cropped word recognition (SOTA on ICDAR)
• ICDAR and SVT lexicon-based end-to-end text recognition (SOTA)

Unsupervised Feature Learning (Coates et al., 2011)
• Contrast Normalization + ZCA whitening
• K-Means
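
The three steps named on this slide can be sketched in NumPy as below. This is only an illustration under stated assumptions: the patch size (8x8), the regularization constants, the dictionary size, and the use of a spherical K-Means variant are choices made for the example and are not given on the slide. The function names are hypothetical, and random numbers stand in for real image patches.

```python
import numpy as np

def contrast_normalize(patches, eps=10.0):
    """Subtract each patch's mean and divide by its (regularized) std. dev."""
    centered = patches - patches.mean(axis=1, keepdims=True)
    return centered / np.sqrt(centered.var(axis=1, keepdims=True) + eps)

def zca_whiten(patches, eps=0.1):
    """Decorrelate patch dimensions while staying close to the original basis."""
    mean = patches.mean(axis=0)
    centered = patches - mean
    cov = centered.T @ centered / centered.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)          # cov is symmetric
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ W

def kmeans_dictionary(data, k, iters=10, seed=0):
    """Spherical K-Means variant (an assumption): unit-norm centroids,
    assignment by largest absolute dot product."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((k, data.shape[1]))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    for _ in range(iters):
        sims = data @ D.T                            # (n_patches, k)
        assign = np.argmax(np.abs(sims), axis=1)
        for j in range(k):
            mask = assign == j
            if mask.any():
                # Sum the members, weighted by their signed projection.
                D[j] = sims[mask, j] @ data[mask]
        D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-8
    return D

# Stand-in data: 500 random "patches" of 8x8 pixels (hypothetical sizes).
rng = np.random.default_rng(0)
patches = rng.uniform(0, 255, size=(500, 64))
normalized = contrast_normalize(patches)
whitened = zca_whiten(normalized)
D = kmeans_dictionary(whitened, k=16)
```

Each row of `D` can then be reshaped to 8x8 and used as a first-layer filter; how the learned filters are wired into the network is not specified on this slide.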

• ~10K parameters for detection, ~50K parameters for classification
• Output: √ Text / × Non-Text
• Large representation but not enough data. Overfitting?
• Diagram labels: Convolution + Spatial Pooling (1st layer), Convolution + Spatial Pooling (2nd layer), L2-SVM Classifier, Backpropagation; 96, 256
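
A forward pass through a network of the shape named in the diagram (two convolution + spatial pooling layers feeding an L2-SVM) might look like the sketch below. Reading 96 and 256 as filter counts is an assumption, as are the 32x32 input, the filter and pooling sizes, the rectifier activation, and average pooling; the sketch does not try to reproduce the parameter counts on the slide. All function names are hypothetical.

```python
import numpy as np

def conv_valid(x, filters):
    """x: (C, H, W), filters: (K, C, h, w) -> (K, H-h+1, W-w+1)."""
    K, C, h, w = filters.shape
    H, W = x.shape[1:]
    out = np.empty((K, H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = x[:, i:i + h, j:j + w]
            out[:, i, j] = np.tensordot(filters, window, axes=3)
    return out

def avg_pool(x, size):
    """Non-overlapping average pooling over size x size blocks."""
    K, H, W = x.shape
    H2, W2 = H // size, W // size
    x = x[:, :H2 * size, :W2 * size]
    return x.reshape(K, H2, size, W2, size).mean(axis=(2, 4))

def forward(image, f1, f2):
    """Two conv + pool layers, returning a flat feature vector."""
    a1 = avg_pool(np.maximum(0.0, conv_valid(image, f1)), 5)   # 1st layer
    a2 = avg_pool(np.maximum(0.0, conv_valid(a1, f2)), 2)      # 2nd layer
    return a2.ravel()

def squared_hinge_loss(score, label):
    """L2-SVM loss for one example; label is +1 (text) or -1 (non-text)."""
    return max(0.0, 1.0 - label * score) ** 2

rng = np.random.default_rng(0)
image = rng.standard_normal((1, 32, 32))            # one grayscale window
f1 = rng.standard_normal((96, 1, 8, 8)) * 0.1       # 96 first-layer filters
f2 = rng.standard_normal((256, 96, 2, 2)) * 0.1     # 256 second-layer filters
features = forward(image, f1, f2)
w, b = rng.standard_normal(features.size) * 0.01, 0.0
score = float(features @ w + b)                     # > 0 would mean "text"
loss = squared_hinge_loss(score, +1)
```

During training, the gradient of this squared hinge loss would be backpropagated through both layers, which is what the "Backpropagation" label in the diagram refers to.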

Synthetic Data
• Panels: Synthetic Data, Real Data, Unrealistic Synthetic Data
• Java.Font + Natural backgrounds + Color Statistics
• Synthetic “hard negatives”

Detector Performance

Text Line Bounding boxes
Candidate spaces

Classifier Performance
62-way classification accuracy on ICDAR cropped characters (figure; axis: Accuracy (%), higher is better)
83.9 (on ICDAR-Sample characters)

Figure axes: Char Class, Sliding window position

Word Recognition
Recognized word: SERIES (max ∑ … over the lexicon)
Lexicon:
…
MAKE     -5.45
SERIES    7.82
ESTATE   -1.74
POKER    -9.02
…
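
One way to read the "max ∑" on this slide is as an alignment score: for each lexicon word, choose increasing sliding-window positions for its characters so that the summed character-classifier responses are as large as possible, then pick the word with the highest total. The slide elides the exact expression, so the dynamic program below is an assumed interpretation. The function names and the tiny response table are made up for illustration.

```python
from typing import Dict, List

NEG_INF = float("-inf")

def alignment_score(word: str, scores: Dict[str, List[float]]) -> float:
    """Best sum of per-character scores over strictly increasing positions.

    scores[ch][j] is the classifier response for character ch at
    sliding-window position j.
    """
    n_pos = len(next(iter(scores.values())))
    # prev[j]: best score for the characters handled so far, using only
    # positions < j. The empty prefix scores 0 everywhere.
    prev = [0.0] * (n_pos + 1)
    for ch in word:
        row = scores[ch]
        cur = [NEG_INF] * (n_pos + 1)   # cur[0]: no position left to use
        for j in range(n_pos):
            placed = prev[j] + row[j]   # put ch at position j
            cur[j + 1] = max(cur[j], placed)
        prev = cur
    return prev[n_pos]

def recognize(lexicon: List[str], scores: Dict[str, List[float]]) -> str:
    """Return the lexicon word with the highest alignment score."""
    return max(lexicon, key=lambda w: alignment_score(w, scores))

# Toy responses over 4 window positions (made-up numbers).
scores = {
    "A": [2.0, -1.0, 0.5, -3.0],
    "T": [-1.0, 3.0, -2.0, 1.0],
    "O": [0.0, 0.5, -1.0, 1.0],
}
lexicon = ["AT", "TO", "OAT"]
best = recognize(lexicon, scores)
```

In this toy table "AT" wins with a score of 5.0 (A at position 0, T at position 1), ahead of "TO" at 4.0 and "OAT" at 2.0.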

Cropped Word Recognition Accuracy on Cropped Words Benchmarks (figure; axis: Accuracy (%), higher is better)

Candidate spaces generated by detector … …

End-to-end text recognition results on End-to-end Benchmarks (figure; axis: F-Score, higher is better)

Sample Output Images from SVT

Sample Output Images from ICDAR-FULL

• Diagram labels: PEOSTEL; Hunspell; Suggested Words (LEXICON): POSE, POST, PEOPLE, PISTOL, …; POST
• c -- “confidence margin”
• Our F-score: 0.38
• Neumann and Matas, 2010: 0.40
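
The idea of turning a spelling checker's suggestions into a per-word lexicon can be sketched as follows. Python's standard-library `difflib.get_close_matches` is used here as a stand-in for Hunspell, so the suggestions will not match Hunspell's. The vocabulary, the toy scores, and the function names are hypothetical. The confidence margin c is not modelled, because the slide does not define how it is computed.

```python
import difflib
from typing import Callable, List, Optional

def suggest_words(raw_guess: str, vocabulary: List[str],
                  n: int = 5, cutoff: float = 0.6) -> List[str]:
    """Stand-in for a spelling checker: vocabulary words close to raw_guess."""
    return difflib.get_close_matches(raw_guess, vocabulary, n=n, cutoff=cutoff)

def recognize_with_suggestions(raw_guess: str, vocabulary: List[str],
                               score_word: Callable[[str], float]) -> Optional[str]:
    """Use the suggestions as a small lexicon and return the best-scoring word."""
    lexicon = suggest_words(raw_guess, vocabulary)
    if not lexicon:
        return None
    return max(lexicon, key=score_word)

# Hypothetical vocabulary and made-up recognition scores.
vocabulary = ["POSE", "POST", "PEOPLE", "PISTOL", "HOTEL"]
toy_scores = {"POSE": 1.2, "POST": 4.5, "PEOPLE": -0.3,
              "PISTOL": 0.8, "HOTEL": -2.0}

suggestions = suggest_words("PEOSTEL", vocabulary)
result = recognize_with_suggestions("PEOSTEL", vocabulary,
                                    lambda w: toy_scores[w])
```

In a full system, `score_word` would be the lexicon-driven word recognizer rather than a lookup table, so the raw guess only decides which words are considered, not which one is chosen.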

Conclusion
• Learnt features + 2-layer CNN for character detection and classification
• Simple heuristics to build an end-to-end scene text recognition system
• State-of-the-art performance on
  - ICDAR cropped character classification
  - ICDAR cropped word recognition
  - Lexicon-based end-to-end recognition on ICDAR and SVT
• Extensible to a more general lexicon with an off-the-shelf spelling checker
