(Off-Line) Cursive Word Recognition
Tal Steinherz, Tel-Aviv University

Cursive Word Recognition
• Preprocessing
• Segmentation
• Feature Extraction
• Recognition
• Post-Processing

Preprocessing
• Skew correction
• Slant correction
• Smoothing
• Reference line finding

Segmentation Motivation
• Given a two-dimensional image and a model that expects a one-dimensional input signal, one needs to derive an ordered list of features.
• Fragmentation is another alternative, where the resulting pieces have no literal meaning.

Segmentation Dilemma
• To segment or not to segment? That is the question!
• Sayre’s paradox: “To recognize a letter, one must know where it starts and where it ends; to isolate a letter, one must recognize it first.”

Recognition Model
• What is the basic (atomic) model?
  – word (remains identical through training and recognition)
  – letter (concatenated on demand during recognition)
• What are the training implications?
  – specific = total cover (several samples for each word)
  – dynamic = brick cover (samples of various words that include all possible characters/letters)

Basic Word Model
1st letter sub-model → … → i-th letter sub-model → … → last letter sub-model

Segmentation-Free
• In a segmentation-free approach, recognition is based on measuring the distance between observation sequences.

Segmentation-Free (continued)
• The most popular metric is the Levenshtein edit distance, where a transformation between sequences is done by atomic operations (insertion, deletion, and substitution), each associated with a different cost.
• Implementations: dynamic programming, HMMs.
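The weighted edit distance mentioned above can be sketched in a few lines of Python via dynamic programming; the per-operation costs here are illustrative defaults, not those of any particular recognizer:

```python
def edit_distance(a, b, ins=1, dele=1, sub=1):
    """Levenshtein edit distance with per-operation costs.

    d[i][j] holds the cheapest way to transform a[:i] into b[:j]
    using insertions, deletions, and substitutions.
    """
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * dele
    for j in range(1, n + 1):
        d[0][j] = j * ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,      # delete a[i-1]
                          d[i][j - 1] + ins,       # insert b[j-1]
                          d[i - 1][j - 1] + cost)  # substitute or match
    return d[m][n]
```

With unit costs this reduces to the classic Levenshtein distance, e.g. `edit_distance("kitten", "sitting")` returns 3.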

Segmentation-Free (demo)
• Each column was translated into a feature vector.
• Two types of features:
  – number of zero-crossings
  – gradient of the word’s curve
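As an illustration of the two per-column features named above, a minimal sketch, assuming a binary NumPy image with 1 = ink; the exact feature definitions used in the demo may differ:

```python
import numpy as np

def column_features(img):
    """Per-column (zero-crossings, gradient) features for a binary word image.

    Zero-crossings are counted as background-to-ink transitions down a
    column; the gradient is approximated from the change in the word's
    upper contour between neighboring columns. Illustrative definitions.
    """
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    feats = []
    prev_top = None
    for x in range(w):
        col = img[:, x]
        crossings = int(np.count_nonzero(np.diff(col) == 1))  # 0 -> 1 transitions
        ink = np.flatnonzero(col)
        top = int(ink[0]) if ink.size else None  # uppermost ink pixel, if any
        grad = 0 if top is None or prev_top is None else prev_top - top
        if top is not None:
            prev_top = top
        feats.append((crossings, grad))
    return feats
```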

1) The gradient of the word’s curve at a given pixel column

Letter sub-HMM components: normal transitions and null transitions

Letter sub-HMM (normal and null transitions)

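A sub-HMM with both normal (emitting) and null (non-emitting) transitions can be decoded by a Viterbi pass that relaxes the null transitions after each observation, since they consume no input. A sketch under an assumed, hypothetical encoding of the transition tables:

```python
import math

def viterbi_with_nulls(obs_logp, normal, null, start, final):
    """Best log-score path through an HMM with normal and null transitions.

    obs_logp[t][s]: log-prob of observation t emitted entering state s.
    normal[(i, j)], null[(i, j)]: transition log-probs. Null transitions
    consume no observation, so they are relaxed to a fixed point after
    each step. Hypothetical structure, for illustration only.
    """
    n_states = len(obs_logp[0])
    NEG = -math.inf

    def relax_nulls(score):
        changed = True
        while changed:               # propagate non-emitting moves
            changed = False
            for (i, j), lp in null.items():
                if score[i] + lp > score[j]:
                    score[j] = score[i] + lp
                    changed = True
        return score

    score = relax_nulls([0.0 if s == start else NEG for s in range(n_states)])
    for t in range(len(obs_logp)):
        new = [NEG] * n_states
        for (i, j), lp in normal.items():   # emitting moves consume obs t
            cand = score[i] + lp + obs_logp[t][j]
            if cand > new[j]:
                new[j] = cand
        score = relax_nulls(new)
    return score[final]
```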
Segmentation-Based
• In a segmentation-based approach, recognition is based on complete bipartite match-making between blocks of primitive segments and the letters of a word.

Segmentation-Based (continued)
• The best match is found by the Viterbi dynamic-programming algorithm.
• An implementation by an HMM is very popular and enhances the model’s capabilities.

Segmentation-Based (demo)
• First, the word is heuristically segmented.
• It is preferable to over-segment a character; nevertheless, a character must not span more than a predefined number of segments.
• Each segment is translated into a feature vector.
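The matching of segment blocks to letters under a maximum-span constraint can be sketched as a Viterbi-style dynamic program. The table `seg_scores` below is a hypothetical stand-in for whatever per-block letter likelihoods the classifier produces:

```python
import math

def best_segmentation(seg_scores, n_letters, max_span=4):
    """Best grouping of primitive segments into letters, by DP.

    seg_scores[(i, j)][k] is an assumed log-likelihood that segments
    i..j-1 form letter index k of the word. Returns the best total
    score over all ways of splitting the segments into consecutive
    blocks, one per letter, each at most `max_span` segments long.
    A sketch of the search, not any paper's exact model.
    """
    n_segs = max(j for (_, j) in seg_scores)
    NEG = -math.inf
    # dp[k][j]: best score with the first k letters consuming the first j segments
    dp = [[NEG] * (n_segs + 1) for _ in range(n_letters + 1)]
    dp[0][0] = 0.0
    for k in range(1, n_letters + 1):
        for j in range(1, n_segs + 1):
            for span in range(1, min(max_span, j) + 1):
                i = j - span
                s = seg_scores.get((i, j), [NEG] * n_letters)[k - 1]
                if dp[k - 1][i] + s > dp[k][j]:
                    dp[k][j] = dp[k - 1][i] + s
    return dp[n_letters][n_segs]
```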

Features in Segments (demo)
• Global features: ascenders, descenders, loops, i-dots, t-strokes
• Local features: X crossings, T crossings, end points, sharp curvatures, parametric strokes
• Non-symbolic features: pixel moments, pixel distributions, contour codings

Letter sub-HMM (maximum 4 segments per character)

Two-letter joined sub-HMM (0.5 to 3 segments per character)

Pattern Recognition Issues
• Lexicon size:
  – small (up to 100 words)
  – limited (between 100 and 1,000 words)
  – infinite (more than 1,000 words)

Word Model Extension
• A new approach to performing recognition?
  – path-discriminant (a single general word model; a path = a hypothesis per word)
  ‘a’ sub-HMM … ‘m’ sub-HMM … ‘z’ sub-HMM

Online vs. Off-Line
• Online: captured by pen-like devices. The input format is a two-dimensional signal of pixel locations as a function of time, (x(t), y(t)).
• Off-line: captured by scanning devices. The input format is a two-dimensional image of gray-scale values as a function of location, I(m×n). Strokes have significant width.

Online vs. Off-Line (demo)

Online vs. Off-Line (cont.)
• In general, online classifiers are superior to off-line classifiers because some valuable strokes are blurred in the static image. Sometimes temporal information (stroke order) is also a must in order to distinguish between similar objects.

Online Weaknesses
Sensitivity to variations in stroke order, stroke number, and stroke characteristics:
• Similar shapes that resemble each other in the image domain might be produced by different sets of strokes.
• Many redundant strokes (consecutive superfluous pixels) are byproducts of the continuous nature of cursive handwriting.
• Incomplete (open) loops are more frequent.

Off-Line Can Improve Online
• Sometimes the off-line representation enables one to recognize words that are not recognized given the online signal.
➢ An optimal system would combine online- and off-line-based classifiers.

The Desired Integration Between Online and Off-Line Classifiers
• Having a single word-recognition engine to process both the online and the off-line data.
➢ This requires an off-line to online transformation that extracts an alternative list of strokes, preserving off-line-like features while being consistent in order.

System overview (flattened from the slide’s diagram):
• Online signal → projection to the image domain → bitmap image (stroke width = 1)
• Online signal → “painting” (thickening the strokes) → real static image (stroke width > 1)
• Static image → the “pseudo-online” transformation → pseudo-online representation → online recognition engine
• Classification: the online and pseudo-online classifiers’ outputs are integrated by some combination scheme into the recognition results

Cursive Handwriting Terms
• Axis: the main subset of strokes that assembles the backbone, which is the shortest path from left to right, including loops on several occasions.
• Tarsi: the other subsets of connected strokes, which produce branches that hang above (in the case of ascenders) or below (in the case of descenders) the axis.

The Pseudo-Online Transformation
• Follow the skeleton of the axis from the leftmost pixel until reaching the first intersection with a tarsus.
• Surround the tarsus by tracking its contour until returning to the intersection point we started from.
• Continue along the axis to the next intersection with a tarsus, and so on, until the rightmost pixel is reached.
• Loops encountered along the axis are also surrounded completely.
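The traversal steps above can be sketched as code, assuming the axis skeleton and the tarsus contours have already been extracted (the skeleton and contour computations are covered on the following slides; both input representations here are hypothetical):

```python
def pseudo_online_trace(axis, tarsi):
    """Assemble a pseudo-online stroke from precomputed geometry.

    axis: list of (x, y) skeleton pixels of the backbone, ordered
    left to right. tarsi: dict mapping an axis index (an intersection
    point) to that tarsus' contour, ordered so that it returns to the
    intersection it started from. Loops along the axis are assumed to
    be folded into `tarsi` the same way. Illustrative sketch only.
    """
    trace = []
    for idx, pt in enumerate(axis):
        trace.append(pt)                  # follow the axis skeleton
        for contour_pt in tarsi.get(idx, []):
            trace.append(contour_pt)      # surround the tarsus completely
        # the contour ends where it started, so the traversal resumes
        # along the axis from the same intersection point
    return trace
```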

Computing the axis’s skeleton

Computing the axis’s skeleton (cont.)

Computing the axis’s skeleton (cont.)

Processing the tarsi

Processing the tarsi (cont.)

Handling i-dots

Experimental Setup
• The online word-recognition engine of Neskovic et al., which satisfies trainability and versatility.
• A combination of 6/12 online and pseudo-online classifiers.
• Several combination schemes: majority vote, max rule, sum rule.
• An extension of HP’s dataset, which can be found in the UNIPEN collection.
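The three combination schemes named above admit a generic sketch over per-classifier score dictionaries; the score scaling and tie-breaking used in the actual experiments are not specified on the slide, so this is an illustration rather than the system's exact fusion rule:

```python
from collections import Counter

def combine(score_lists, rule="sum"):
    """Combine word scores from several classifiers.

    score_lists: one dict per classifier, mapping word -> score
    (assumed comparable across classifiers, e.g. posteriors in [0, 1]).
    Implements majority vote over top-1 decisions, the max rule,
    and the sum rule; returns the winning word.
    """
    if rule == "vote":
        # each classifier votes for its own top-ranked word
        votes = Counter(max(s, key=s.get) for s in score_lists)
        return votes.most_common(1)[0][0]
    words = set().union(*score_lists)
    if rule == "max":
        fused = {w: max(s.get(w, 0.0) for s in score_lists) for w in words}
    elif rule == "sum":
        fused = {w: sum(s.get(w, 0.0) for s in score_lists) for w in words}
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(fused, key=fused.get)
```

Note how the rules can disagree: a word ranked first by most classifiers (vote) may still lose under the sum rule if another word accumulates higher total mass.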

Experimental Setup (cont.)
• Different training sets of 46 writers.
• Disjoint validation sets of 9 writers.
• A disjoint test set of 11 writers.
• The lexicon contains 862 words.

Experimental Results for 6 Classifiers

Experimental Results for 12 Classifiers

Result Analysis
• Word level: in 110 word classes (12.8%), at least 7 word samples (10.6%) were correctly recognized only by the combination with the pseudo-online classifiers.
• Writer level: for 12 writers (18.2%), at least 65 of the words they produced (7.5%) were correctly recognized only by the combination with the pseudo-online classifiers.

Result Analysis (cont.)
• 909 of the input words (5.9%) were correctly recognized by at least one pseudo-online classifier and by none of the 12 online classifiers.
• 357 of the input words (2.3%) were correctly recognized by at least 4 of the 12 pseudo-online classifiers and by none of the 12 online classifiers.
• For 828 of the input words (5.3%), the difference between the number of pseudo-online and online classifiers that correctly recognized them was 6 or more.

Conclusions
• The pseudo-online representation does add information that cannot be obtained by optimizing or extending a combination of online classifiers alone.