Two-dimensional Context-Free Grammars: Mathematical Formulae Recognition
Daniel Průša, Václav Hlaváč
Center for Machine Perception, Faculty of Electrical Engineering, Czech Technical University, Prague

Presentation Overview
§ Formulae recognition, problem formulation
§ Known methods
§ General idea of structural recognition
§ Two-dimensional context-free grammars
§ Extension of the grammars
§ Recognition tool, pilot implementation
§ Results, future plans

Motivation for this work
§ To test a theoretical construct on a practical pilot problem with explicit structure: mathematical formulae.
§ The group of Schlesinger and Savchynskyy from Kiev works on music score recognition. We cooperate in a joint research project.

Math. formulae, off-line or on-line
Formulae recognition can be divided into two groups by the type of input:
1. Off-line recognition – a formula is depicted in a raster image.
2. On-line recognition – a formula is represented by a sequence of pen strokes (growing importance due to tablet PCs).

Math. formulae recognition, usage
§ Off-line recognition – conversion of scanned printed mathematical texts into an electronic form.
§ On-line recognition – connected to pen-based computing technologies (electronic tablets).
There are many papers on formulae recognition, but only a few commercial products (e.g., xMathJournal by xThink).

Usual architecture
Two independent layers:
§ Symbol detection and recognition.
§ Structural analysis.
Processing pipeline: image or sequence of strokes → symbol recognition → symbols (+ coordinates and font size) → error corrections (optional) → structural analysis → derivation tree

Symbol recognition methods
§ Image segmentation + OCR tool.
§ Image segmentation and character recognition performed simultaneously (e.g., by Hidden Markov Models).
• It is very difficult to recover from errors made in the segmentation phase.
• Semantics is not taken into account.

Structural analysis methods
§ Grammar based
  • geometric grammars
  • graph grammars
§ Non-grammar based
  • minimum spanning tree
  • hard-coded rules

Our approach to structural recognition
Based on general structural constructions by M. I. Schlesinger, V. Hlaváč in Ten Lectures on Statistical and Syntactic Pattern Recognition (Kluwer Academic Publishers, 2002).
§ Do not separate segmentation and parsing; perform them simultaneously.
• Suitable for recognition of objects with rich structure.
• Already successfully applied to music scores and electric circuit diagrams.

Structural Recognition – General Idea
Assumptions: input image, set of derivation rules.
Recognition:
1. The algorithm starts with regions labeled by terminals:
   - squares corresponding to one symbol,
   - regions detected by an external tool.
2. Bigger regions labeled by non-terminals are derived by applying the rules; each derivation is assigned a penalty.
3. Result: the region matching the whole picture with the smallest penalty.
[Figure: region N is derived by a rule from regions A, B, C, D.]
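The three recognition steps can be sketched as a fixed-point computation over labeled regions. This is only an illustrative sketch, not the authors' implementation: the rule table `RULES`, the labels, the integer penalties, and the restriction to binary rules joining two exactly touching rectangles are all assumptions made for the example.

```python
from itertools import product

# Hypothetical rule table: (first label, second label, direction) -> (new label, rule penalty).
# "h" joins a region with one touching it on the right, "v" with one touching it below.
RULES = {
    ("A", "B", "h"): ("N", 10),
    ("N", "C", "v"): ("S", 5),
}

def join(r1, r2):
    """Rectangles are (x0, y0, x1, y1), half-open. Return (direction, merged
    rectangle) if r2 touches r1 on the right or below, otherwise None."""
    x0, y0, x1, y1 = r1
    u0, v0, u1, v1 = r2
    if x1 == u0 and (y0, y1) == (v0, v1):
        return "h", (x0, y0, u1, y1)
    if y1 == v0 and (x0, x1) == (u0, u1):
        return "v", (x0, y0, x1, v1)
    return None

def derive(terminals):
    """terminals: {(label, rect): penalty}. Repeatedly apply RULES and keep
    the smallest penalty found for every (label, rect)."""
    best = dict(terminals)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that `best` can be updated inside the loop.
        for ((l1, r1), p1), ((l2, r2), p2) in product(list(best.items()), repeat=2):
            joined = join(r1, r2)
            if joined is None:
                continue
            direction, rect = joined
            rule = RULES.get((l1, l2, direction))
            if rule is None:
                continue
            label, rule_penalty = rule
            total = p1 + p2 + rule_penalty
            if total < best.get((label, rect), float("inf")):
                best[(label, rect)] = total
                changed = True
    return best

terminals = {("A", (0, 0, 2, 2)): 1, ("B", (2, 0, 4, 2)): 2, ("C", (0, 2, 4, 3)): 3}
result = derive(terminals)
print(result[("S", (0, 0, 4, 3))])  # penalty of the region covering the whole picture
```

The answer to step 3 is then the entry of `result` whose rectangle equals the whole picture and whose label is the initial non-terminal.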

Structural Recognition Applied to Formulae using 2D Context-free Grammars
• Uniform shapes of regions considered – rectangles.
• 2D grammar for mathematical formulae designed.
• Terminals detection – detect all possible occurrences of elementary symbols using an OCR tool and evaluate each occurrence by a penalty (computed by the OCR tool).
[Figure: candidate detections labeled "fraction line, minus sign" and "symbol 5".]

Structural Recognition Applied to Formulae using 2D Context-free Grammars
Parsing – let the structural analysis decide what is the best segmentation and interpretation of the elementary symbols, i.e., find the derivation tree that covers the whole image and has the smallest penalty.
[Figure: example with the symbols 5 and 2.]

Two-dimensional Context-free Grammars
… set of terminals
… set of non-terminals
… initial non-terminal
… set of productions
Three basic types of productions in P:
Generalized form of productions:

Interpretation of Productions
G generates pictures that can be named by the initial non-terminal S.

Theoretical Results on 2D CF Languages
L(2CFG) ... class of languages that can be generated by a 2D CF grammar
• L(2CFG) includes 1D context-free languages.
• L(2CFG) and L(2FSA) are not comparable.
• There is no analogue of the Chomsky normal form of productions.
• The basic form of productions is weaker than the general one.
• The emptiness problem is not decidable.
• Languages in L(2CFG) can be recognized in polynomial time.
Observation: natural generalization, but the properties of L(2CFG) differ from the properties of the class of 1D context-free languages.

Recognition in Polynomial Time
2D CF grammars with productions in the basic form:
• Generated languages can be recognized in time polynomial in the picture size (M. I. Schlesinger).
The algorithm can be generalized to all languages in L(2CFG):
• The degree of the polynomial depends on the size of the productions, i.e., on the maximal number of rows and the maximal number of columns on the right-hand side of a production.
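A CYK-style dynamic program over sub-rectangles gives a feel for why recognition is polynomial. The sketch below is an illustration under stated assumptions, not the algorithm from the slides: it assumes the basic productions are of the three kinds N → a (single terminal), N → A B (A to the left of B) and N → A over B (A above B), and the function and parameter names are hypothetical. Unit productions N → A are not handled.

```python
from functools import lru_cache

def recognizes(picture, terminal_rules, h_rules, v_rules, start):
    """picture: list of equal-length strings (one character per field).
    terminal_rules: {symbol: set of non-terminals}     assumed form N -> a
    h_rules: {(A, B): set of non-terminals}, A left of B   assumed form N -> A B
    v_rules: {(A, B): set of non-terminals}, A above B     assumed form N -> A over B
    """
    rows, cols = len(picture), len(picture[0])

    @lru_cache(maxsize=None)
    def labels(r0, c0, r1, c1):
        """Non-terminals deriving the sub-picture rows r0..r1-1, columns c0..c1-1."""
        found = set()
        if r1 - r0 == 1 and c1 - c0 == 1:
            found |= terminal_rules.get(picture[r0][c0], set())
        for c in range(c0 + 1, c1):      # cut into a left and a right part
            for a in labels(r0, c0, r1, c):
                for b in labels(r0, c, r1, c1):
                    found |= h_rules.get((a, b), set())
        for r in range(r0 + 1, r1):      # cut into a top and a bottom part
            for a in labels(r0, c0, r, c1):
                for b in labels(r, c0, r1, c1):
                    found |= v_rules.get((a, b), set())
        return frozenset(found)

    return start in labels(0, 0, rows, cols)

# Toy grammar: S -> a | S S | S over S, i.e. every picture filled with "a".
terminal_rules = {"a": {"S"}}
concat = {("S", "S"): {"S"}}
print(recognizes(["aa", "aa"], terminal_rules, concat, concat, "S"))
print(recognizes(["ab", "aa"], terminal_rules, concat, concat, "S"))
```

For an m × n picture this sketch visits O(m²n²) sub-rectangles and tries O(m + n) cuts for each, so its running time is polynomial in the picture size for a fixed grammar.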

Extension of 2D CF Grammars
2D context-free grammars are not powerful enough to express the complex structure of mathematical formulae. We need a formalism that makes it easy to work with relative positions and sizes of symbols, e.g., to express relationships like “a symbol is a superscript of another symbol”.
[Figure: example formulae built from the symbols 5, 3, 1, +, 2, 6, 4.]

Extension of 2D CF Grammars
§ Regions are still rectangles.
§ Each derived region is assigned a feature point (logical center). The feature point of a derived region is determined by the applied production.
[Figure: example with the symbols 1, 5, 3.]

Extension of 2D CF Grammars
§ Usage of productions is not limited to directly neighboring (touching) rectangles.
§ Productions can specify a rectangular area where some specific point of a rectangle has to be contained.
§ Positions and sizes can be given relative to one of the rectangles.
§ Restrictions on relative sizes of rectangles are also possible.
[Figure: example with the symbols 5 and 3+2.]
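A geometric constraint of this kind could look as follows for a superscript production. The sketch is hypothetical: the `Region` class, the choice of the geometric center as feature point, and the numeric thresholds are assumptions made for illustration and do not come from the slides. Image coordinates are assumed to grow to the right and downwards.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    left: float
    top: float
    right: float
    bottom: float

    @property
    def height(self):
        return self.bottom - self.top

    @property
    def feature_point(self):
        # Assumption: the logical center is simply the geometric center.
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)

def is_superscript(base, sup, max_rel_height=0.8):
    """Check whether `sup` may act as a superscript of `base`.
    All positions and sizes are given relative to the height of `base`."""
    h = base.height
    # Restriction on relative sizes: the superscript must be smaller than the base.
    if sup.height > max_rel_height * h:
        return False
    # Rectangular area, relative to `base`, that must contain the feature
    # point of `sup`: right of the base, around its top edge.
    area_left, area_right = base.right, base.right + 1.0 * h
    area_top, area_bottom = base.top - 0.5 * h, base.top + 0.25 * h
    x, y = sup.feature_point
    return area_left <= x <= area_right and area_top <= y <= area_bottom

base = Region(left=0, top=10, right=10, bottom=30)
print(is_superscript(base, Region(11, 4, 19, 14)))   # small symbol, raised
print(is_superscript(base, Region(11, 20, 19, 30)))  # small symbol, lowered
```

The two rectangles do not have to touch, which matches the first bullet above; the production only asks for a point of one rectangle to fall inside an area derived from the other.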

Penalty Computation
Based on summing partial penalties determined by the following criteria:
§ Used production.
§ Relative sizes and positions of the regions the production is applied to (original regions).
§ Number of black pixels in the new region that are not in the original regions.
§ Penalties of the original regions.
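The sum of the four partial penalties can be sketched as below. How the tool actually weights the terms is not stated in the slides, so the function names, the `pixel_weight` parameter, and the way the production and geometry penalties are passed in as plain numbers are assumptions for this example.

```python
def uncovered_black_pixels(image, new_rect, original_rects):
    """Count black pixels (value 1) inside new_rect that lie in none of the
    original rectangles. Rectangles are (top, left, bottom, right), half-open."""
    top, left, bottom, right = new_rect
    count = 0
    for r in range(top, bottom):
        for c in range(left, right):
            if not image[r][c]:
                continue
            covered = any(t <= r < b and l <= c < rt
                          for (t, l, b, rt) in original_rects)
            if not covered:
                count += 1
    return count

def derivation_penalty(image, new_rect, originals,
                       production_penalty, geometry_penalty, pixel_weight=1.0):
    """originals: list of (rect, penalty) pairs for the original regions.
    pixel_weight is a hypothetical tuning parameter."""
    rects = [rect for rect, _ in originals]
    return (production_penalty
            + geometry_penalty
            + pixel_weight * uncovered_black_pixels(image, new_rect, rects)
            + sum(penalty for _, penalty in originals))

image = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]
originals = [((0, 0, 2, 2), 1.0), ((0, 3, 1, 4), 2.0)]
print(derivation_penalty(image, (0, 0, 3, 4), originals,
                         production_penalty=0.5, geometry_penalty=0.25))
```

The black pixel in the last row belongs to neither original region, so it contributes to the penalty of the new region; this is how unexplained ink discourages a derivation.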

Implementation of the Recognition Tool
§ Off-line recognition.
§ Implemented in Java.
§ Trained and tuned for hand-written formulae.
§ Black and white images (but can be extended to gray-scale images).
§ The following constructs are supported:
• variables, numbers, parentheses, common unary and binary operators, power operator, fractions, square root,
• subscripts, superscripts, sum, integral.
§ Can deal with noise, ambiguities, touching or split symbols, etc., and also with misplaced symbols.

Tool Architecture
[Figure: components of the tool: OCR tool, terminals detection, 2D grammar, parsing.]

Terminals Detection
Ideally, all regions should be scanned for the presence of an elementary symbol, but this is too time-consuming. Two smarter strategies are implemented:
• Scanning rectangular windows of some predefined sizes (not all sizes).
• Detection based on connected components. Limitations of the method: overlapping bounding boxes of symbols, symbols that intersect.
OCR tool used: a simple method is implemented. A feature vector is extracted from the image and a k-nearest neighbor classifier is used to classify the vector. Trained for all supported elementary symbols.
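The second strategy can be illustrated with a plain breadth-first labeling of connected components that returns one candidate bounding box per component. This is a minimal sketch, assuming a binary image stored as a list of rows and 8-connectivity; neither detail is specified in the slides, and the function name is hypothetical.

```python
from collections import deque

def component_boxes(image):
    """Return bounding boxes (top, left, bottom, right), half-open, of the
    8-connected components of black pixels (value 1), in scan order."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Start a breadth-first search from an unvisited black pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r + 1, c + 1
            while queue:
                y, x = queue.popleft()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y + 1), max(right, x + 1)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

image = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
]
print(component_boxes(image))
```

Each box would then be passed to the OCR tool. The limitations listed above show up directly: two touching symbols form a single component, and a symbol drawn in several strokes yields several boxes.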

Remarks on Terminals Detection
• Symbols whose size is not limited by a constant are not treated as terminal symbols (e.g., fraction line, square root).
• In addition, a square root cannot be separated from an image by a rectangle (it surrounds its argument).
Solution: treat these cases as symbols composed of several terminal symbols and extend the grammar by related productions.

Parsing Algorithm
§ Bottom-up approach, as described in the general structural recognition.
§ Complexity – depends on the number of terminals detected during the first phase; in general it can be exponential, but it is substantially reduced by production restrictions and the use of suitable data structures.
§ Data structures for orthogonal range queries (searching for points located in a rectangle) are used to speed up the algorithm.
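An orthogonal range query asks for all stored points inside an axis-parallel rectangle, for example all feature points that fall into the area required by a production. The slides do not say which data structure the tool uses; the sketch below is the simplest possible stand-in (points sorted by x, binary search on x, linear filter on y), and the class name `PointIndex` is hypothetical. A range tree or k-d tree would answer the same query with better worst-case bounds.

```python
from bisect import bisect_left, bisect_right

class PointIndex:
    """Static index over 2D points supporting rectangle queries."""

    def __init__(self, points):
        self._points = sorted(points)              # sorted by x, then y
        self._xs = [p[0] for p in self._points]

    def query(self, x0, x1, y0, y1):
        """Return all points with x0 <= x <= x1 and y0 <= y <= y1."""
        lo = bisect_left(self._xs, x0)             # first point with x >= x0
        hi = bisect_right(self._xs, x1)            # one past the last point with x <= x1
        return [p for p in self._points[lo:hi] if y0 <= p[1] <= y1]

index = PointIndex([(1, 1), (2, 5), (3, 2), (5, 5), (4, 0)])
print(index.query(2, 4, 0, 3))
```

In the parser, such an index would let a production look up only the regions whose feature points lie in its target area instead of testing every pair of regions.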

Future Plans
§ Focus on printed formulae.
§ Collect a sufficiently large set of annotated printed formulae.
§ Apply learning methods: learn etalons of elementary symbols and production parameters.