Approaches to Recognition for Telugu Script Fringe Distances

Approaches to Recognition for Telugu Script: Fringe Distances and Other Methods
Atul Negi and Ravi Raj Singh
Department of Computer and Information Sciences, University of Hyderabad

Optical Character Recognition
  • OCR is an enabling technology for languages
  • Potential for office automation and content creation activities
  • Transforms paper media into searchable, computer-revisable text

OCR in Indian Scripts - I
  • Complete Bangla OCR (Pal and Chaudhuri 1999)
  • Devanagari and Bangla OCR (Chaudhuri et al 2000)
  • Devanagari OCR (Bansal and Sinha 1999)

OCR in Indian Scripts - II
  • Gurmukhi OCR (Lehal et al 2000)
  • Tamil OCR (AG Ramakrishna 2002, V Krishnamoorthy 2001)
  • Kannada OCR (PS Sastry 1999, AG Ramakrishna 2002)

OCR Efforts in Telugu: Recognition Approaches
  • Rajasekharan and Deekshatulu 1977
  • Sukhswami, Seetharamulu, and Pujari 1995
  • Negi, Chakravarthy and Krishna 2001
  • Pujari et al 2002
  • Vasantha and Patvardhan 2002

Telugu Script Difficulties
  • Rounded nature of the characters
  • Lack of vertical strokes
  • Placement of ottulu and matra
  • Merging of glyphs
  • Fonts with varied stroke thickness

Fringe Distances
  • Template matching approach
  • Needs storage of fixed-size templates
  • Connected components (CCs) are isolated from lines and words in the image
  • Each input CC is scaled to the template size
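The scaling step can be sketched as follows. This is only an illustration, not the authors' implementation: it assumes a connected component is a binary grid (list of rows, 1 = ink), uses nearest-neighbour resampling, and the 32x32 default size and the name `scale_to_template` are hypothetical choices, since the slides do not state the template size or the resampling method.

```python
def scale_to_template(glyph, size=(32, 32)):
    """Nearest-neighbour rescale of a binary glyph to a fixed template size.

    `glyph` is a non-empty list of equal-length rows (1 = ink, 0 = background).
    The 32x32 default is an assumed value, not taken from the slides.
    """
    src_rows, src_cols = len(glyph), len(glyph[0])
    dst_rows, dst_cols = size
    return [
        [glyph[r * src_rows // dst_rows][c * src_cols // dst_cols]
         for c in range(dst_cols)]
        for r in range(dst_rows)
    ]
```

Once every input CC has the same dimensions as the stored templates, the two can be compared pixel position by pixel position.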

Fringe Formation
Fringes are the distances from each point to the nearest image pixels, stored for each training template character. Because they are computed once per template, fringes speed up template matching.
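A minimal sketch of how a fringe map might be built and used, under several assumptions not stated in the slides: the distance is city-block (4-neighbour) distance to the nearest ink pixel, the match score is the sum of template fringe values under the input's ink pixels, and the template contains at least one ink pixel. The names `fringe_map` and `fringe_distance` are hypothetical.

```python
from collections import deque


def fringe_map(template):
    """City-block distance from every cell to the nearest ink (1) pixel.

    Assumes `template` is a binary grid with at least one ink pixel.
    """
    rows, cols = len(template), len(template[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every ink pixel at distance 0.
    for r in range(rows):
        for c in range(cols):
            if template[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    # Multi-source breadth-first search fills in the remaining cells.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def fringe_distance(glyph, fringe):
    """Sum of the template's fringe values under the glyph's ink pixels."""
    return sum(
        fringe[r][c]
        for r, row in enumerate(glyph)
        for c, value in enumerate(row)
        if value
    )
```

The map is computed once per stored template; matching an input glyph then needs only one lookup per ink pixel, which is where the speed-up comes from. An input would be assigned to the template with the smallest score.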

Run Number Based Metric Distance
  • Technique used for Bangla/Devanagari
  • $J_H(X, Y) = \sum_i |n_X(i) - n_Y(i)|$   (1)
  • $J(X, Y) = J_H(X, Y) + J_V(X, Y)$   (2)
  • Invariant to scaling
  • Robust to font style changes
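Equations (1) and (2) can be illustrated with a short sketch. It assumes that n_X(i) is the number of ink runs in the i-th row of X for the horizontal term J_H and in the i-th column for the vertical term J_V, and that both glyphs have already been brought to the same dimensions; the slides do not spell out either point, and the function names are hypothetical.

```python
def run_counts(lines):
    """Number of ink runs (maximal stretches of 1s) in each line."""
    counts = []
    for line in lines:
        runs, previous = 0, 0
        for value in line:
            if value and not previous:  # a 0 -> 1 transition starts a new run
                runs += 1
            previous = value
        counts.append(runs)
    return counts


def run_number_distance(x, y):
    """J(X, Y) = J_H(X, Y) + J_V(X, Y) for two equal-sized binary glyphs."""
    # Horizontal term: compare run counts row by row, as in equation (1).
    j_h = sum(abs(a - b) for a, b in zip(run_counts(x), run_counts(y)))
    # Vertical term: same comparison on columns (rows of the transpose).
    j_v = sum(abs(a - b)
              for a, b in zip(run_counts(zip(*x)), run_counts(zip(*y))))
    return j_h + j_v
```

Because only the count of runs per line enters the score, thickening or thinning a stroke leaves the distance unchanged, which is consistent with the robustness to font style noted above.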

Recognition Results: Accuracy Comparison
Data
  • Five image files
  • More than 5000 input glyphs

Recognition Time
  • Run on a Linux Pentium 3 system

Conclusions
  • Accuracy of fringe distances is marginally superior
  • Recognition time of the run-number metric (RNM) is marginally better
  • Further testing on a wider variety of font styles is required before making definite statements

Thank You