
Optical Data Capture: Optical Character Recognition (OCR), Intelligent Character Recognition (ICR), Intelligent Recognition

Regional Workshop on the 2010 World Programme on Population and Housing Censuses: International standards, contemporary technologies for census mapping and data processing
Minsk, Belarus, 8-12 December 2008

Summary
• Concept/Definition
• Forms Design
• Scanners & Software
• Storage
• Accuracy
• OCR/ICR Advantages and Disadvantages
• Intelligent Recognition (IR)
• Commercial Suppliers

Definition/Concept of OCR
• Gives scanning and imaging systems the ability to turn images of machine-printed characters into machine-readable characters.
  - Images of the machine-printed characters are extracted from a bitmap of the scanned image.

Definition/Concept of ICR
• Gives scanning and imaging systems the ability to turn images of handwritten characters into machine-readable characters.
  - Images of the handwritten characters are extracted from a bitmap of the scanned image.

OCR and ICR Differences
• OCR is less accurate than OMR but more accurate than ICR.
• ICR will require editing to achieve high data coverage.
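The editing step mentioned above is commonly driven by per-character confidence scores. The following is a minimal sketch of that idea, not a description of any particular product: the function name `route_field`, the `(character, confidence)` input shape, and the 0.90 threshold are all assumptions made for illustration.

```python
from typing import List, Tuple


def route_field(chars: List[Tuple[str, float]],
                threshold: float = 0.90) -> Tuple[str, List[int]]:
    """Return the recognized text and the positions that need manual editing.

    `chars` is a hypothetical recognition result: one (character, confidence)
    pair per box, with confidence between 0.0 and 1.0.
    """
    text = "".join(ch for ch, _ in chars)
    # Positions whose confidence falls below the threshold go to an operator.
    doubtful = [i for i, (_, conf) in enumerate(chars) if conf < threshold]
    return text, doubtful


if __name__ == "__main__":
    text, doubtful = route_field([("1", 0.99), ("7", 0.62), ("4", 0.95)])
    print(text, doubtful)
```

Lowering the threshold reduces the editing workload at the cost of accepting more doubtful characters, so the value would need to be tuned against real forms.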

Forms
• OCR/ICR has less strict form design requirements than OMR:
  - No timing tracks
  - Has registration marks
• ICR requires hand-printed entries in boxes, one alphanumeric character per box.
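The one-character-per-box rule is what lets software isolate each character before recognition. A minimal sketch of that step follows, assuming the field has already been cropped from the page, the boxes are of equal width, and the bitmap is a list of rows with 1 for ink and 0 for background. The names `split_into_boxes` and `is_blank` are illustrative only.

```python
from typing import List

Bitmap = List[List[int]]  # 1 = ink, 0 = background


def split_into_boxes(field: Bitmap, box_width: int) -> List[Bitmap]:
    """Cut a field bitmap into equal-width character boxes, left to right."""
    if box_width <= 0:
        raise ValueError("box_width must be positive")
    width = len(field[0]) if field else 0
    boxes = []
    # Step across the field one box at a time; a partial box at the end is ignored.
    for left in range(0, width - box_width + 1, box_width):
        boxes.append([row[left:left + box_width] for row in field])
    return boxes


def is_blank(box: Bitmap) -> bool:
    """A box with no ink pixels is treated as an empty answer position."""
    return not any(any(row) for row in box)


if __name__ == "__main__":
    field = [[1, 0, 0, 0, 0, 0],
             [0, 1, 0, 0, 0, 0]]
    for box in split_into_boxes(field, box_width=3):
        print(box, "blank" if is_blank(box) else "has ink")
```

Each non-blank box would then be passed to the recognition engine as a separate character image.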

OCR
• Forms
  - OCR/ICR is more flexible since:
    • No timing tracks are required
    • The image can float on a page
  - The use of drop-out color reduces the size of the scanner's output and enhances accuracy.
  - ICR/OCR technology often uses registration marks on the four corners of a document in the recognition of an image.
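One way to picture how registration marks let the image "float" on the page: compare where the marks were found in the scan with where the form template expects them, then shift every field location by the difference. The sketch below handles translation only; it assumes the mark positions have already been detected, and it ignores rotation, skew, and scaling, which a real system would also need to correct. All names and coordinates are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # left, top, right, bottom


def estimate_offset(template_marks: Dict[str, Point],
                    scanned_marks: Dict[str, Point]) -> Point:
    """Average displacement of the registration marks found in the scan."""
    common = template_marks.keys() & scanned_marks.keys()
    if not common:
        raise ValueError("no registration marks in common")
    dx = sum(scanned_marks[k][0] - template_marks[k][0] for k in common) / len(common)
    dy = sum(scanned_marks[k][1] - template_marks[k][1] for k in common) / len(common)
    return dx, dy


def locate_field(template_box: Box, offset: Point) -> Box:
    """Shift a field's template coordinates to where it sits in the scan."""
    left, top, right, bottom = template_box
    dx, dy = offset
    return left + dx, top + dy, right + dx, bottom + dy


if __name__ == "__main__":
    template = {"tl": (10, 10), "tr": (200, 10), "bl": (10, 300), "br": (200, 300)}
    scanned = {"tl": (14, 7), "tr": (204, 7), "bl": (14, 297), "br": (204, 297)}
    offset = estimate_offset(template, scanned)
    print(locate_field((50, 60, 80, 75), offset))
```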


OCR/ICR Scanners and Software
• Forms are fed through a scanner; the recognition engine of the OCR/ICR system then interprets the images and turns images of handwritten or printed characters into ASCII data (machine-readable characters).
• Users can scan forms without running the OCR step.
• Speeds range from 85 to 160 sheets/min (dependent on the recognition engine).
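The speed range above can be turned into a rough planning estimate. The sketch below is a back-of-the-envelope calculation only: the sheet count, number of scanners, and productive hours per day are made-up inputs, and it ignores downtime, jams, and rescans.

```python
import math


def scanning_days(total_sheets: int, sheets_per_minute: float,
                  scanners: int, hours_per_day: float) -> int:
    """Whole working days needed to scan `total_sheets` at a steady rate."""
    if min(total_sheets, sheets_per_minute, scanners, hours_per_day) <= 0:
        raise ValueError("all inputs must be positive")
    sheets_per_day = sheets_per_minute * 60 * hours_per_day * scanners
    return math.ceil(total_sheets / sheets_per_day)


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 sheets, 2 scanners, 6 productive hours/day.
    for rate in (85, 160):  # the two ends of the quoted speed range
        print(rate, "sheets/min:", scanning_days(1_000_000, rate, 2, 6), "days")
```

Running both ends of the range gives a best-case and worst-case duration for the same workload.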

OCR/ICR Storage Characteristics
• Storage/Retrieval
  - Images are scanned, stored, and maintained electronically.
  - There is no need to store the paper forms as long as the electronic files are safeguarded.
  - With OCR/ICR technologies, images can be scanned, indexed, and written to optical media.

Ideal OCR/ICR Accuracy Thresholds
• Accuracy:
  - The accuracy achieved by data entry clerks (~99.5%) is approximately equal to that of OCR/ICR in perfect tuning (~99.5%).
  - Up to 99.9% accuracy with editing (like OMR).
• The recognition engine must be tuned, tested, and validated very carefully.
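To see what these percentages mean at census scale, the sketch below converts an accuracy rate into an expected error count and into the chance that a multi-character field is entirely correct. It assumes the quoted figures are per-character rates and that errors are independent of each other; the slide states neither, so treat both as assumptions. The character count and field length are illustrative.

```python
def expected_errors(characters: int, accuracy: float) -> float:
    """Expected number of wrong characters at a given per-character accuracy."""
    return characters * (1.0 - accuracy)


def field_accuracy(char_accuracy: float, field_length: int) -> float:
    """Probability that every character in a field is right (independent errors)."""
    return char_accuracy ** field_length


if __name__ == "__main__":
    chars = 1_000_000  # hypothetical number of captured characters
    for acc in (0.995, 0.999):
        print(f"{acc:.1%}: about {expected_errors(chars, acc):.0f} errors, "
              f"10-character field fully correct {field_accuracy(acc, 10):.1%} of the time")
```

The gap between 99.5% and 99.9% looks small per character but is a fivefold difference in the number of errors left in the data.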

OCR/ICR Advantages
• Recognition engines used with imaging can capture highly specialized data sets.
• OCR/ICR recognizes machine-printed or hand-printed characters.
• Scanning and recognition allow efficient management and planning for the rest of the processing workload.
• Quick retrieval for editing and reprocessing.

OCR/ICR Disadvantages
• Technology is costly.
• May require significant manual intervention.
• Additional workload for data collectors: ICR has severe limitations when it comes to human handwriting.
• Characters must be hand-printed or machine-printed as separate characters in boxes.
• Ineffective when dealing with cursive characters.

OMR-OCR/ICR Compared

OCR/ICR Challenges/Issues
• Shares corresponding issues with OMR
• Algorithm development (preparation of memory dictionary)
• Processing time considerations due to the recognition engine
• Development costs

Definition/Concept of IR
State-of-the-art recognition technology
• Gives scanning and imaging systems the ability to turn images of handwritten and cursive characters into machine-readable characters.
• Images of the handwritten and cursive characters are extracted from a bitmap of the scanned image.
• The ability to capture cursive makes this method unique.

Definition/Concept of IR
• Eight elements make up the trajectories of all cursive letters (figure 1).
Photo: Parascript LLC

Definition/Concept of IR
• Intelligent Recognition dynamically uses context.
• Context is used during the recognition process, improving the accuracy of results.
• Context helps to identify letters where the symbol segmentation of an image is ambiguous.
Photo: Parascript LLC
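A simplified way to illustrate the use of context is a lexicon lookup: instead of accepting the best guess for each letter in isolation, score whole candidate words against the per-letter guesses and keep the best match. The sketch below is only an illustration of that idea and is not Parascript's method. It assumes the letters have already been segmented, which sidesteps the ambiguous-segmentation case the slide describes, and the scores, lexicon, and `floor` value are invented for the example.

```python
from typing import Dict, List, Optional

CharScores = Dict[str, float]  # candidate letter -> recognizer score for one position


def greedy_read(position_scores: List[CharScores]) -> str:
    """Context-free reading: take the top-scoring letter at each position."""
    return "".join(max(scores, key=scores.get) for scores in position_scores)


def best_word(position_scores: List[CharScores], lexicon: List[str],
              floor: float = 0.01) -> Optional[str]:
    """Pick the lexicon word whose letters best agree with the recognizer scores."""
    best, best_score = None, 0.0
    for word in lexicon:
        if len(word) != len(position_scores):
            continue
        score = 1.0
        for letter, scores in zip(word, position_scores):
            # Letters the recognizer did not propose get a small floor score.
            score *= scores.get(letter, floor)
        if score > best_score:
            best, best_score = word, score
    return best


if __name__ == "__main__":
    scores = [{"M": 0.6, "N": 0.4}, {"I": 0.9}, {"N": 0.45, "M": 0.55},
              {"S": 0.7, "5": 0.3}, {"K": 0.6, "X": 0.4}]
    print("without context:", greedy_read(scores))
    print("with lexicon:   ", best_word(scores, ["MINSK", "PINSK", "BREST"]))
```

In the example the third letter is misread when taken alone, and the lexicon recovers the intended word because no other entry fits the remaining letters as well.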

Technology Evolution
FORM TYPES
• Legacy forms: no special form design; no constraining boxes or combs; condensed strings; dirty and noisy forms; bad-quality paper
• Specially designed for automatic recognition: constraining boxes or combs; drop-out ink for preprinted text and boxes
TEXT STYLES
• Cursive
• Bad-quality machine print
• Unconstrained handprint
• Constrained handprint
• Machine print
TECHNOLOGY EVOLUTION: OCR → Intelligent Recognition
Illustration: Conference on Technology Options for 2011 Census

Major Commercial Suppliers
• Top Image Systems (TIS) (http://www.topimagesystems.com)
• ReadSoft (http://www.readsoft.com)
• Teleform (http://www.intelliscan.com/TeleForm1.htm)
• Scanner Suppliers
  - Fujitsu, Canon, Bell & Howell, Kodak

THANK YOU!