Handwriting Removal from Document Images
Computer Vision Lab
Hyeonseob Nam
Overview
• Introduction
  – Motivation & Goal
  – Related Work
• Proposed Method
  – Image binarization
  – Character segmentation
  – Classification of machine-printed/handwritten text
  – Touching-character separation
  – Text line segmentation
• Results
Motivation
• Scanned document images often contain unwanted handwritten marks
• Handwriting interferes with tasks such as document recognition, search and retrieval, and historical document analysis
Goal
• Remove handwriting from a document image and restore the original machine-printed image
Framework (pipeline diagram; stages shown: Image Binarization, Connected Components Extraction, Dimensionality Reduction, Local Threshold Learning, Testing, Touching-character Separation, Classification, Text Line Segmentation)
Image Binarization
Image Binarization
• Maximize between-class variance
• Minimize within-class variance
(Figure: gray-level histogram)
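The criterion on this slide (maximize between-class variance, which is equivalent to minimizing within-class variance because the total variance is fixed) is the one used by Otsu's global thresholding. The sketch below is a minimal pure-Python illustration of that criterion, not the lab's actual implementation; the function names `otsu_threshold` and `binarize`, and the convention that dark ink maps to 1, are assumptions made for this example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    pixels: iterable of integer gray levels in [0, levels).
    Pixels <= t form the lower class, pixels > t the upper class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    total_sum = sum(g * h for g, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # pixel count of the lower class
    sum0 = 0  # sum of gray levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def binarize(image, threshold):
    """Assumed convention: dark pixels (<= threshold) are ink and become 1."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]
```

On a clearly bimodal histogram the returned threshold falls between the two modes, so `binarize` separates dark ink from the light background.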
Block Segmentation • Connected-Component (CC) Analysis
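Connected-component analysis groups touching ink pixels into blobs that can then be treated as candidate characters. The following is a minimal breadth-first sketch, assuming a binary image stored as a list of rows with ink equal to 1 and assuming 8-connectivity (the slides do not state which connectivity was used); `connected_components` is a name chosen for this example.

```python
from collections import deque


def connected_components(binary):
    """Return one dict per 8-connected ink region.

    Each dict holds the pixel list and an inclusive bounding box
    (top, left, bottom, right).
    """
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill a new component starting from this unvisited ink pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            components.append({
                "bbox": (min(ys), min(xs), max(ys), max(xs)),
                "pixels": pixels,
            })
    return components
```

The bounding boxes are what a later stage would crop and normalize before classification.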
Classification • Classify each CC as machine-printed or handwritten
Classification - dimensionality reduction • Project CCs into a hyper-dimensional character space
Classification • Local distance threshold learning (figure label: training)
Classification - training
• Local distance threshold learning
Find the optimal radius for each template such that it
– maximizes the CDF for machine-printed samples
– minimizes the CDF for handwritten samples
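One way to read the training objective above is as a search over candidate radii for a single template. The sketch below assumes the two goals are combined as a simple difference of empirical CDFs, which is an assumption of this example rather than something the slides state; `learn_radius` and its arguments are hypothetical names, and both distance lists are assumed to be non-empty.

```python
def learn_radius(printed_dists, handwritten_dists):
    """Pick a distance threshold (radius) for one template.

    printed_dists / handwritten_dists: distances from the template to the
    training samples of each class.
    Assumed score: CDF_printed(r) - CDF_handwritten(r), evaluated at every
    observed distance r.
    """
    candidates = sorted(set(printed_dists) | set(handwritten_dists))
    best_r, best_score = 0.0, float("-inf")
    for r in candidates:
        cdf_printed = sum(d <= r for d in printed_dists) / len(printed_dists)
        cdf_hand = sum(d <= r for d in handwritten_dists) / len(handwritten_dists)
        score = cdf_printed - cdf_hand
        if score > best_score:
            best_r, best_score = r, score
    return best_r
```

With well-separated classes the chosen radius is the largest machine-printed distance that still excludes every handwritten sample.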
Classification - testing • Determine whether a given CC falls within the local distance threshold of a template
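At test time the learned radii act as per-template acceptance regions. The sketch below assumes Euclidean distance in the reduced feature space and assumes a CC is accepted as machine-printed when it lies within the radius of at least one template; both choices, and the name `classify_cc`, are illustrative assumptions.

```python
import math


def classify_cc(feature, templates):
    """Classify one connected component from its feature vector.

    templates: list of (template_vector, radius) pairs, where each radius
    is the local threshold learned for that template.
    """
    inside_any = any(
        math.dist(feature, vec) <= radius for vec, radius in templates
    )
    return "machine-printed" if inside_any else "handwritten"
```

For example, with templates `[((0.0, 0.0), 1.0), ((5.0, 5.0), 0.5)]`, the point `(0.5, 0.5)` is inside the first radius, while `(3.0, 3.0)` is outside both.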
Results
Touching-character Problem • Segmentation failure in low-resolution images
Touching-character Separation
• Vertical projection: v(x)
• Peak-to-valley function: Pv(x) = [v(x-1) - 2·v(x) + v(x+1)] / v(x)
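The peak-to-valley function is large where a column has little ink compared with its neighbours, which is where two touching characters are likely joined. The sketch below computes v(x) and Pv(x) for a binary CC (ink equal to 1) and, as an assumption of this example, proposes the column with the largest Pv(x) as the cut position. Columns with v(x) = 0 and the two border columns are skipped because the formula is undefined there; the function names are hypothetical.

```python
def vertical_projection(binary):
    """v(x): number of ink pixels in each column."""
    return [sum(col) for col in zip(*binary)]


def peak_to_valley(v):
    """Pv(x) = (v(x-1) - 2*v(x) + v(x+1)) / v(x); None where undefined."""
    pv = [None] * len(v)
    for x in range(1, len(v) - 1):
        if v[x] > 0:
            pv[x] = (v[x - 1] - 2 * v[x] + v[x + 1]) / v[x]
    return pv


def best_cut_column(binary):
    """Assumed rule: cut at the column with the highest Pv(x)."""
    pv = peak_to_valley(vertical_projection(binary))
    scored = [(score, x) for x, score in enumerate(pv) if score is not None]
    if not scored:
        return None  # image too narrow to evaluate any interior column
    return max(scored)[1]
```

For a CC whose projection is `[3, 3, 1, 3, 3]`, the thin bridge at column 2 gets Pv = 4 while its neighbours get negative values, so column 2 is proposed as the cut.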
Touching-character Separation • Recursive Touching-character Separation & Classification
Results (figure panels: Original, Initial classification, Final result)
Handling Overlapped Cases
Handling Overlapped Cases • Text Line Segmentation – Projection-based method
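A projection-based method typically sums ink along each row and splits the page at rows with little or no ink. The sketch below is a minimal version of that idea, assuming a binary image with ink equal to 1 and a hypothetical `min_ink` parameter for the row threshold; it does not attempt to reproduce how the slides handle handwriting that overlaps printed lines.

```python
def segment_text_lines(binary, min_ink=1):
    """Return inclusive (top, bottom) row ranges of text lines.

    A row belongs to a line when its horizontal projection (ink count)
    is at least min_ink; consecutive such rows form one line.
    """
    profile = [sum(row) for row in binary]
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y  # a new line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y - 1))  # the current line ends
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines
```

Raising `min_ink` makes the split less sensitive to thin strokes that cross the gap between two lines.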
Text Line Segmentation
Text Line Segmentation • Before
Text Line Segmentation • After
Results
Limitations • Cannot recover pure machine-printed text • Cannot handle an isolated character
Thank you