Character Recognition of MODI Script using Distance Classifier Algorithms

Presented by:
Solley Joseph Thomas, Ph.D. Scholar, Christ (Deemed to be University), Bangalore; Associate Professor, Carmel College for Women, Goa
Dr. Jossy P. George (Research Supervisor), Christ (Deemed to be University)
Suhas Gaikwad, Christ (Deemed to be University)

OUTLINE
● Introduction
● Objective
● Methodology
● Experimental Study
● Result and Discussion
● Conclusion and Future Plan

INTRODUCTION
● Optical Character Recognition (OCR) is a branch of pattern recognition and computer vision.
● Handwritten OCR (HOCR) is the process of recognizing handwritten text from document images.
● Advances in machine learning techniques have led to a sharp increase in OCR-related research.
● Character recognition has been successfully implemented for various foreign languages.
● Character recognition of Indian scripts is comparatively difficult because of the complex nature of the scripts.

INTRODUCTION (continued)
● MODI is an ancient script of India.
● It is a shorthand form of Devanagari.
● It was used for writing ‘Marathi’ in cursive and was intended for continuous writing.
● It was developed in ‘Devagiri’ in the 12th century and was in use till 1950.
● Difficulty in printing the script led to its decline.
● Urdu, Kannada, Gujarati, Rajasthani, Hindi, and Tamil were also written using the MODI script.

INTRODUCTION: Characteristics of the Script
● Written using a ‘Boru’ or ‘Lekhan’.
● A horizontal line (Shirorekha) is drawn before writing begins.
● No punctuation marks.
● No particular termination symbol for words.
● Calligraphic uniformity.

INTRODUCTION: Character Set of MODI Script
● 10 vowels and 36 consonants were used in the MODI script.

INTRODUCTION: Description of Data Set
[Figures: calligraphic challenges in the data set]

INTRODUCTION: Significance of the Script
● Plenty of ‘MODI’ documents are preserved in libraries, temples, archives and museums:
  - Tanjavur's Saraswati Mahal
  - Bharat Itihas Sanshodhan Mandal, Pune (BISM)
  - Rajwade Sanshodhan Mandal, Dhule (Maharashtra)
  - State Archive Department, Pune
  - Museums in London, Paris and Spain
● CDAC Mumbai has initiated work towards the revival of the script by developing a MODI script learning app.
● The Script Encoding Initiative (SEI) of the University of California completed a project to encode the script.
● Tamil University has taken steps towards digitizing and publishing 'MODI' documents.

OBJECTIVES OF THE STUDY
● To implement character recognition for the MODI script using two classification algorithms.

METHODOLOGY: Theory of the Character Recognition Process

METHODOLOGY
● In the first step, vectorization converts the raster image to a two-dimensional vector representation of the image.
● Vectorization is an important part of graphics recognition. It converts the scanned image to a vector form that is appropriate for further processing and analysis.
● A noise reduction technique is then applied to the vectorized data. Noise removal algorithms reduce or remove the visibility of noise by smoothing the entire image.
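
The two preprocessing steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not name the smoothing filter, so a 3x3 mean filter is assumed, and the function names `mean_filter` and `to_vector` are hypothetical. The image is taken to be already loaded as a two-dimensional list of grayscale values.

```python
def mean_filter(image):
    """Smooth a 2-D grayscale image (list of rows) with a 3x3 mean filter.

    Assumed filter choice; border pixels average over the neighbours that exist.
    """
    rows, cols = len(image), len(image[0])
    smoothed = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect the pixel and its in-bounds neighbours.
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out_row.append(sum(window) / len(window))
        smoothed.append(out_row)
    return smoothed


def to_vector(image):
    """Flatten a 2-D image row by row into a 1-D feature vector."""
    return [pixel for row in image for pixel in row]


# A single bright (noisy) pixel is spread out and attenuated by smoothing.
noisy = [[0, 0, 0],
         [0, 9, 0],
         [0, 0, 0]]
smoothed = mean_filter(noisy)
features = to_vector(smoothed)
```

Flattening to a one-dimensional vector is also an assumption; it is simply a convenient input form for a distance-based classifier.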

METHODOLOGY (continued)
● The next step is classification. The classification stage maps an unknown sample to a predefined class.
● Various pattern recognition methods can be used for classification at this stage. Back-propagation neural network (BPNN), K-Nearest Neighbour, Support Vector Machine and Euclidean distance classifier are commonly used methods.
● The proposed experiment uses two algorithms for the classification and recognition of MODI characters: the Euclidean distance classifier and the Manhattan distance classifier.
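
For reference, the two distance measures are written out here in their standard form. The notation is an assumption, not taken from the slides: x is the n-dimensional feature vector of the unknown sample and m_c is a reference vector for class c (for example a class mean). The slides do not state whether class means or individual training samples serve as references.

```latex
d_E(\mathbf{x}, \mathbf{m}_c) = \sqrt{\sum_{i=1}^{n} (x_i - m_{c,i})^2}, \qquad
d_M(\mathbf{x}, \mathbf{m}_c) = \sum_{i=1}^{n} \lvert x_i - m_{c,i} \rvert, \qquad
\hat{c} = \arg\min_{c} \, d(\mathbf{x}, \mathbf{m}_c)
```

In both cases the sample is assigned to the class whose reference vector is closest. The Manhattan distance needs no squaring or square root, only absolute differences.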

Experimental Study: The Data Set
● The data set used for character recognition consists of 100 samples each of 46 characters of the MODI script.
● The total data set thus comprises 4600 MODI characters, which are divided into training and test sets in a 70:30 ratio.
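
The split sizes follow directly from these figures, as the sketch below shows. It assumes the 70:30 split is applied per character class, which the slides do not state; the function name `split_per_class` and the placeholder sample ids are hypothetical.

```python
import random


def split_per_class(samples_by_class, train_fraction=0.7, seed=0):
    """Shuffle each class and split it into train and test portions."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(sample, label) for sample in shuffled[:cut]]
        test += [(sample, label) for sample in shuffled[cut:]]
    return train, test


# Placeholder data: 46 classes with 100 sample ids each (4600 in total).
data = {label: list(range(100)) for label in range(46)}
train, test = split_per_class(data)
print(len(train), len(test))
```

With 70 training and 30 test samples per character, this gives 46 × 70 = 3220 training samples and 46 × 30 = 1380 test samples.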

Experimental Study: Procedure
Vectorization
● In the first step, vectorization is performed on the data.
Noise reduction
● A noise reduction technique is applied to the vectorized data.
Classification
● In the classification step, two distance classifier algorithms are implemented.
● The Euclidean distance classifier and the Manhattan distance classifier are used for classification.
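
A distance classifier of this kind can be sketched as below. This is an illustration under stated assumptions, not the authors' code: it uses the minimum-distance-to-class-mean variant, whereas the slides do not say whether class means or individual training samples are compared. All function names and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def class_means(train):
    """Compute one mean vector per label from (vector, label) pairs."""
    sums, counts = {}, {}
    for vec, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(vec, means, distance):
    """Return the label whose mean vector is closest to vec."""
    return min(means, key=lambda label: distance(vec, means[label]))


# Toy two-class example with 2-D feature vectors.
train = [([0, 0], "a"), ([0, 2], "a"), ([10, 10], "b"), ([12, 10], "b")]
means = class_means(train)
print(classify([1, 1], means, euclidean), classify([9, 9], means, manhattan))
```

Passing the distance function as a parameter keeps the two classifiers identical apart from the metric, so their accuracy and running time can be compared on the same data.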

Result and Discussion
The results obtained from the two algorithms are in a similar range.
Euclidean distance classifier
● The average accuracy obtained using the Euclidean distance classifier is 99.13%.
Manhattan distance classifier
● With the Manhattan distance classifier, the average accuracy improves to 99.27%.
● It is also observed that computation time is lower with the Manhattan distance classifier.

Performance of the Classification Algorithms
● 4600 MODI characters: training set of 3220 samples, test set of 1380 samples

Classifier           TP     TN    RR (%)
Euclidean distance   1368   12    99.13
Manhattan distance   1370   10    99.27
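
The reported rates can be checked from the counts above. The sketch assumes that TP is the number of correctly recognised test samples and RR is the recognition rate in percent; the slides do not define these labels, and the function name `recognition_rate` is hypothetical.

```python
import math


def recognition_rate(correct, total):
    """Percentage of test samples recognised correctly."""
    return 100.0 * correct / total


TEST_SAMPLES = 1380
for name, correct in [("Euclidean", 1368), ("Manhattan", 1370)]:
    rate = recognition_rate(correct, TEST_SAMPLES)
    # Show the rate both rounded and truncated to two decimals.
    print(name, round(rate, 2), math.floor(rate * 100) / 100)
```

For the Euclidean classifier both forms give 99.13. For the Manhattan classifier the rate is about 99.2754, which rounds to 99.28 and truncates to 99.27, so the reported figure appears to be truncated rather than rounded.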

CONCLUSION AND FUTURE PLAN
● MODI script character recognition is still in its infancy.
● It needs a different approach from other Indian scripts because of factors such as the shape similarity of characters and inconsistency in writing style.
● As a future plan, genetic algorithms will also be explored for MODI character recognition, as they are expected to give precise classification.
● Segmentation is one of the most difficult areas and needs attention in the case of MODI manuscripts. Segmentation can also enable character recognition of MODI script from old manuscripts.


THANK YOU

EXTRA SLIDES

Sample manuscript: property paper of Mr. Pandurang Sutar (Satara)

Challenges in Segmentation

Calligraphic challenges

Text book style

For vectorisation, the following MATLAB functions were used:
k = imread('20.jpg');   % read the scanned character image
b = rgb2gray(k);        % assumed intent of the original gray('k'); b = ans;