Project Tukaram
Sagar Tamhane
Centre for Indian Language Technology Solutions, IIT Bombay
12 June 2002
The Goal
• To make Saint Tukaram’s Abhangas available over the web for browsing and searching
• To locate the Abhangas that the user needs
• To present the pages to the user in order of importance
The Source
• The Abhangas were typed from the book “श्री तुकारामबावांच्या अभंगांची गाथा” (Shri Tukarambavanchya Abhanganchi Gatha), published on 6th November 1973 by the Govt. of Maharashtra
• Previous editions: 1950 and 1955
• Number of Abhangas: 4644
Creation of Web Content
• Software used for typing: MS Word with the Akruti_Priya_Expanded font and the Akruti keyboard driver
• Problems faced:
  – Non-displayable characters, e.g. the character pictured on the original slide was typed as “mna”
• Automated page splitting
Converters Used
• Akruti_Priya_Expanded to ISCII converter: required for indexing the text
• ISCII to Monolingual ISFOC converter: required for displaying the text in the DV-TTYogesh font
• XDVNG to ISCII converter: for converting query strings to ISCII
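Each converter listed on this slide maps one encoding onto another. A minimal sketch of the longest-match, table-driven approach that a Lex scanner naturally provides is shown below. It assumes a tiny hypothetical table (`build_converter` and its entries are illustrative, not the project's actual Akruti_Priya_Expanded rules) and uses Unicode Devanagari as a stand-in for ISCII output.

```python
def build_converter(table):
    """Return a function that converts text by longest-match table lookup.

    HYPOTHETICAL: `table` maps source-font character sequences to output
    strings; the real Akruti/ISCII tables are not given in the slides.
    """
    max_len = max(len(key) for key in table)

    def convert(text):
        out = []
        i = 0
        while i < len(text):
            # Try the longest candidate first, then shorter ones (as Lex does).
            for length in range(min(max_len, len(text) - i), 0, -1):
                chunk = text[i:i + length]
                if chunk in table:
                    out.append(table[chunk])
                    i += length
                    break
            else:
                # No rule matched: copy the character through unchanged.
                out.append(text[i])
                i += 1
        return "".join(out)

    return convert


# Illustrative table only; Unicode Devanagari stands in for ISCII codes.
convert = build_converter({
    "ga": "\u0917",             # ग
    "gaa": "\u0917\u093e",      # गा
    "qaa": "\u0925\u093e",      # था
})
print(convert("gaaqaa"))        # गाथा
```

Because the longest key wins, `gaa` is converted as one unit rather than as `ga` followed by a stray `a`; characters with no rule are copied through unchanged.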
Technologies used for the Tukaram Search Engine
• Input technology:
  – Jtrans: XDVNG font
• Keyboard mapping:
  – Phonetic English
• Result display at client:
  – ISFOC
• Encoding for indexing (storage):
  – ISCII
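A phonetic English keyboard mapping turns Latin keystrokes into Devanagari. The sketch below assumes an ITRANS-like scheme: the key table, `tokenize`, `to_devanagari`, and the Unicode output are all hypothetical, since the slides do not document the Jtrans layout or its XDVNG output codes.

```python
# HYPOTHETICAL ITRANS-like key table; output is Unicode Devanagari.
CONSONANTS = {
    "k": "\u0915", "kh": "\u0916", "g": "\u0917", "c": "\u091a",
    "t": "\u0924", "th": "\u0925", "d": "\u0926", "n": "\u0928",
    "p": "\u092a", "b": "\u092c", "bh": "\u092d", "m": "\u092e",
    "y": "\u092f", "r": "\u0930", "l": "\u0932", "v": "\u0935",
    "s": "\u0938", "h": "\u0939",
}
# Vowel key -> (independent letter, dependent sign); "a" is inherent.
VOWELS = {
    "a": ("\u0905", ""), "aa": ("\u0906", "\u093e"),
    "i": ("\u0907", "\u093f"), "ii": ("\u0908", "\u0940"),
    "u": ("\u0909", "\u0941"), "uu": ("\u090a", "\u0942"),
    "e": ("\u090f", "\u0947"), "o": ("\u0913", "\u094b"),
}
ANUSVARA = {"M": "\u0902"}
VIRAMA = "\u094d"  # suppresses the inherent vowel of a consonant
KEYS = set(CONSONANTS) | set(VOWELS) | set(ANUSVARA)


def tokenize(text):
    """Split keystrokes into keys, preferring 2-letter keys over 1-letter."""
    tokens, i = [], 0
    while i < len(text):
        for length in (2, 1):
            piece = text[i:i + length]
            if len(piece) == length and piece in KEYS:
                tokens.append(piece)
                i += length
                break
        else:
            tokens.append(text[i])  # unknown key: pass through
            i += 1
    return tokens


def to_devanagari(text):
    tokens = tokenize(text)
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in CONSONANTS:
            out.append(CONSONANTS[tok])
            nxt = tokens[i + 1] if i + 1 < len(tokens) else None
            if nxt in VOWELS:
                out.append(VOWELS[nxt][1])  # vowel sign ("" for inherent a)
                i += 1                      # the vowel key is consumed
            else:
                out.append(VIRAMA)          # cluster or bare final consonant
        elif tok in VOWELS:
            out.append(VOWELS[tok][0])      # vowel not attached to a consonant
        elif tok in ANUSVARA:
            out.append(ANUSVARA[tok])
        else:
            out.append(tok)
        i += 1
    return "".join(out)
```

Under this scheme `to_devanagari("tukaaraama")` returns "तुकाराम" and `to_devanagari("abhaMga")` returns "अभंग"; a consonant with no following vowel key gets a virama, as in ITRANS.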
[Figure: Architecture]
[Figure: Input Technology]
Components of the Search Engine
• Index
  – Case-sensitive ISCII
  – Database structure
• Searcher
  – In-memory search
  – Algorithm: hybrid of hashing and binary search
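The slides name the search algorithm only as a hybrid of hashing and binary search. One way such a hybrid can work is sketched below, assuming words are hashed into buckets by their first character and each bucket is kept sorted for binary search. `HybridIndex`, the bucketing rule, and the sample postings are illustrative assumptions, and Latin transliterations stand in for ISCII strings.

```python
from bisect import bisect_left
from collections import defaultdict


class HybridIndex:
    """In-memory word index: hash on the first character, then binary search.

    ASSUMPTION: bucketing by first character is an illustrative choice.
    Words are compared exactly, so lookups are case sensitive.
    """

    def __init__(self, postings):
        # postings: dict mapping word -> list of abhang numbers
        buckets = defaultdict(list)
        for word in postings:
            buckets[word[:1]].append(word)
        # Sort each bucket once so lookups can binary-search it.
        self._buckets = {key: sorted(words) for key, words in buckets.items()}
        self._postings = postings

    def lookup(self, word):
        bucket = self._buckets.get(word[:1])   # hashing step
        if not bucket:
            return []
        pos = bisect_left(bucket, word)        # binary-search step
        if pos < len(bucket) and bucket[pos] == word:
            return self._postings[word]
        return []


# Made-up postings for illustration only.
idx = HybridIndex({"vitthala": [1, 7], "vaari": [3], "abhanga": [2]})
print(idx.lookup("vitthala"))   # [1, 7]
```

Hashing narrows the search to one small bucket in constant time, and the sorted bucket keeps memory compact compared with one hash entry per word plus collision handling.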
[Figure: Database Structure]
[Figure: Snapshot of a search result]
Relevancy Criteria
• Number of query words in the abhang
• Position
• Adjacency
• Total number of words in the abhang
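The slide lists the relevancy criteria but not how they are combined. The sketch below shows one possible scoring function; the weights, the use of the first hit's relative position, the adjacency count, and the square-root length penalty are all assumptions made for illustration, not the project's formula.

```python
def relevance(query_words, abhang_words, w_match=10.0, w_pos=1.0, w_adj=5.0):
    """Score one abhang (a list of words) against a query.

    HYPOTHETICAL weighting: the criteria come from the slide, the
    formula and weights do not.
    """
    query = set(query_words)
    hits = [i for i, w in enumerate(abhang_words) if w in query]
    if not hits:
        return 0.0
    # Criterion 1: number of distinct query words present.
    matched = len({abhang_words[i] for i in hits})
    # Criterion 2: position of the first hit (0.0 = very start).
    first = hits[0] / len(abhang_words)
    # Criterion 3: neighbouring word pairs that are both query words.
    adjacent = sum(1 for a, b in zip(hits, hits[1:]) if b == a + 1)
    raw = w_match * matched + w_pos * (1.0 - first) + w_adj * adjacent
    # Criterion 4: penalise long abhangas by their total word count.
    return raw / len(abhang_words) ** 0.5
```

Sorting abhangas by descending score would then present results in order of importance; here an abhang with the query words side by side near the start outranks one where they are scattered.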
General information
• Number of abhangas: 4,644
• Total number of words: 2,09,702
• Number of distinct words: 34,773
• Languages used for converters: Lex and C
• Language used for search engine: Java 2
• Client-side scripting: JavaScript
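Counts like those above can be gathered in a single pass over the corpus. The sketch assumes whitespace-separated words; `corpus_stats` is a hypothetical helper, and the project's real tokenisation rules are not described in the slides.

```python
def corpus_stats(abhangas):
    """Return (number of abhangas, total words, distinct words).

    ASSUMPTION: a word is any whitespace-separated token.
    """
    total = 0
    distinct = set()
    for text in abhangas:
        words = text.split()
        total += len(words)
        distinct.update(words)
    return len(abhangas), total, len(distinct)
```

On the slide's figures, 2,09,702 words spread over 4,644 abhangas works out to about 45 words per abhang.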
Thank You