Towards a Language-Independent Universal Digital Library. The Second International Conference on Universal Digital Libraries (ICUDL 2006), 17-19 November 2006, Alexandria, Egypt. Sameh Alansary, Magdy Nagi, Noha Adly. sameh.alansary@bibalex.org, magdy.nagi@bibalex.org, noha.adly@bibalex.org. Bibliotheca Alexandrina
Introduction • IT has made the full text of libraries' assets available digitally (independent of time, place and copy), e.g.: - Million Book Project. - Nasser Digital Library. • Digitization alone does not lead to "universality" in its fullest sense. • A new dimension of universality should be added to the Universal Digital Library (UDL): independence of language.
Language dependency blocks information dissemination • Language dependency keeps language barriers in place. • If everyone could always read in their own mother tongue, this would help in: - Dissemination of knowledge. - Preservation of nationality and identity. - Preventing cultural hegemony. • 80% of books and e-materials are written in English and 20% are written in other languages.
Attempts to break language barriers • Translation systems have been introduced (NLP). Approaches: 1 - Direct translation approach. 2 - Transfer approach. 3 - Interlingual approach. Examples of systems: - Google translation: http://www.google.ch/language_tools - Fujitsu systems: http://www.fujitsu.com/global/services/translation
Drawbacks of MT systems 1 - The quality of results is often inadequate. 2 - They work for only a limited number of language combinations. 3 - They place a heavy load on the network: to translate from and to only 10 languages, 10 grammars, 10 lexicons, 90 translation dictionaries and 90 sets of translation rules are needed (one per ordered language pair, 10 × 9 = 90), plus semantic processing in each language.
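The count of 90 follows from needing one dictionary and one rule set per ordered pair of languages. A minimal sketch of that arithmetic, compared with an interlingual design in which each language needs only one converter into the shared format and one out of it (the function names are illustrative, and the 2n figure is an assumption based on the one enconverter plus one deconverter per language server described later in this deck):

```python
def pairwise_resources(n_languages: int) -> int:
    """Ordered language pairs; each pair needs its own dictionary and rule set."""
    return n_languages * (n_languages - 1)


def interlingual_resources(n_languages: int) -> int:
    """Assumed interlingual design: one enconverter and one deconverter per language."""
    return 2 * n_languages


for n in (2, 10, 15):
    print(n, pairwise_resources(n), interlingual_resources(n))
```

The pairwise count grows quadratically while the interlingual count grows linearly, so the gap widens with every language added.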
Towards a universal system for knowledge representation
Some questions to bear in mind: • How can we represent natural language materials in a language-independent format? (a format is required) • Which system is suitable for representing knowledge in the selected format? (a system is required) • How is this system going to work?
Requirements for a universal representation of knowledge: 1 - The content of the original material (meaning) must not be lost. 2 - This universal format should be understandable by various platforms over the network. 3 - This universal format should be decodable to any natural language.
UNL System
What is UNL? (1) • The Universal Networking Language (UNL) is an artificial language for computers to express information and knowledge that can be expressed in natural language. • Started in 1996, as an initiative of the UNU/IAS in Japan • R&D in UNL - Development on 15 languages: Arabic, Chinese, English, French, German, Hindi, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Thai, Swahili. - Transferred to the UNDL Foundation in 2001.
What is UNL? (2) • It expresses the information or knowledge of natural language (NL) in the form of a semantic network with hyper-nodes. Example: The boy who works here went to school. UNL expression:
{UNL}
agt(go(icl>move).@entry.@past, :01)
plt(go(icl>move).@entry.@past, school(icl>institution))
agt:01(work(icl>do), boy(icl>person).@entry)
plc:01(work(icl>do), here)
{/UNL}
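Each line of a UNL expression is a binary relation of the form label, optional scope, then two arguments. A minimal sketch of reading one such line, assuming that shape; the `Relation` type and `parse_relation` function are hypothetical names, not part of any UNL tool. The arguments are split on the first comma outside parentheses, because universal words may themselves contain commas.

```python
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    label: str            # e.g. "agt", "plt", "plc"
    scope: Optional[str]  # hyper-node the relation belongs to, e.g. "01"
    source: str
    target: str


def parse_relation(line: str) -> Relation:
    """Split 'label[:scope](source, target)' into its four parts."""
    head, _, rest = line.strip().partition("(")
    if not rest.endswith(")"):
        raise ValueError(f"not a relation: {line!r}")
    body = rest[:-1]
    label, _, scope = head.partition(":")
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # First top-level comma separates the two arguments.
            return Relation(label, scope or None,
                            body[:i].strip(), body[i + 1:].strip())
    raise ValueError(f"expected two arguments: {line!r}")


print(parse_relation("agt:01(work(icl>do), boy(icl>person).@entry)"))
```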
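UNL hypergraph for "The boy who works here went to school" (figure): the node go(icl>move), marked @entry and @past, has a plt edge to school(icl>institution) and an agt edge to the hyper-node :01. Inside :01, work(icl>do) has an agt edge to boy(icl>person), marked @entry, and a plc edge to here.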
The UNL System Components Formalism Knowledge representation
The UNL system components (diagram): users reach UNL documents over the Internet through the UNL Proxy, the UNL Editor and the UNL Viewer. Each language server (UNL <-> Arabic, UNL <-> Chinese, UNL <-> English, UNL <-> Hindi, UNL <-> Japanese, UNL <-> Spanish) provides an EnConverter (EnCo) and a DeConverter (DeCo).
A) Language servers (diagram): a web server holds the UNL documents. The EnConverter converts natural language (NL) to UNL and the DeConverter converts UNL to NL. Resources shown: analysis rules, generation rules, the UNL-language dictionary, the co-occurrence dictionary and the knowledge base.
B) UNL Tools: 1 - UNL viewer. 2 - UNL editor. 3 - UNL verifier. C) UNL Proxy Server: • Searches the web for UNL documents, sends them to the language server and displays them in the user's chosen language.
Mechanism of conversion between NL and UNL (diagram). Components shown: natural language texts, the Annotation Editor, annotated natural language texts, the Universal Parser, the EnConverter, the DeConverter, the UNL Verifier, grammatical rules, the word dictionary, the co-occurrence dictionary, the UW dictionary, the UNL KB, UNL documents, and a web server delivering HTML+XML UNL documents.
UNL as a formal language: How does it represent knowledge? 1 - Universal words (UW): to represent concepts. Example: boy(icl>person), hear(icl>perceive(agt>person, obj>thing)) 2 - Relations: 38 semantic relations are distinguished. Example: agt, aoj, bas, con, coo, dur, etc. 3 - Attributes: to express the subjectivity of the speaker. Example: @past, @emphasis, @def, @not, etc.
4 - Knowledge base (UNLKB). • Defines the universal words. • Provides linguistic knowledge about concepts.
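In the UNL expressions in this deck, each argument combines a headword, a restriction in parentheses, an optional node ID after a colon, and attributes introduced by `.@`. A minimal sketch of separating those parts, assuming that layout; the `Node` class and `parse_node` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    headword: str
    restriction: str = ""  # text inside the outermost parentheses
    node_id: str = ""      # identifier after ':' if present
    attributes: List[str] = field(default_factory=list)


def parse_node(token: str) -> Node:
    """Split 'headword(restriction):ID.@attr1.@attr2' into its parts."""
    core, *attrs = token.strip().split(".@")
    if "(" in core:
        open_at = core.index("(")
        close_at = core.rindex(")")  # outermost closing parenthesis
        headword = core[:open_at]
        restriction = core[open_at + 1:close_at]
        tail = core[close_at + 1:]
    else:
        headword, sep, rest = core.partition(":")
        restriction = ""
        tail = sep + rest
    return Node(headword, restriction, tail.lstrip(":"), attrs)


print(parse_node("son(icl>person):0I.@def.@entry"))
print(parse_node("hear(icl>perceive(agt>person, obj>thing))"))
```

Keeping the attributes apart from the universal word mirrors the split the slide draws between concepts (UWs) and speaker subjectivity (attributes).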
Ibrahim Shihata UNL Arabic Center (ISUAC) • It was established at Bibliotheca Alexandrina. • It is responsible for designing, implementing, and maintaining the various components of the Arabic language server. • The Arabic language server will be capable of: - Enconverting Arabic texts to the universal format. - Deconverting the universal materials produced by other language centers to Arabic.
The Achievements of the ISUAC A) Arabic language resources and tools. B) Developing tools. C) Arabic language-based universal materials.
A) Arabic language resources and tools: 1 - The Arabic Dictionary: a repository of information for all UNL Arabic grammars. Each dictionary entry links: - Head words (the vocabulary of Arabic). - Universal words (the vocabulary of UNL). - Linguistic features (linguistic information about the head words).
2 - Arabic EnConversion Rules: • They are responsible for enconverting Arabic to UNL. • They are able to: 1 - Perform morphological analysis to extract the concepts that the Arabic words refer to. 2 - Assign the exact semantic relation between concepts as expressed in the context of the Arabic sentence.
UNL Network:
3 - Arabic DeConversion Rules: • They are responsible for generating Arabic sentences out of UNL networks. • They are able to: 1 - Select the Arabic words that represent the universal concepts. 2 - Arrange the concepts of the UNL network in a syntactically well-formed sentence.
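The two steps named above, word selection and ordering, can be illustrated with a deliberately tiny toy. This is not how the DeConverter or the Arabic rules work; the lexicon, the fixed ordering rule and the `deconvert` function are all assumptions made for illustration, the target language is English rather than Arabic, and inflection and function words are baked into the toy lexicon.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (relation, source UW, target UW)

# Toy lexicon: universal word -> target-language wording (hypothetical).
LEXICON: Dict[str, str] = {
    "go(icl>move)": "went",
    "boy(icl>person)": "the boy",
    "school(icl>institution)": "to school",
}

# Toy ordering rule: agent first, then the entry verb, then the destination.
ORDER: List[str] = ["agt", "ENTRY", "plt"]


def deconvert(entry: str, triples: List[Triple]) -> str:
    """Step 1: pick a word for each concept. Step 2: arrange them by ORDER."""
    slots = {"ENTRY": LEXICON[entry]}
    for relation, source, target in triples:
        if source == entry and relation in ORDER:
            slots[relation] = LEXICON[target]
    return " ".join(slots[name] for name in ORDER if name in slots)


graph = [
    ("agt", "go(icl>move)", "boy(icl>person)"),
    ("plt", "go(icl>move)", "school(icl>institution)"),
]
print(deconvert("go(icl>move)", graph))
```

A real rule set would also handle agreement, scopes and attributes such as @past, which this toy ignores.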
• Simulation of how the DeConverter works (figure). Input UNL graph: the nodes description(icl>action), Egypt, outcome(icl>result).@entry, collaboration(icl>action), more(aoj>thing), 150, scholar(icl>person), scientist(icl>scholar).@entry, and, prominent(aoj>thing), accompany(agt>thing, obj>thing), Bonaparte(iof>person), 1798 and Egypt, linked by obj, aoj, agt, bas, mod, gol and tim relations. Output: the generated Arabic sentence, shown before and after its word forms are joined: وصف مصر محصلة تعاون أكثر من 150 باحث وعالم مرموق الذين صاحبوا بونابرت في 1798 إلى مصر ("The Description of Egypt is the outcome of the collaboration of more than 150 prominent researchers and scholars who accompanied Bonaparte to Egypt in 1798").
4 - A Corpus for Modern Standard Arabic: • A representative sample (100 million) that reflects the empirical usage of Modern Standard Arabic. • It plays a principal role in enhancing and updating both the EnConversion and DeConversion rules.
B) Developing tools: 1 - Integrated Development Environment (IDE)
2 - Corpus analysis software (GATE)
C) Arabic language-based universal materials: - Library of Alexandria: the Fourth Pyramid. - The Encyclopaedia of Famous Persons. - Abu Simbel: The Temple of the Sun. - Nasser Digital Library.
An example of an Arabic Sentence in UNL (universal) format
Arabic sentence: وكان جمال عبد الناصر الابن الأكبر لعبد الناصر حسين الذي ولد في عام 1888 في أسرة من الفلاحين في قرية بني مر في صعيد مصر، ولكنه حصل على قدر من التعليم سمح له بأن يلتحق بوظيفة في مصلحة البريد بالإسكندرية، وكان مرتبه يكفي بصعوبة لسداد ضرورات الحياة.
English translation: Gamal Abdel Nasser was the eldest son of Abdel Nasser Hussein, who was born in 1888 into a family of farmers in the village of Bani Murr in Upper Egypt, but who obtained a degree of education that allowed him to take a job in the postal service in Alexandria; his salary was barely enough to cover the necessities of life.
Language-independent format:
{unl}
aoj(son(icl>person):0I.@def.@entry, Gamal Abdel Nasser(iof>person):00)
mod(son(icl>person):0I.@def.@entry, Abd El-Naser Hosen(iof>person):23.@topic)
aoj(old(aoj>thing):1J, son(icl>person):0I.@def)
man(old(aoj>thing):1J, most(icl>how):15)
obj(born(obj>thing):31.@past, Abd El-Naser Hossain(iof>person):23.@topic)
and(get(agt>thing, obj>thing):6S.@past.@contrast, born(obj>thing):31.@past)
scn(born(obj>thing):31.@past, family(icl>group):5Q)
plc(born(obj>thing):31.@past, village(icl>region):4D)
tim(born(obj>thing):31.@past, year(icl>period):3M)
mod(year(icl>period):3M, 1888:41)
plc(village(icl>region):4D, upper Egypt(iof>place):58)
mod(village(icl>region):4D, Bani Morr(iof>village):4S)
mod(family(icl>group):5Q, farmer(icl>person):65.@pl.@def)
obj(get(agt>thing, obj>thing):6S.@past.@contrast, degree(icl>abstract thing):7N)
agt(allow(agt>thing, gol>thing, obj>thing):8M.@past, degree(icl>abstract thing):7N)
mod(degree(icl>abstract thing):7N, education(icl>activity):82.@def)
gol(allow(agt>thing, gol>thing, obj>thing):8M.@past, join(agt>person, obj>thing):9I.@present)
obj(allow(agt>thing, gol>thing, obj>thing):8M.@past, his(pos>he):97)
and(suffice(aoj>thing, obj>thing):CM.@present, join(agt>person, obj>thing):9I.@present)
obj(join(agt>person, obj>thing):9I.@present, job(icl>work):A7)
plc(job(icl>work):A7, postal service(icl>service):AN)
plc(postal service(icl>service):AN, Alexandria(iof>city):BB)
aoj(suffice(aoj>thing, obj>thing):CM.@present, salary(icl>money):BV)
mod(salary(icl>money):BV, his(pos>he):CB)
obj(suffice(aoj>thing, obj>thing):CM.@present, satisfy(agt>thing, obj>thing):DQ)
man(suffice(aoj>thing, obj>thing):CM.@present, hardly:DA)
obj(satisfy(agt>thing, obj>thing):DQ, demand(icl>wants):E6.@pl.@def)
mod(demand(icl>wants):E6.@pl.@def, life(icl>activity):EV.@def)
{/unl}
Is it going to work this way? • Are there language servers ready to work? • Are the universal materials deconvertible to other languages? What about Arabic? • Is the Arabic language server able to enconvert Arabic texts to the universal format? • Is it also able to deconvert the universal materials back to Arabic?
A proof of the concept
UNL-based Library Information System (UNL-LIS) • It is a system for searching digital library catalogs. • It is built on the UNL KI, therefore: - The query is in natural language (two languages). - The answer is also in natural language (7 languages).
UNL LIS Core Architecture (diagram). Flow: user question in NL -> enconversion process (language server: Enco rules + dictionary) -> question in UNL -> Query Engine over the KB -> answer in UNL -> deconversion process (language server: Deco rules + dictionary) -> answer in NL. Also shown: the LIS MARC 21 records with a MARC 21 importing process, and the encyclopedia concepts and definitions.
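The deck does not say how the Query Engine matches a question in UNL against the KB. One simple possibility, sketched here purely as an assumption, is to treat both as relation triples and let the question leave one position open. The `KB` contents are taken from the demo output about Naguib Mahfouz shown later in this deck, with node IDs and attributes dropped; the `match` function is hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (relation, source UW, target UW)
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

KB: List[Triple] = [
    ("plc", "born(aoj>thing)", "Cairo(iof>city)"),
    ("aoj", "born(aoj>thing)", "Naguib Mahfouz(iof>person)"),
    ("tim", "born(aoj>thing)", "1911"),
]


def match(pattern: Pattern, kb: List[Triple]) -> List[Triple]:
    """Return the triples that agree with every non-None field of the pattern."""
    return [
        triple for triple in kb
        if all(want is None or want == have
               for want, have in zip(pattern, triple))
    ]


# "Where was he born?" asks for the open target of a plc relation on 'born'.
print(match(("plc", "born(aoj>thing)", None), KB))
```

The matched triples would then be handed to the deconversion process to produce the answer in the user's language.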
Demo: Screen Shots
Screenshot output:
{unl}
agt(begin(agt>thing, obj>action):12.@past.@entry, Naguib Mahfouz(iof>person):0N.@topic)
obj(begin(agt>thing, obj>action):12.@past.@entry, writing(icl>action):18)
tim(begin(agt>thing, obj>action):12.@past.@entry, year old:1S.@past)
aoj(year old:1S.@past, Naguib Mahfouz(iof>person):0N.@topic)
qua(year old:1S.@past, 17)
plc(born(aoj>thing):00, Cairo(iof>city):08)
aoj(born(aoj>thing):00, Naguib Mahfouz(iof>person):0N.@topic)
tim(born(aoj>thing):00, 1911:0H)
{/unl}
[/S]
;; Time 1.4 Sec
;; Done!
{unl}
and(write(agt>thing, obj>thing):1K.@past.@entry, publish(agt>thing, obj>thing):0K.@past)
obj(write(agt>thing, obj>thing):1K.@past.@entry, novel(icl>tale):1B.@pl.@topic)
tim(write(agt>thing, obj>thing):1K.@past.@entry, before(icl>how(obj>thing)):1S)
aoj(more(icl>additional):1A, novel(icl>tale):1B.@pl.@topic)
qua(novel(icl>tale):1B.@pl.@topic, 10:16)
[/S]
Screenshot callouts: 1. Enter query 2. Press to search Encyclopedia 3. Specify result language (Arab 4. View results here (Naguib Mahfouz). Click for more information. 5. A link to the UNL document
Conclusion • Language independence is a very important dimension that should be considered when storing and retrieving texts for a UDL. • The UNL system is a promising formalism for representing knowledge in a universal format. • The ISUAC is less than 2 years old, yet it is one of the most active language centres in designing and implementing UNL materials and tools. • The UNL LIS has demonstrated the feasibility of the concept of language independence.
Thank you. Questions are welcome.