Language Technologies and the Semantic Web: An Essential Relationship
Enrico Motta, Professor of Knowledge Technologies, Knowledge Media Institute, The Open University
Content of the Talk
• Update on the Semantic Web
  – Beyond the hype
    • What it is
    • Why it is interesting
    • What’s its status?
• Semantic Web and AI
• Semantic Web Applications
  – Key features
  – Reasoning on the Semantic Web
  – Key role of Language Technologies
• Conclusions
The Semantic Web in 2 minutes…
<foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
  <foaf:name>Enrico Motta</foaf:name>
  <foaf:firstName>Enrico</foaf:firstName>
  <foaf:surname>Motta</foaf:surname>
  <foaf:phone rdf:resource="tel:+44-(0)1908-653506"/>
  <foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
  <foaf:workplaceHomepage rdf:resource="http://kmi.open.ac.uk/"/>
  <foaf:depiction rdf:resource="http://kmi.open.ac.uk/img/members/enrico.jpg"/>
  <foaf:topic_interest>Knowledge Technologies</foaf:topic_interest>
  <foaf:topic_interest>Semantic Web</foaf:topic_interest>
  <foaf:topic_interest>Ontologies</foaf:topic_interest>
  <foaf:topic_interest>Problem Solving Methods</foaf:topic_interest>
  <foaf:topic_interest>Knowledge Modelling</foaf:topic_interest>
  <foaf:topic_interest>Knowledge Management</foaf:topic_interest>
  <foaf:based_near>
  <geo:Point>
  <geo:lat>52.024868</geo:lat>
  <geo:long>-0.707143</geo:long>
  <contact:nearestAirport>
  <airport:name>London Luton Airport</airport:name>
  <airport:iataCode>LTN</airport:iataCode>
  <airport:location>Luton, United Kingdom</airport:location>
  <geo:lat>51.8666667</geo:lat>
  <geo:long>-0.36666667</geo:long>
  <rdfs:seeAlso rdf:resource="http://www.daml.org/cgi-bin/airport?LTN"/>
  <foaf:currentProject>
  <foaf:name>AquaLog</foaf:name>
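A FOAF description like the one above can be read as a set of (subject, property, value) statements. The following is a minimal sketch using only Python's standard library. It works on a shortened copy of the fragment; the enclosing rdf:RDF element, the namespace declarations and the closing tags are added here as assumptions so that the snippet parses, and the helper name `triples` is hypothetical. A real application would more likely rely on a dedicated RDF library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Shortened, well-formed copy of the slide's fragment (wrapper added for illustration).
doc = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}">
  <foaf:Person rdf:about="http://identifiers.kmi.open.ac.uk/people/enrico-motta/">
    <foaf:name>Enrico Motta</foaf:name>
    <foaf:homepage rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
    <foaf:topic_interest>Semantic Web</foaf:topic_interest>
  </foaf:Person>
</rdf:RDF>"""


def triples(xml_text):
    """Yield (subject, property, value) for each property of each described resource."""
    root = ET.fromstring(xml_text)
    for node in root:
        subject = node.get(f"{{{RDF}}}about")
        for prop in node:
            # A property value is either a linked resource or literal text.
            value = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
            # Drop the namespace part of the tag, keeping the local property name.
            yield subject, prop.tag.split("}")[1], value


statements = list(triples(doc))
for statement in statements:
    print(statement)
```

Each printed tuple is one statement about the person, which is the sense in which the Semantic Web is a "web of data" rather than a web of annotated pages.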
The foaf ontology
The SW as ‘Web of Data’
Current status of the semantic web
• 10-20 million semantic web documents
  – Expressed in RDF, OWL, DAML+OIL
• 7K-10K ontologies
  – These cover a variety of domains: multimedia, computing, management, bio-medical sciences, geography, entertainment, upper-level concepts, etc.
The above figures refer to resources that are publicly accessible on the web.
The Semantic Web today
• To a significant extent the Semantic Web is already in place, characterized by widespread production of formalized knowledge models (ontologies and metadata) by a variety of groups and individuals
  – “The Next Knowledge Medium - An information network with semiautomated services for the generation, distribution, and consumption of knowledge” (Stefik, 1986)
  – “Knowledge modelling to become a new form of literacy?” (Stutt and Motta, 1997)
• Still primarily a research enterprise; however, interest is rapidly increasing in both governmental and business organizations
  – “Early adopters” phase
• The result is slowly emerging as an unprecedented knowledge resource, which can enable a new generation of intelligent applications on the web
Semantic Web Applications
What can you do with the Semantic Web?
“Corporate Semantic Webs”
• A ‘corporate ontology’ is used to provide a homogeneous view over heterogeneous data sources
• Often tackle Enterprise Information Integration scenarios
• Hailed by Gartner as one of the key emerging strategic technology trends
  – E.g., see personal information management in Garlik
Exploiting large scale semantics Next Generation SW Applications Semantic Web
NGSW Applications in the context of AI research
Knowledge-Based Systems
[Diagram labels: Large Body of Knowledge; Intelligent Behaviour]
“Today there has been a shift in paradigm. The fundamental problem of understanding intelligence is not the identification of a few powerful techniques, but rather the question of how to represent large amounts of knowledge in a fashion that permits their effective use” (Goldstein and Papert, 1977)
The Knowledge Acquisition Bottleneck
[Diagram labels: Knowledge; KA Bottleneck; Large Body of Knowledge; Intelligent Behaviour]
SW as Enabler of Intelligent Behaviour
Both a platform for knowledge publishing and a large-scale source of knowledge
[Diagram label: Intelligent Behaviour]
KBS vs SW Systems
                   Classic KBS     SW Systems
Provenance         Centralized     Distributed
Size               Small/Medium    Extra Huge
Repr. Schema       Homogeneous     Heterogeneous
Quality            High            Very Variable
Degree of trust    High            Very Variable
Key Paradigm Shift
Intelligent behaviour in:
• Classic KBS: a function of sophisticated, logical, task-centric problem solving
• SW Systems: a side-effect of being able to integrate different types of reasoning to handle size and heterogeneous quality and representation
Next Generation SW Applications: Examples
Case Study 1: Automatic Alignment of Thesauri in the Agricultural/Fishery Domain
Method
• SCARLET: matching by harvesting the Semantic Web
• Automatically select and combine multiple online ontologies to derive a relation
[Diagram: Concept_A (e.g., Supermarket) and Concept_B (e.g., Building) are given to Scarlet, which accesses the Semantic Web and deduces a semantic relation between them]
Two strategies
Deriving relations from (A) one ontology and (B) across ontologies.
[Figure, panel A: Scarlet relates Supermarket to Building; concepts shown: Supermarket, Shop, PublicBuilding, Building]
[Figure, panel B: Scarlet relates Cholesterol to OrganicChemical; concepts shown: Cholesterol, Steroid, Lipid, OrganicChemical]
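The two strategies can be illustrated with a toy sketch. This is not the Scarlet implementation: the three small "ontologies" below are hypothetical, the assignment of subclass links to ontologies is an assumption made purely for illustration, and only subclass relations are considered. Strategy A looks for a single ontology that already entails the relation; strategy B chains links taken from several ontologies.

```python
# Toy "online ontologies": each maps a concept to its direct superclasses.
# Which ontology holds which link is an assumption made for illustration.
ONTOLOGIES = {
    "onto_buildings": {
        "Supermarket": ["Shop"],
        "Shop": ["PublicBuilding"],
        "PublicBuilding": ["Building"],
    },
    "onto_bio": {"Cholesterol": ["Steroid"]},
    "onto_chem": {"Steroid": ["Lipid"], "Lipid": ["OrganicChemical"]},
}


def ancestors(edges, concept):
    """Return all superclasses reachable from concept through the given links."""
    seen, stack = set(), [concept]
    while stack:
        for parent in edges.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def subsumed_in_one(a, b):
    """Strategy A: names of the single ontologies that entail 'a is a kind of b'."""
    return [name for name, edges in ONTOLOGIES.items() if b in ancestors(edges, a)]


def subsumed_across(a, b):
    """Strategy B: chain subclass links taken from several ontologies."""
    merged = {}
    for edges in ONTOLOGIES.values():
        for child, parents in edges.items():
            merged.setdefault(child, []).extend(parents)
    return b in ancestors(merged, a)


print(subsumed_in_one("Supermarket", "Building"))
print(subsumed_in_one("Cholesterol", "OrganicChemical"))
print(subsumed_across("Cholesterol", "OrganicChemical"))
```

Merging by concept name, as done here, silently assumes that the same label denotes the same concept in every ontology. That assumption often fails in practice, which is one reason the error analysis of derived mappings matters.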
Experiment
Matching:
• AGROVOC
  – UN’s Food and Agriculture Organisation (FAO) thesaurus
  – 28,174 descriptor terms
  – 10,028 non-descriptor terms
• NALT
  – US National Agricultural Library Thesaurus
  – 41,577 descriptor terms
  – 24,525 non-descriptor terms
226 Used Ontologies
http://139.91.183.30:9090/RDF/VRP/Examples/tap.rdf
http://reliant.teknowledge.com/DAML/SUMO.daml
http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
http://gate.ac.uk/projects/htechsight/Technologies.daml
http://reliant.teknowledge.com/DAML/Economy.daml
Evaluation 1 - Precision
• Manual assessment of 1000 mappings (15%)
• Evaluators:
  – Researchers in the area of the Semantic Web
  – 6 people split in two groups
• Results:
  – Comparable to best results for background-knowledge-based matchers
Evaluation 2 – Error Analysis
Other Case Studies…
Giving meaning to tags
Example
Cluster_1: {college commerce corporate course education high instructing learning lms school student}
[Figure: concepts and relations found for the cluster, with source ontology numbers in brackets: activities [4], learning [4], teaching [4], education, training [1, 4], qualification, school [2], corporate [1], institution, postSecondarySchool [2], student [3], studiesAt, takesCourse, university [2, 3], offersCourse, course [3], college [2]]
Source ontologies:
1 http://gate.ac.uk/projects/htechsight/Employment.daml
2 http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
3 http://www.mondeca.com/owl/moses/ita.owl
4 http://www.cs.utexas.edu/users/mfkb/RKF/tree/CLib-core-office.owl
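One simple way to picture how a tag cluster is given meaning is to look each tag up among the concept labels of available ontologies. The sketch below is an illustration under stated assumptions, not the method used in the case study: the `CONCEPTS` index is a hypothetical, hand-made excerpt loosely based on the example above, and matching is by exact lowercase label only.

```python
# Hypothetical index of concept labels per ontology (illustrative excerpt only).
CONCEPTS = {
    "Employment.daml": ["training", "corporate"],
    "Mid-level-ontology.daml": ["school", "university", "college", "postSecondarySchool"],
    "ita.owl": ["student", "university", "course"],
    "CLib-core-office.owl": ["activities", "learning", "teaching", "training"],
}

# The tag cluster from the example.
CLUSTER = ["college", "commerce", "corporate", "course", "education", "high",
           "instructing", "learning", "lms", "school", "student"]


def ground_tags(tags, concepts):
    """Map each tag to the ontologies that define a concept with the same label."""
    index = {}
    for onto, labels in concepts.items():
        for label in labels:
            index.setdefault(label.lower(), []).append(onto)
    return {tag: index.get(tag.lower(), []) for tag in tags}


grounded = ground_tags(CLUSTER, CONCEPTS)
unmatched = [tag for tag, ontos in grounded.items() if not ontos]

for tag, ontos in grounded.items():
    print(tag, "->", ontos)
print("unmatched:", unmatched)
```

Exact matching leaves tags such as "instructing" unmatched even though a related concept ("teaching") is present in the index. Bridging that gap would need lexical processing such as stemming or synonym lookup.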
Conclusions
Typical misconceptions…
• “The SW is a long-term vision…”
  – Ehm… actually… it already exists…
• “The SW will never work because nobody is going to annotate their web pages”
  – The SW is not about annotating web pages. The SW is a web of data, most of which is generated from DBs, from web mining software, or from applications that produce SW data as a side effect of supporting users’ tasks
• “The idea of a universal ontology has failed before and will fail again. Hence the SW is doomed”
  – The SW is not about a single universal ontology. There are already around 10K ontologies, and the number is growing…
  – SW applications may use 1, 2, 3, or even hundreds of ontologies
SW and Language Technologies
• All the applications mentioned here combine language, web, statistical and semantic technologies
• Heterogeneity and sloppy modelling imply that language and statistical technologies are almost always needed when building NGSW apps
• In contrast with traditional KBS, intelligent behaviour is more a side-effect of integrating multiple techniques to handle scale and heterogeneity than a function of powerful deductive reasoning