Contents
• Web 1.0, Web 2.0, Web 3.0, Web X.0…
• Syntactic vs. Semantic Search
• Engines
• Spectrum of Semantic Search
• Problems?!
A. Frank

“The good, the bad and the …”

Semantics in Web 2.0?!
Open Gardens blog, Ajit Jaokar
http://opengardensblog.futuretext.com/archives/2005/12/mobile_Web_20_w.html

Web 1.0, Web 2.0, Web 3.0, Web X.0…

Where are we headed?!

Googlism

Google
• Presents us with a search box and allows us to enter free-form queries.
• Basically retrieves documents based on keyword searches.
• The search algorithms are based on string/pattern matching, statistical frequencies and PageRank (based on hyperlink popularity analysis).
• There is some lightweight semantics: when you search for known objects or public data, Google knows about these types of entities (OneBox) and provides factual answers.
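
To make "hyperlink popularity analysis" concrete, here is a minimal sketch of the textbook power-iteration form of PageRank. It is not Google's production algorithm. The function name, the three-page graph, the iteration count and the damping value of 0.85 are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict: page -> list of outlinks."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy graph: "a" and "b" both link to "c"; "c" links back to "a".
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_links)
```

In this toy graph "c" ends up ranked highest because two pages link to it, and "b" lowest because nothing links to it. The ranking reflects link structure only, not what the pages mean.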

Actual Answers to Factual Questions

Find Public Data

Compare Public Data

Current State
• The current generation of search engines is severely limited in its understanding of the user's intent and the Web's content.
• Semantic search has attracted a lot of attention in the past year, largely due to the growth of the Semantic Web as a whole.
• The term “Semantic Search” itself is popular enough to be considered overused/overloaded.
• But in general it refers to methods of searching documents beyond the syntactic level of matching keywords.

Syntactic vs. Semantic Search
• Syntactic search – can match the query against
  – index of the textual content of the resources
  – URIs (URLs, URNs) in the system
  – literals in the RDF metadata
  – or a combination of these, possibly using:
    • exact, prefix or substring match, stemming, minimal edit distance
• Semantic search – in addition to syntactic search, can use
  – index of the meaning of sentences in each resource
  – semantic knowledge structures and analysis
  – the graph structure of RDF metadata
  – or a combination of these, possibly using:
    • query expansion, classification/categorization, tagging, graph traversal, microformats, RDF & OWL inferencing and reasoning
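
The syntactic matching modes listed above can be sketched in a few lines. This is only an illustration: the function names and the `mode` strings are hypothetical, "fuzzy" stands in for minimal edit distance (plain Levenshtein here), and stemming is left out because it needs a language-specific stemmer.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def syntactic_match(query, term, mode="exact", max_edits=1):
    """Compare a query string with an indexed term, character by character."""
    query, term = query.lower(), term.lower()
    if mode == "exact":
        return term == query
    if mode == "prefix":
        return term.startswith(query)
    if mode == "substring":
        return query in term
    if mode == "fuzzy":
        return edit_distance(query, term) <= max_edits
    raise ValueError(f"unknown mode: {mode}")
```

For example, `syntactic_match("jag", "Jaguar", "prefix")` succeeds, and `syntactic_match("jaguar", "jaguars", "fuzzy")` succeeds with one edit. None of these modes can tell the animal from the car, which is the gap semantic search tries to close.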

Can Semantic Search answer this :-?)

Semantics & NLP
• Semantics is a subfield of linguistics that is devoted to the study of meaning, as expressed by words, phrases, sentences, and even larger units of text.
• Natural Language Processing (NLP) is a subfield of artificial intelligence and computational linguistics that studies the problems of automated generation and understanding of natural human languages by computers.
• Semantic NLP integrates these to provide meaning and understanding within the context of computer applications.

Semantic Search in General
• Collective methods that look at information available beyond the level of individual words or phrases.
• Semantic search methods are diverse and cover a range of disciplines
  – Information Retrieval, Natural Language Processing, Semantic Analysis, the Semantic Web, Databases, Information Extraction, Information Visualization, etc.
• An understanding of content at the semantic level also makes it possible to aggregate, compare and reason with content as structured data.
• Enables people to type in arbitrarily complex questions, which the engine then interprets and executes.

Types of “Semantic Search” Engines
• Semantic Web Search Engines
  – search on the Semantic Web data (RDF, OWL, etc.).
• Semantically-enhanced Search Engines
  – mostly augment and refine the displayed results.
• NLP-based Search Engines
  – mainly work on the indexing and query side of the search.
• Semantic-NLP-based Search Engines
  – employ semantic knowledge structures and analysis.
• Computational-NLP-based Search Engines
  – employ an inference engine that enables reasoning and computation.

Sample of “Semantic Search” Engines
• Semantic Web Search (not covered here)
  – Swoogle, Sindice, SWSE, Falcon-S, Watson, Shoe, …
• Semantically-enhanced Search (barely covered here)
  – Google Orion, Yahoo! SearchMonkey, (MS Kumo?), …
• NLP-based Search
  – Metaweb Freebase, MS Powerset, …
• Semantic-NLP-based Search
  – hakia, Cognition, (Readware?), …
• Computational-NLP-based Search
  – True Knowledge, (Wolfram Alpha?), …

Sample queries used here
• Query involves extracting concepts
  – Artificial Intelligence
• Query involves ambiguous term
  – Jaguar, Cycle
• Query involves aggregating data
  – What did Albert Einstein do
• Query involves inference
  – When was California’s governor born
• Query involves computation
  – What is the distance from San Francisco to Michigan
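
The computation query shows why keyword lookup is not enough: the answer has to be calculated, and "Michigan" must first be resolved to a point. The sketch below uses the standard haversine great-circle formula. Treating Michigan as its capital, Lansing, is an assumption made only for illustration, as are the rounded coordinates and the mean Earth radius of 6371 km.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate coordinates; Lansing stands in for "Michigan" (an assumption).
SAN_FRANCISCO = (37.7749, -122.4194)
LANSING = (42.7325, -84.5555)

distance_km = haversine_km(*SAN_FRANCISCO, *LANSING)
```

An engine that answers this query has to make the same kind of choice about what "Michigan" refers to before it can compute anything.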

Results of sample queries used here
Engines (in column order): Freebase, Powerset, hakia, Cognition, True Knowledge
Queries (marks as given on the slide):
• extracting concepts: √ √ √
• ambiguous term: √ √ √
• aggregating data: √ √ √ ~ ~
• inference: X ~ ~ √ √
• computation: X X ~ X √

Google Orion
• Technology that can better understand associations and concepts related to search.
• It finds relationships between queries and related concepts and presents those as search refinements:
  – Pages are scanned in “real-time” by Google after a query is entered.
  – Conceptually and contextually related sites/pages are then identified and expressed in the form of the improved refinements.
• Also provides longer “snippets” (text extracts containing the keywords) when users input queries of three words or more.

Orion Search Refinements

Yahoo! SearchMonkey
• The SearchMonkey platform is mainly based on Semantic Web approaches.
• Supports the ability to modify the presentation of search results through plug-ins to the search interface.
• Content providers or third-party developers can create custom data services to extract information from the Web page.
• After defining the extracted properties per entry, an enhanced display can be provided.

SearchMonkey Enhanced Display

Metaweb Freebase
• Open, shared database of the world’s knowledge that collects data from the Web to build a massive, collaboratively-edited database of cross-linked data.
• Free for anyone to query, contribute to, build applications on top of, or integrate into their Web sites.
• Focus is on organizing and managing complex data structures by use of Semantic Web technologies.
• Enables extraction of ordered knowledge out of the information chaos that is the current Web.

Freebase Interface

Freebase Repository
• Covers millions of topics in hundreds of categories.
• Freebase contains structured information on many popular topics, like movies, music, people and locations – all reconciled and freely available.
• A base is a new kind of Web site that anyone can build using Freebase; it is a place to organize and share collections of information about a particular subject.
• Freebase has two search interfaces:
  – Free-form queries; a text search box.
  – MQL (Metaweb Query Language) queries.

Freebase Structure
• Freebase spans domains, but requires that a particular topic exist only once, even if it might normally be found in multiple bases.
• For example, Arnold Schwarzenegger would appear in a movie base as an actor, a political base as a governor and a bodybuilder base as a Mr. Universe.
• In Freebase, there is only one topic for Arnold Schwarzenegger, with all three facets of his public persona brought together.
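
The "one topic, many facets" rule can be pictured as a store keyed by topic, where each base only adds a facet to the existing entry. This is a rough sketch and not Freebase's actual schema or API: the `TopicStore` class, its method names and the base labels are hypothetical, and keying by display name is a simplification (a real system would use a unique identifier).

```python
class TopicStore:
    """Keeps exactly one record per topic; bases contribute facets to it."""

    def __init__(self):
        self._topics = {}

    def add_facet(self, name, base, role):
        # Reuse the existing topic if there is one instead of duplicating it.
        topic = self._topics.setdefault(name, {"name": name, "facets": {}})
        topic["facets"][base] = role
        return topic

    def get(self, name):
        return self._topics[name]

    def __len__(self):
        return len(self._topics)


store = TopicStore()
store.add_facet("Arnold Schwarzenegger", "movies", "actor")
store.add_facet("Arnold Schwarzenegger", "politics", "governor")
store.add_facet("Arnold Schwarzenegger", "bodybuilding", "Mr. Universe")
```

After the three calls the store still holds a single topic, and a query against any one base can reach the other two facets through it.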

Freebase vs. Wikipedia
• The difference lies in the way they store information: Wikipedia arranges information in the form of articles; Freebase lists facts and statistics.
• Freebase's list form is good not only for people who like to glance at facts, but also for people who want to use the data to build other Web sites/software.
• Topics covered by Freebase include subjects that are too obscure for Wikipedia, which strives for notability appropriate to an encyclopedia.

Query involves extracting concepts

Query involves ambiguous term

Query involves aggregating data (1)

Query involves aggregating data (2)

Query involves inference

Query involves computation

Microsoft Powerset
• Provides an NLP-based search engine.
• Powerset reads and understands every sentence on a Web page and allows asking questions in plain English.
• The Powerset search and discovery experience is currently based on Wikipedia and Freebase.
• By and large, Powerset is trying to match the meaning of a query with the meaning of sentences in Wikipedia or facts in Freebase.

Powerset Interface

Powerset Services
• In the search box, you can express yourself in keywords, phrases, or simple questions.
• On the search results page, Powerset gives more accurate results, often answering questions directly, and aggregating information from across multiple articles.
• Powerset's technology follows you into enhanced Wikipedia articles, giving you a better way to quickly digest and navigate content.

Powerset Factz
• Factz are concise representations of information extracted from sentences.
• They are represented in three parts: the subject, relation and object (e.g., Oswald shot JFK).
• Factz do not always represent truth, but rather propositions that are asserted in the text of Wikipedia.
• On the search results page, you will often see Factz for general topic queries; these Factz are collected from pages across Wikipedia.
• On a particular topic page, you can see the Factz extracted from the given page in the outline.
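
The three-part shape of a Factz maps naturally onto a subject/relation/object triple. The sketch below is one way to hold such triples, not Powerset's implementation: `Fact`, `FactStore` and the source label are hypothetical names. Each triple is stored together with the page that asserted it, which reflects the point that a Factz is a proposition found in text rather than a verified truth.

```python
from collections import namedtuple

Fact = namedtuple("Fact", ["subject", "relation", "obj"])


class FactStore:
    """Holds extracted triples along with the page that asserted them."""

    def __init__(self):
        self._facts = []

    def add(self, subject, relation, obj, source):
        # The source is kept because the triple is a claim, not a verified fact.
        self._facts.append((Fact(subject, relation, obj), source))

    def about(self, subject):
        """Return every triple whose subject matches, across all sources."""
        return [fact for fact, _ in self._facts if fact.subject == subject]

    def sources_for(self, fact):
        return [source for stored, source in self._facts if stored == fact]


facts = FactStore()
facts.add("Oswald", "shot", "JFK", source="hypothetical-article-1")
facts.add("Oswald", "shot", "JFK", source="hypothetical-article-2")
```

Collecting `about("Oswald")` across sources corresponds to the Factz shown for a general topic query, while filtering by a single source corresponds to the per-page outline.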

Snippet of a Powerset Result Page

Query involves extracting concepts

Query involves ambiguous term

Query involves aggregating data

Query involves inference

Query involves computation

hakia
• Brings relevant results based on concept match rather than keyword match or popularity ranking.
• hakia has constructed three major components:
  1. QDEX (Query Detection and Extraction) is an advanced system that enables semantic analysis of Web pages and "meaning-based" search.
  2. The SemanticRank algorithm comprises innovative solutions from the disciplines of Ontological Semantics, Fuzzy Logic, Computational Linguistics, and Mathematics.
  3. OntoSem (Ontological Semantics) is based on a formal and comprehensive linguistic theory of meaning in natural language.

hakia Interface

hakia QDEX System

hakia QDEX
• The QDEX algorithm
  – analyzes the entire content of a Web page (including HTML).
  – extracts all possible queries that can be asked of this content, at various lengths and forms.
  – these queries (called sequences) become gateways to the originating documents, paragraphs and sentences during retrieval mode.
  – all this is done off-line, before any actual query is received from a user.
• The critical point in the QDEX system is to be able to decompose sentences into a handful of meaningful sequences without getting lost in the combinatorial explosion space (hakia uses OntoSem technology to meet this challenge).
• Decomposing content in this way provides great flexibility in a search engine platform for utilizing semantically rich data and multiple-thread processing of equivalent sequences.
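
The off-line idea behind QDEX (precompute the sequences a page could answer, then use them as gateways at query time) can be illustrated with a deliberately naive stand-in. This is not hakia's algorithm: it enumerates plain word n-grams with no semantic pruning, which is exactly the combinatorial growth the slide says OntoSem is used to tame. All names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def build_sequence_index(documents, max_len=3):
    """Off-line step: map every word sequence (up to max_len words) to the
    (document id, sentence number) pairs where it occurs."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
        for sent_no, sentence in enumerate(sentences):
            words = re.findall(r"[a-z0-9']+", sentence.lower())
            for n in range(1, max_len + 1):
                for start in range(len(words) - n + 1):
                    sequence = " ".join(words[start:start + n])
                    index[sequence].add((doc_id, sent_no))
    return index


def lookup(index, query):
    """Query-time step: normalize the query and fetch its gateway entries."""
    key = " ".join(re.findall(r"[a-z0-9']+", query.lower()))
    return sorted(index.get(key, set()))


docs = {
    "doc1": "The jaguar is a large cat. It lives in the Americas.",
    "doc2": "Jaguar is a British car maker.",
}
index = build_sequence_index(docs)
```

Here `lookup(index, "large cat")` leads straight to the first sentence of `doc1`. Even with two tiny documents the index already holds dozens of sequences, which hints at why a real system needs to keep only the meaningful ones.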

SemanticRank Algorithm

hakia SemanticRank
• Ranks search results in order of relevancy:
  – A pool of relevant paragraphs comes from the QDEX system for the terms of a given query.
  – The final relevancy is determined based on advanced sentence analysis and concept match between the query and the best sentence of each paragraph.
  – Morphological and syntactic analyses are also performed.
• Among the criteria for ranking, the credibility and age (of the Web page) are also taken into account via proprietary measurements.
• Popularity of keywords by means of link referrals is not used, which eliminates the possibilities of
  – meaningless results ranking high.
  – manipulation of results by fictitious page design.

hakia OntoSem
• A formal and comprehensive linguistic theory of meaning in natural language.
• OntoSem offers an advanced methodology and technology for natural language processing.
• Has a set of well-developed and constantly improving resources:
  – a language-independent ontology of thousands of interrelated concepts.
  – an ontology-based English lexicon of 100,000 word senses, and counting (plus the lexicons for several other languages under construction).
  – an ontological parser which "translates" every sentence of the text into its text meaning representation, approximating the complete understanding of the sentence by the native speaker.

Query involves extracting concepts

Query involves ambiguous term

Query involves aggregating data

Query involves inference

Query involves computation

Cognition Technologies
• Semantic NLP technology employs a unique mix of linguistics and mathematical algorithms
  – has, in effect, taught the computer the meanings (or associated concepts) of, and relations between, nearly all the words and frequently used phrases within the common English language.
• Focuses on the understanding of word and phrase meanings within context
  – provides an application or end-user with actionable content based upon semantic knowledge.
• Is able to simultaneously deliver significantly higher levels of precision and recall than is usually possible.
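
Precision and recall in the last bullet are the standard information-retrieval measures. A minimal sketch of how they are computed for one query follows; the function name and document ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """precision = hits / retrieved, recall = hits / relevant (0.0 if empty)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set: four documents returned, three actually relevant.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
```

Two of the four returned documents are relevant, so precision is 0.5, and two of the three relevant documents were found, so recall is about 0.67. Returning more documents tends to raise recall and lower precision, which is why raising both at once is the harder claim.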

Cognition Interface

Cognition’s Semantic Map
• Cognition's Semantic NLP enabling technology contains one of the world's largest computational dictionaries, also known as a Semantic Map.
• This Semantic Map encodes a wealth of morphological, syntactic and semantic information about the words of the English language and their relationships to each other.
• These resources were created and reviewed by lexicographers and linguists over a span of 24 years.

How Cognition Is Different
• Cognition has changed the NLP paradigm through its unique combination of a complete semantic map and linguistic elements to optimize semantic understanding:
  – Morphology
    • The various forms of a word, e.g. singular, plural, tense
  – Syntax
    • The grammatical structure, e.g. verbs, nouns
  – Semantics
    • Word and sentence meaning, augmented by synonymy and taxonomy
  – Spelling
    • The various ways words are spelled (or misspelled)

Wikipedia.Cognition Interface

Query involves extracting concepts

Query involves ambiguous term

Query involves aggregating data

Query involves inference (1)

Query involves inference (2)

Query involves computation

True Knowledge
• Provides an Answer Engine aimed at dramatically improving the experience of finding known facts on the Web.
• Represents the world's knowledge in a form that is clear and accessible to humans, as well as being comprehensible to computers.
• Combines knowledge through a process of inference, drawing conclusions, and cross-referencing stored information to produce a reasoned answer.

True Knowledge Interface

Ways of looking at True Knowledge
1. A Q/A (Question-Answering) Site
2. The Answer Engine – The Perfect Complement to Search
3. A "Wikipedia for Facts"
4. A Universal Knowledge Base
5. A Platform for Building Knowledge Services

True Knowledge Components
• Knowledge Base (KB) – a huge database of facts on any topic.
• Knowledge Generator – infers facts using the KB, other generated facts or external feeds of knowledge.
• Browser interface – supports NLP & query language.
• Query/Answer System – uses the KB and generated facts to answer queries.
• System Assessment – further processes existing facts in order to maintain semantic consistency of the KB.
• User Assessment – enables users to endorse or contradict particular facts in the KB.
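
The inference sample query ("When was California’s governor born") needs two stored facts to be chained, since no single fact answers it. The sketch below shows that chaining in its simplest form. It is not True Knowledge's engine: the `follow` helper and the relation labels are hypothetical, and the two facts are illustrative entries reflecting the period in which these slides were written.

```python
def follow(facts, start, relations):
    """Chain lookups: start --relations[0]--> x --relations[1]--> ... ."""
    current = start
    for relation in relations:
        current = facts.get((current, relation))
        if current is None:
            # A missing link means the knowledge base cannot answer the query.
            return None
    return current


# Tiny illustrative knowledge base keyed by (entity, relation).
kb = {
    ("California", "governor"): "Arnold Schwarzenegger",
    ("Arnold Schwarzenegger", "birth date"): "1947-07-30",
}

answer = follow(kb, "California", ["governor", "birth date"])
```

The query is answered by first resolving "California’s governor" to an entity and then looking up that entity's birth date. A keyword engine has no equivalent of the intermediate step.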

Query involves extracting concepts

Query involves ambiguous term

Query involves aggregating data (1)

Query involves aggregating data (2)

Query involves aggregating data (3)

Query involves inference

Query involves computation (1)

Query involves computation (2)

Alex Iskold Spectrum

Ken Ewell Debunking the Spectrum
Ken Ewell for Semiotica

Google “goes religious”

Freebase “what’s” it

… but knows what vocation is!

Powerset “does his best” for “me now”

hakia mimics Google

or wants to “end the occupation now”

Cognition mixes it up

True Knowledge seems intelligent

… but isn’t really

Computational-Semantic-NLP-based Search :-?)

Sources
• P. Mika, Semantic Search Arrives at the Web, DevX, July 18, 2008, http://www.devx.com/semantic/Article/38595/0/page/1
• A. Iskold, Semantic Search: The Myth and Reality, ReadWriteWeb, May 29, 2008, http://www.readwriteweb.com/archives/semantic_search_the_myth_and_reality.php
• K. Ewell, The Search for Semantic Search, Semiotica, June 18, 2008, http://commonsensical.wordpress.com/2008/06/18/the-search-for-semantic-search/
• N. Karandikar, Powerset vs. Cognition: A Semantic Search Shoot-out, Gigaom, June 7, 2008, http://gigaom.com/2008/06/07/powerset-vs-cognition-a-semantic-search-shoot-out/