A Semantic Web-Based Approach for Personalizing News Flavius Frasincar frasincar@ese.eur.nl Erasmus University Rotterdam * Joint work with Kim Schouten, Philip Ruijgrok, Jethro Borsje, Leonard Levering, and Frederik Hogenboom
Contents • Motivation • Hermes Framework: 1. News Classification 2. Knowledge Base Updating 3. News Querying 4. Results Presentation • Hermes News Portal: – An example • Evaluation • Conclusions • Future Work
Motivation • Large quantity of news on the Web: – Difficult to find the items of interest • News messages have a strong impact on stock prices • Limited annotation of RSS feeds: – Broad categories (business, cars, entertainment, etc.) • Google Finance shows direct news pertaining to a given portfolio: – Indirect news (e.g., about competitors of Google such as Microsoft) is not presented – Time-related queries about news are not possible
Motivation • Need for an intelligent system to personalize news • The world is changing: – It is important to keep an up-to-date representation of the world in the system • News has a dual function: – Find the information of interest – Update our previous knowledge of the state of the world • Feedback loop: – The extracted information helps refine the domain of interest in the next iteration
Hermes Framework • Input: – News items from RSS feeds – Domain ontology linked to a semantic lexicon (e.g., WordNet) – User query • Output: – News items as answers to the user query • Four phases: 1. News Classification • Relate news items to ontology concepts 2. Knowledge Base Updating • Update the knowledge base with news information 3. News Querying • Allow the user to express their concepts of interest and the temporal constraints 4. Results Presentation • Present the news items that match the user’s query
Hermes Architecture
1. News Classification • A concept is defined in the ontology (class or individual) • Multiple lexical representations for the same concept: – Ontology synonyms (e.g., New York → “New York”, “Big Apple”) – Semantic lexicon synonyms (e.g., buy → “acquire”) • Concepts without subclasses or instances: – Semantic lexicon hyponyms (e.g., company → dot-com) • Look up ontology concepts in news items • A longer match supersedes a shorter match (“European Central Bank” supersedes “European”)
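The longest-match rule can be illustrated with a small Python sketch. This is not the Hermes implementation (which relies on an NLP pipeline); it uses plain regular-expression lookup, and the `LEXICON` contents, the concept identifiers, and the function name `find_hits` are assumptions made for illustration only.

```python
import re

# Hypothetical lexicon: concept identifier -> lexical representations.
LEXICON = {
    "kb:EuropeanCentralBank": ["European Central Bank", "ECB"],
    "kb:European": ["European"],
    "kb:NewYork": ["New York", "Big Apple"],
}


def find_hits(text, lexicon=LEXICON):
    """Return (start, end, concept) hits, letting longer matches win."""
    candidates = []
    for concept, labels in lexicon.items():
        for label in labels:
            pattern = r"\b" + re.escape(label) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                candidates.append((m.start(), m.end(), concept))
    # Longest span first; ties are broken by the earliest start position.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))
    accepted = []
    for start, end, concept in candidates:
        # Drop a candidate that overlaps an already accepted (longer) hit.
        if all(end <= s or start >= e for s, e, _ in accepted):
            accepted.append((start, end, concept))
    return sorted(accepted)
```

Sorting candidates by length before accepting them is one simple way to make “European Central Bank” supersede “European”; other tie-breaking policies are equally possible.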
1. News Classification 1.1 Tokenization (words, punctuation signs) 1.2 Sentence splitting (sentences) 1.3 Part-of-speech tagging (e.g., noun, verb, adjective) 1.4 Morphological analysis (e.g., lemma “read” for “reading” as a verb) 1.5 Word sense disambiguation (e.g., Structural Semantic Interconnection (SSI) based on word context) 1.6 Adding “hits” between news items and the domain ontology
2. Knowledge Base Updating • Knowledge base updates are based on recognized events • Events have associated rules with: – Alternative patterns for event detection – A sequence of actions for the knowledge base update • Before the knowledge base is updated, the discovered events need to be validated by the user • E.g., the event kb:newCEO represents the appointment of a new CEO
2. Knowledge Base Updating 2.1 Event Rules Patterns Construction – Make use of lexico-semantic patterns – Lexico-semantic patterns are based on triples (Subject, Predicate, Object) where • [type] stands for the knowledge base instances of the enclosed type E.g., [kb:Company] represents all company instances (all their associated lexical representations): – “IBM”, “International Business Machines”, etc. – “EBay”, “E-bay”, “Ebay”, etc. – Etc. • Terms without brackets represent individual knowledge base instances • $name represents a variable
2. Knowledge Base Updating 2.1 Event Rules Patterns Construction – Two types of patterns: • SP patterns: E.g., $c:[kb:Company] kb:GoesBankrupt matches “WorldCom goes bankrupt”, “WorldCom filed for Chapter 11”, etc. • SPO patterns: E.g., $p:[kb:Person] kb:BecomesCEO $c:[kb:Company] matches “Steve Ballmer appointed CEO of Microsoft”, “Steve Ballmer becomes new Chief Executive Officer of Microsoft”, etc.
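A minimal Python sketch of how an SPO pattern could be matched against a sentence. It assumes plain substring matching over hand-listed lexical representations, which is much simpler than the lexico-semantic matching described in the slides; the `INSTANCES` and `PREDICATES` tables and the function names are hypothetical.

```python
# Hypothetical knowledge base: class -> {instance: lexical representations}.
INSTANCES = {
    "kb:Person": {"kb:SteveBallmer": ["Steve Ballmer"]},
    "kb:Company": {
        "kb:Microsoft": ["Microsoft"],
        "kb:IBM": ["IBM", "International Business Machines"],
    },
}
# Hypothetical lexical representations of a predicate concept.
PREDICATES = {
    "kb:BecomesCEO": [
        "appointed CEO of",
        "becomes new Chief Executive Officer of",
    ],
}


def mentions(text, cls):
    """Yield (start, end, instance) for the first mention of each label."""
    for instance, labels in INSTANCES[cls].items():
        for label in labels:
            pos = text.find(label)
            if pos >= 0:
                yield pos, pos + len(label), instance


def match_spo(text, subject_cls, predicate, object_cls):
    """Return (subject, object) if subject, predicate, object occur in order."""
    for _, s_end, subj in mentions(text, subject_cls):
        for phrase in PREDICATES[predicate]:
            p_pos = text.find(phrase, s_end)
            if p_pos < 0:
                continue
            for o_pos, _, obj in mentions(text, object_cls):
                if o_pos >= p_pos + len(phrase):
                    return subj, obj
    return None
```

The returned pair corresponds to the bindings of the pattern variables, here $p and $c for the kb:BecomesCEO pattern.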
2. Knowledge Base Updating 2.2 Event Rules Patterns Execution – Extract information from text – Assign ontology concepts to the variables 2.3 Event Validation – The knowledge extraction process is not flawless – The user validates the extracted knowledge 2.4 Event Rules Actions Construction – Two types of actions: • Insert triples E.g., INSERT $c kb:hasCEO $p • Delete triples E.g., DELETE $c kb:hasCEO $p
2. Knowledge Base Updating 2.4 Event Rules Actions Construction (Cont’d) – A sequence of actions is defined per event – The order of actions is important E.g., for the event kb:newCEO two actions are defined: DELETE $c kb:hasCEO $pp INSERT $c kb:hasCEO $p – Unbound variables (e.g., $pp in the example) stand for anything and are not allowed in INSERT actions 2.5 Event Rules Actions Execution – Execute the actions associated with events in the order in which the events are found in the news – Per event, execute the associated actions in the given order
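The action semantics for kb:newCEO can be sketched in Python over an in-memory set of triples. This is an illustrative assumption, not the portal's code (the slides name SPARQL/Update for the real system); `apply_actions`, `_matches`, and the example instances are hypothetical names.

```python
def _matches(pattern, triple):
    """A term matches if it is equal or the pattern term is an unbound variable."""
    return all(pt.startswith("$") or pt == tt for pt, tt in zip(pattern, triple))


def apply_actions(triples, actions, bindings):
    """Apply DELETE/INSERT actions, in the given order, to a set of triples."""
    kb = set(triples)
    for op, pattern in actions:
        # Substitute bound variables; unbound ones remain "$name" wildcards.
        s, p, o = (bindings.get(term, term) for term in pattern)
        if op == "DELETE":
            kb = {t for t in kb if not _matches((s, p, o), t)}
        elif op == "INSERT":
            if any(term.startswith("$") for term in (s, p, o)):
                raise ValueError("unbound variable in INSERT action")
            kb.add((s, p, o))
        else:
            raise ValueError(f"unknown action: {op}")
    return kb


# The two actions of the kb:newCEO event, in the order given on the slide.
NEW_CEO_ACTIONS = [
    ("DELETE", ("$c", "kb:hasCEO", "$pp")),
    ("INSERT", ("$c", "kb:hasCEO", "$p")),
]
```

Running the DELETE first removes whoever was CEO before ($pp is unbound and matches anything); reversing the order would delete the triple that was just inserted, which is why the order of actions matters.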
3. News Querying 3.1 Query Formulation – Present the domain knowledge as a directed labeled multigraph with the additional constraint that arcs between the same two nodes are not allowed to share the same label (called the conceptual graph) – The user selects the concepts of interest in the conceptual graph (e.g., Google) – The user can add to the selection concepts related to the concepts of interest through specified relations (e.g., kb:hasCompetitors: kb:Microsoft, kb:eBay, and kb:Yahoo) – The selected concepts are presented in a separate graph (called the search graph)
3. News Querying 3.1 Query Formulation (Cont’d) – News items are time-stamped – The user can specify that only news in a certain time interval should be retrieved – Time constraints: • Last hour • Last day • Last year • [2007-03-01T00:00.000+00:01, 2007-05-31T00:00.000+00:01] – [Future: order constraints (e.g., order by time)]
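A short Python sketch of how the named time constraints and an explicit interval might be resolved to concrete bounds. The window lengths (in particular “last year” as 365 days), the closed-interval semantics, and the names `WINDOWS`, `resolve_interval`, and `in_interval` are assumptions; the slides do not specify them.

```python
from datetime import datetime, timedelta, timezone

# Assumed window lengths; "last year" is approximated as 365 days.
WINDOWS = {
    "last hour": timedelta(hours=1),
    "last day": timedelta(days=1),
    "last year": timedelta(days=365),
}


def resolve_interval(constraint, now=None):
    """Map a named constraint or an explicit (start, end) pair to an interval."""
    if isinstance(constraint, tuple):
        start, end = constraint
        if start > end:
            raise ValueError("interval start must not be after its end")
        return start, end
    now = now or datetime.now(timezone.utc)
    return now - WINDOWS[constraint], now


def in_interval(timestamp, interval):
    """True if a news item's timestamp lies within the closed interval."""
    start, end = interval
    return start <= timestamp <= end
```

Passing `now` explicitly keeps the function deterministic, which is convenient when the same query has to be re-evaluated or tested.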
3. News Querying 3.2 Query Execution – Generate the query in a semantic query language: • Map concepts of interest to query restrictions (currently: disjunctive queries) • Map temporal constraints to query restrictions – Execute the semantic query • The order of the relevant news items is not important at this stage
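The mapping from selected concepts and a time window to query restrictions might look like the following Python sketch. The property names hermes:title, hermes:time, and hermes:relatedTo are taken from the example query in these slides; the exact graph pattern, the equality-based disjunction, the xsd:dateTime typing, and the function name `build_query` are assumptions.

```python
PREFIXES = (
    "PREFIX hermes: <http://hermes-news.org/news.owl#>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>"
)


def build_query(concepts, start=None, end=None):
    """Build a disjunctive SPARQL query with an optional time window."""
    if not concepts:
        raise ValueError("at least one concept is required")
    # One equality test per concept, joined with logical OR.
    disjunction = " || ".join(f"?concept = hermes:{c}" for c in concepts)
    lines = [
        PREFIXES,
        "SELECT DISTINCT ?title",
        "WHERE {",
        "  ?news hermes:title ?title .",
        "  ?news hermes:time ?date .",
        "  ?news hermes:relatedTo ?concept .",
        f"  FILTER ( {disjunction} )",
    ]
    # Temporal constraints become comparisons on the news timestamp.
    bounds = []
    if start is not None:
        bounds.append(f'?date > "{start}"^^xsd:dateTime')
    if end is not None:
        bounds.append(f'?date < "{end}"^^xsd:dateTime')
    if bounds:
        lines.append("  FILTER ( " + " && ".join(bounds) + " )")
    lines.append("}")
    return "\n".join(lines)
```

For example, `build_query(["Google", "Microsoft"], "2009-02-01T00:00:00Z", "2009-07-31T00:00:00Z")` yields a query with one concept filter and one date filter.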
4. Results Presentation 4.1 News Sorting • Return news items that match a query • Sort the news items by their degree of relevance to the query • The relevance degree is determined empirically: – based on a weighted sum of the number of hits in the title (higher weight) and the body (lower weight) of the news item • News items that have the same relevance degree are sorted in descending timestamp order
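The sorting rule can be sketched in a few lines of Python. The weights below are placeholders: the slides state only that title hits weigh more than body hits, so `TITLE_WEIGHT`, `BODY_WEIGHT`, and the dictionary keys are assumptions.

```python
from datetime import datetime

# Hypothetical weights; only "title > body" is given in the slides.
TITLE_WEIGHT = 2.0
BODY_WEIGHT = 1.0


def relevance(item):
    """Weighted sum of the number of hits in the title and in the body."""
    return TITLE_WEIGHT * item["title_hits"] + BODY_WEIGHT * item["body_hits"]


def sort_news(items):
    """Most relevant first; equal relevance is ordered newest first."""
    return sorted(
        items,
        key=lambda it: (relevance(it), it["timestamp"]),
        reverse=True,
    )
```

Using a (relevance, timestamp) tuple as the sort key with `reverse=True` gives descending relevance and, within equal relevance, descending timestamp in a single pass.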
4. Results Presentation 4.2 News Presentation • Present the concepts involved in the query • For each news item show a summary: – Title – Source – Date – First few lines of the news item ([Future: snippet]) • Emphasize the hits (found concepts from the ontology) in the retrieved news items • Show the icons of the most important query concepts found in a news item: – based on a weighted sum of the number of hits of a concept in the title (higher weight) and the body (lower weight) of a news item
Hermes News Portal • Hermes News Portal (HNP) is an implementation of the Hermes framework • Implementation language: Java • Ontology representation language: OWL (e.g., cardinality restrictions, inverses, etc.) • Semantic lexicon: WordNet • Graph visualization: Prefuse (OWL2Prefuse) • Query language: SPARQL/Update • Query language: SPARQL extended with custom time functions (e.g., currentDate(), currentTime(), etc.) • Natural language processing: GATE
An Example • Query: Which news items are about Google or one of its competitors from the past six months?
1. News Classification – Import News
1. News Classification – Conceptual Graph
1. News Classification – News Item “SAN FRANCISCO (Reuters) - Web search leader Google Inc. on Monday said it agreed to acquire top video entertainment site YouTube Inc. for $1.65 billion in stock, putting a lofty new value on consumer-generated media sites.” [October 9th, 2006 at 20:15:33 CET] • Three concepts are found in the news item: – kb:Google – kb:Buy – kb:YouTube • kb:Relation class instances store hits between the news item and the found concepts (Semantic Web best practice recommendation for modeling N-ary relationships)
2. Knowledge Base Updating – Rule Editor (Define Event Rule Patterns)
2. Knowledge Base Updating – Rule Editor (Select Concepts)
2. Knowledge Base Updating – Rule Editor (Define Relations)
2. Knowledge Base Updating – Rule Editor (Event Validation)
2. Knowledge Base Updating – Rule Editor (Define Event Rule Actions)
3. News Querying - Search Graph [figure legend: individuals, classes, selected concepts, concepts related to the selected node, concepts from keyword search]
3. News Querying - Search Graph
3. News Querying - SPARQL • SPARQL query:
    PREFIX hermes: <http://hermes-news.org/news.owl#>
    SELECT ?title
    WHERE {
      ?news hermes:title ?title .
      ?news hermes:time ?date .
      ?news hermes:relation ?relation .
      ?news hermes:relatedTo ?concept .
      FILTER ( ?concept hermes:relatedTo hermes:Google ||
               ?concept hermes:relatedTo hermes:Microsoft ||
               ?concept hermes:relatedTo hermes:Ebay ||
               ?concept hermes:relatedTo hermes:Yahoo )
      FILTER ( ?date > "2009-02-01T00:00.000+00:01" &&
               ?date < "2009-07-31T00:00.000+00:01" )
    }
3. News Querying - tSPARQL • Custom time functions:
    Function name                                         Output type
    currentDate()                                         xsd:date
    currentTime()                                         xsd:time
    now()                                                 xsd:dateTime
    dateTime-add(xsd:dateTime A, xsd:duration B)          xsd:dateTime
    dateTime-substract(xsd:dateTime A, xsd:duration B)    xsd:dateTime
3. News Querying - tSPARQL • tSPARQL query:
    PREFIX hermes: <http://hermes-news.org/news.owl#>
    SELECT ?title
    WHERE {
      ?news hermes:title ?title .
      ?news hermes:time ?date .
      ?news hermes:relation ?relation .
      ?news hermes:relatedTo ?concept .
      FILTER ( ?concept hermes:relatedTo hermes:Google ||
               ?concept hermes:relatedTo hermes:Microsoft ||
               ?concept hermes:relatedTo hermes:Ebay ||
               ?concept hermes:relatedTo hermes:Yahoo )
      FILTER ( ?date > hermes:dateTime-substract(hermes:now(), P0Y6M) &&
               ?date < hermes:now() )
    }
4. Results Presentation
Evaluation • Test set: 200 news items from the Yahoo! business and technology news feed • Precision for concept identification: 86% • Recall for concept identification: 81% • Precision for event identification: 62% • Recall for event identification: 53% • Sub-second processing time per news item
Evaluation • Test users: 9 students following a Semantic Web course • Usability task: build one query (the one from the presentation) using HNP and in SPARQL • Quantitative evaluation: – Measure the time it takes to build the query with the two approaches – Building the news query was faster using HNP • Qualitative evaluation: – Questionnaire – Building the news query was easier using HNP – HNP pros: graphical user interface, predefined time functionality, explanation of results by highlighting the found concepts – HNP cons: the layout changes from the conceptual graph to the search graph, results are not ordered by time
Conclusions • Hermes Framework: presents news items that match the user’s interests • Hermes Framework phases: – News Classification – Knowledge Base Updating – News Querying – Results Presentation • Hermes News Portal (HNP): an implementation of the Hermes framework • HNP is based on: – WordNet semantic lexicon, OWL ontology, (extended) SPARQL queries, Prefuse visualization, GATE natural language processing
Future Work • Limited query expressivity: – Add conjunction to queries (e.g., retrieve all news items that mention both Google and Yahoo!) – Add negation to queries (e.g., retrieve all news items that do not mention Google) – Add patterns to queries (e.g., retrieve all news items that refer to Google acquiring another company) • Add snippets and temporal ordering to query results presentation • Evaluate the tool outside the university lab • Evaluate the tool for another domain (e.g., politics instead of finance)