Research Internships Advanced Research and Modeling Research Group
ADREM – What?
• Research group that deals with computational aspects of data:
  – databases
  – data mining
  – information retrieval
ADREM – Who?
• DB/DM/IR: Floris Geerts, Bart Goethals, Martin Theobald
• Bioinformatics: Kris Laukens, Tim Van den Bulcke
+ PhD students and postdoctoral researchers
http://adrem.ua.ac.be/adrem
Internships – What?
• 2 research internships (15 credits each)
• MSc thesis (30 credits)
Goal: internships are an initiation to research and are carried out in collaboration with researchers in ADReM.
• 15 credits is a lot: an internship is time-consuming!
• 1 credit = 15 hours of work, so one internship amounts to 225 hours.
• Balance your course load and your internship well.
• Internships are not necessarily related to your MSc thesis (but they can be).
• In an MSc thesis, your ability to do research independently plays an important role.
Internships – Who?
• Everyone who follows the research option in the database MSc program
Research
In an internship you need to:
1. Understand a specific problem
2. Implement an (existing) method for solving the problem
3. Test and evaluate
4. Write a report
(In an MSc thesis, you also have to solve the problem yourself by designing new methods.)
Internships in a company
• You may do an internship in a company, but you have to ask permission.
• You also have to find the company yourself and convince us that there is research involved.
• You cannot receive any money from the company during your internship.
Databases, data mining, information retrieval
• These are not separate research domains.
• The internship topics that each of us presents next usually lie at the intersection of these areas.
• Let's look at some example topics…
Bart Goethals
Recommender Systems
• Implement state-of-the-art recommenders
• Pattern mining for better recommendations
• Interactive recommendation
• Explaining recommendations
• Test recommenders on real data
Visual Instant Interactive Pattern Mining
• Study visualizations enabling interactive pattern mining
• Implement and experiment with novel instant mining methods
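For readers new to pattern mining, the classic baseline behind these topics is levelwise frequent itemset mining in the style of Apriori. The sketch below is a minimal, unoptimized version, assuming transactions are sets of hashable items and `min_support` is an absolute count; the function name `frequent_itemsets` and the toy baskets are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Levelwise (Apriori-style) mining of all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions containing every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    singletons = {frozenset([item]) for t in transactions for item in t}
    level = {s for s in singletons if support(s) >= min_support}
    result = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return result


# Hypothetical shopping baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
for itemset, count in sorted(frequent_itemsets(baskets, 2).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), count)
```

The prune step is what makes the search "Apriori-style": a candidate is only counted if all of its subsets were already frequent.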
Pattern-based Clustering
• Implement and evaluate different techniques for clustering-based pattern mining and pattern-based clustering
Data Mining for Cleaning
• Study and experiment with data mining methods for data cleaning
Martin Theobald
Information Extraction (I): Wikipedia Infoboxes
Information Extraction (I): Infoboxes
YAGO/DBpedia et al.:
bornOn(Jeff, 09/22/42)
gradFrom(Jeff, Columbia)
hasAdvisor(Jeff, Arthur)
hasAdvisor(Surajit, Jeff)
knownFor(Jeff, Theory)
> 120 M facts for YAGO2 (mostly from Wikipedia infoboxes)
Information Extraction (II): Wikipedia Categories
Information Extraction (II): Wikipedia Categories ?
RDF Knowledge Bases
3 million entities, 120 million facts, 100 relations, 200 k classes; accuracy 95%
[Figure: excerpt of the YAGO knowledge graph around Max_Planck: a class hierarchy (Entity, Organization, Person, Scientist, Physicist, Biologist, Politician, Location, Country, State, City) linked by subclass and instanceOf edges; facts such as bornIn(Max_Planck, Kiel), hasWon(Max_Planck, Nobel Prize) and fatherOf(Max_Planck, Erwin_Planck), plus bornOn, diedOn, locatedIn and citizenOf edges; and "means" edges mapping names such as "Max Planck", "Max Karl Ernst Ludwig Planck", "Angela Merkel" and "Angela Dorothea Merkel" to entities.]
http://www.mpi-inf.mpg.de/yago-naga/
Linked Open Data
As of Sept. 2011:
• > 200 sources
• > 30 billion RDF triples
• > 400 million links
http://linkeddata.org/
As of Sept. 2011: > 5 million owl:sameAs links between DBpedia/YAGO/Freebase
IBM Watson: Deep Question Answering
Example clues:
• William Wilkinson's "An Account of the Principalities of Wallachia and Moldavia" inspired this author's most famous novel
• This town is known as "Sin City" & its downtown is "Glitter Gulch"
• As of 2010, this is the only former Yugoslav republic in the EU
• 99 cents got me a 4-pack of Ytterlig coasters from this Swedish chain
[Figure: DeepQA architecture, with question classification & decomposition and knowledge back-ends such as YAGO.]
D. Ferrucci et al.: Building Watson: An Overview of the DeepQA Project. AI Magazine, Fall 2010.
www.ibm.com/innovation/us/watson/index.htm
Jeopardy!
A big US city with two airports, one named after a World War II hero, and one named after a World War II battlefield?
Structured Knowledge Queries
A big US city with two airports, one named after a World War II hero, and one named after a World War II battlefield?
  SELECT DISTINCT ?c WHERE {
    ?c type City . ?c locatedIn USA .
    ?a1 type Airport . ?a2 type Airport .
    ?a1 locatedIn ?c . ?a2 locatedIn ?c .
    ?a1 namedAfter ?p . ?p type WarHero .
    ?a2 namedAfter ?b . ?b type BattleField .
  }
• Use manually created templates for mapping sentence patterns to structured queries.
• Works for factoid and list questions.
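To make the query concrete, here is a minimal sketch that evaluates the same triple patterns by naive backtracking over an in-memory list of triples. The tiny knowledge base is hand-made for illustration (it encodes that Chicago's O'Hare airport is named after the World War II aviator Edward "Butch" O'Hare and its Midway airport after the Battle of Midway); the names `match`, `KB` and `QUERY` are hypothetical, and a real system would run the query on a SPARQL engine instead.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding under which all triple patterns occur in `triples`."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        consistent = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind a fresh variable, or check it against an existing binding.
                if candidate.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            yield from match(rest, triples, candidate)


KB = [  # hypothetical, hand-made facts
    ("Chicago", "type", "City"), ("Chicago", "locatedIn", "USA"),
    ("OHare", "type", "Airport"), ("OHare", "locatedIn", "Chicago"),
    ("OHare", "namedAfter", "Butch_OHare"), ("Butch_OHare", "type", "WarHero"),
    ("Midway", "type", "Airport"), ("Midway", "locatedIn", "Chicago"),
    ("Midway", "namedAfter", "Battle_of_Midway"), ("Battle_of_Midway", "type", "BattleField"),
    ("Toronto", "type", "City"), ("Toronto", "locatedIn", "Canada"),
]

QUERY = [
    ("?c", "type", "City"), ("?c", "locatedIn", "USA"),
    ("?a1", "type", "Airport"), ("?a2", "type", "Airport"),
    ("?a1", "locatedIn", "?c"), ("?a2", "locatedIn", "?c"),
    ("?a1", "namedAfter", "?p"), ("?p", "type", "WarHero"),
    ("?a2", "namedAfter", "?b"), ("?b", "type", "BattleField"),
]

# SELECT DISTINCT ?c: collect the distinct values bound to ?c.
print({b["?c"] for b in match(QUERY, KB)})  # {'Chicago'}
```

Evaluating the patterns in the given order is the simplest join strategy; query engines reorder the patterns by selectivity.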
Mining Rules from RDF Knowledge Bases
Goal: inductively learn (soft) rules, e.g. livesIn(x, y) :- bornIn(x, y)
[Figure: overlapping sets G (ground truth for livesIn, only partially known), KB (knowledge base facts for livesIn, the known positive examples) and R (facts produced by the rule, only partially correct).]
• Apriori-style pre-filtering of low-support join patterns
• Dynamic-programming ILP algorithm
• Learning with constants and type constraints
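The quality of a candidate rule such as livesIn(x, y) :- bornIn(x, y) is typically judged by its support and confidence. The sketch below computes both on made-up facts, in two variants: standard confidence, which treats every missing livesIn fact as false, and PCA confidence (the partial completeness assumption used by the AMIE rule miner), which only counts people for whom at least one livesIn fact is known. This reflects the figure's point that the ground truth G is only partially known. The function name `rule_quality` and all facts are hypothetical.

```python
def rule_quality(body, head):
    """Support, standard confidence and PCA confidence of the rule head(x, y) :- body(x, y)."""
    support = len(body & head)  # rule predictions that are known head facts
    std_conf = support / len(body) if body else 0.0
    # PCA: only count predictions for subjects that have at least one known head fact.
    subjects_with_head = {x for x, _ in head}
    pca_body = {(x, y) for x, y in body if x in subjects_with_head}
    pca_conf = support / len(pca_body) if pca_body else 0.0
    return support, std_conf, pca_conf


# Hypothetical (x, y) pairs for bornIn and livesIn; Dave has no known livesIn fact.
born_in = {("Alice", "Antwerp"), ("Bob", "Ghent"), ("Carol", "Brussels"), ("Dave", "Liege")}
lives_in = {("Alice", "Antwerp"), ("Bob", "Antwerp"), ("Carol", "Brussels")}

support, std_conf, pca_conf = rule_quality(born_in, lives_in)
print(support, std_conf, round(pca_conf, 3))  # 2 0.5 0.667
```

Under standard confidence, Dave's prediction counts as wrong; under PCA it is simply unknown, which raises the confidence from 0.5 to 2/3.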
Rule-based Reasoning
(Soft) Deduction Rules vs. (Hard) Consistency Constraints
• People may live in more than one place:
  livesIn(x, y) ∧ marriedTo(x, z) ⇒ livesIn(z, y) [0.8]
  livesIn(x, y) ∧ hasChild(x, z) ⇒ livesIn(z, y) [0.5]
• People are not born in different places/on different dates:
  bornIn(x, y) ∧ bornIn(x, z) ⇒ y = z
• People are not married to more than one person (at the same time, in most countries?):
  marriedTo(x, y, t1) ∧ marriedTo(x, z, t2) ∧ y ≠ z ⇒ disjoint(t1, t2)
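A hard constraint like bornIn(x, y) ∧ bornIn(x, z) ⇒ y = z states that bornIn is functional. A minimal sketch for detecting violations in a set of extracted triples groups the objects per subject and reports subjects with more than one. The facts are made up, with one deliberately wrong extraction, and `functional_violations` is a hypothetical helper. Deciding which of the conflicting facts to keep, for example by weighing their confidences, is a separate step not shown here.

```python
from collections import defaultdict


def functional_violations(facts, relation):
    """Return subjects that have more than one object for a relation that should be functional."""
    objects = defaultdict(set)
    for rel, subj, obj in facts:
        if rel == relation:
            objects[subj].add(obj)
    return {subj: sorted(objs) for subj, objs in objects.items() if len(objs) > 1}


facts = [
    ("bornIn", "Max_Planck", "Kiel"),
    ("bornIn", "Max_Planck", "Berlin"),      # deliberately wrong, hypothetical extraction
    ("bornIn", "Angela_Merkel", "Hamburg"),
    ("livesIn", "Angela_Merkel", "Berlin"),  # a different relation, ignored by the check
]

print(functional_violations(facts, "bornIn"))  # {'Max_Planck': ['Berlin', 'Kiel']}
```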
Probabilistic RDF Database
Query: graduatedFrom(Surajit, y)
• Q1: graduatedFrom(Surajit, Princeton), lineage A ∧ ¬(B ∨ (C ∧ D)): 0.7 × (1 - 0.888) = 0.078
• Q2: graduatedFrom(Surajit, Stanford), lineage ¬A ∧ (B ∨ (C ∧ D)): (1 - 0.7) × 0.888 = 0.266
  with P(C ∧ D) = 0.8 × 0.9 = 0.72 and P(B ∨ (C ∧ D)) = 1 - (1 - 0.72) × (1 - 0.6) = 0.888,
  where A = graduatedFrom(Surajit, Princeton) [0.7], B = graduatedFrom(Surajit, Stanford) [0.6], C = hasAdvisor(Surajit, Jeff) [0.8], D = worksAt(Jeff, Stanford) [0.9]
Rules
  hasAdvisor(x, y) ∧ worksAt(y, z) ⇒ graduatedFrom(x, z) [0.4]
  graduatedFrom(x, y) ∧ graduatedFrom(x, z) ⇒ y = z
Base Facts
  graduatedFrom(Surajit, Princeton) [0.7]
  graduatedFrom(Surajit, Stanford) [0.6]
  graduatedFrom(David, Princeton) [0.9]
  hasAdvisor(Surajit, Jeff) [0.8]
  hasAdvisor(David, Jeff) [0.7]
  worksAt(Jeff, Stanford) [0.9]
  type(Princeton, University) [1.0]
  type(Stanford, University) [1.0]
  type(Jeff, Computer_Scientist) [1.0]
  type(Surajit, Computer_Scientist) [1.0]
  type(David, Computer_Scientist) [1.0]
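The answer probabilities can be checked by brute force. Assuming the base facts are independent (a tuple-independent database), the probability of an answer is the total probability of the possible worlds in which its lineage formula is true. The sketch enumerates all 2^4 worlds over the lineage facts A, B, C, D. Like the arithmetic above, it does not use the weight 0.4 of the derivation rule. The helper names are hypothetical, and enumeration is only feasible for tiny lineage formulas.

```python
from itertools import product

# Marginal probabilities of the base facts in the lineage.
FACTS = {
    "A": 0.7,  # graduatedFrom(Surajit, Princeton)
    "B": 0.6,  # graduatedFrom(Surajit, Stanford)
    "C": 0.8,  # hasAdvisor(Surajit, Jeff)
    "D": 0.9,  # worksAt(Jeff, Stanford)
}


def probability(lineage):
    """Sum the probabilities of all possible worlds in which the lineage formula holds."""
    names = list(FACTS)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        weight = 1.0
        for name, present in world.items():
            weight *= FACTS[name] if present else 1.0 - FACTS[name]
        if lineage(world):
            total += weight
    return total


def stanford_evidence(w):
    # B ∨ (C ∧ D): stated directly, or derived via the hasAdvisor/worksAt rule.
    return w["B"] or (w["C"] and w["D"])


q1 = probability(lambda w: w["A"] and not stanford_evidence(w))  # Princeton
q2 = probability(lambda w: not w["A"] and stanford_evidence(w))  # Stanford
print(round(q1, 3), round(q2, 3))  # 0.078 0.266
```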
Temporal Knowledge
Probabilistic-Temporal Consistency Reasoning
Derived facts: teamMates(Beckham, Ronaldo, T3), derived from playsFor(Beckham, Real, T1) ∧ playsFor(Ronaldo, Real, T2) ∧ overlaps(T1, T2)
Base facts: playsFor(Beckham, Real, T1), playsFor(Ronaldo, Real, T2)
[Figure: probability histograms over the years '00 to '07 for the two base facts (state relations) and for the derived teamMates fact.]
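A minimal sketch of how a time-annotated fact like teamMates can be derived: each base fact is represented as a list of year intervals with probabilities (a coarse histogram), overlapping intervals are intersected, and, assuming the two base facts are independent, each derived interval gets the product of the two probabilities. The interval values are hypothetical and are not those of the histograms above; `team_mates` is a hypothetical helper.

```python
from typing import List, Tuple

# (start_year, end_year_exclusive, probability): one bin of a time histogram.
Interval = Tuple[int, int, float]


def team_mates(t1: List[Interval], t2: List[Interval]) -> List[Interval]:
    """Derive teamMates intervals from two playsFor histograms (independence assumed)."""
    derived = []
    for start1, end1, p1 in t1:
        for start2, end2, p2 in t2:
            start, end = max(start1, start2), min(end1, end2)
            if start < end:  # overlaps(T1, T2)
                derived.append((start, end, p1 * p2))
    return derived


# Hypothetical histograms for playsFor(Beckham, Real, T1) and playsFor(Ronaldo, Real, T2).
beckham = [(2003, 2007, 0.8)]
ronaldo = [(2002, 2005, 0.5), (2005, 2008, 0.4)]

for start, end, p in team_mates(beckham, ronaldo):
    print(f"teamMates(Beckham, Ronaldo, [{start}, {end})) with p = {p:.2f}")
```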
Topics for Internships & Master Theses
Research Internships
• Preparation & Integration of Linked Data Sources for Scientific Experiments (SQL/Java/Python)
• Mining Association Rules from Linked Data (Java/C++)
• Visualization Frontend for Linked Data (ActionScript & Adobe Flash)
Master Theses
• Implementation of a distributed rule-based query engine for RDF data (C++ & Message Passing Interface)
• Implementation of a distributed factor graph model for correlated RDF facts (C++ & Message Passing Interface)
• Faceted Search and Interactive Browsing for Linked Data
Floris Geerts
RDBMS-based recommendation systems
• Find the top-3 flights from EDI to NYC with at most one stop
  – Items: flights
  – Selection criteria: relational queries
  – Utility function: in terms of price and duration (for ranking)
• Top-k item selection: the selection criteria filter the items, and the utility function ranks them to return the top-k items
• Items: books, music, news, Web sites, research papers, …
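A minimal sketch of top-k item selection, assuming the selection criteria are expressed as a filter (what an SQL WHERE clause would do inside the RDBMS) and the utility is a hypothetical linear cost over price and duration, where lower is better. The flight data, weights and names are made up.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    number: str
    origin: str
    dest: str
    stops: int
    price: float     # hypothetical currency units
    duration: float  # hours


def cost(f: Flight, w_price: float = 1.0, w_duration: float = 50.0) -> float:
    """Hypothetical utility expressed as a cost: lower is better."""
    return w_price * f.price + w_duration * f.duration


def top_k(flights, k=3):
    # Selection criteria: EDI to NYC with at most one stop.
    selected = (f for f in flights if f.origin == "EDI" and f.dest == "NYC" and f.stops <= 1)
    # Ranking: keep the k items with the lowest cost.
    return heapq.nsmallest(k, selected, key=cost)


flights = [
    Flight("F1", "EDI", "NYC", 0, 450, 7.5),
    Flight("F2", "EDI", "NYC", 1, 320, 11.0),
    Flight("F3", "EDI", "NYC", 2, 250, 15.0),  # too many stops
    Flight("F4", "EDI", "NYC", 1, 380, 9.0),
    Flight("F5", "EDI", "BOS", 0, 300, 7.0),   # wrong destination
    Flight("F6", "EDI", "NYC", 1, 600, 9.5),
]
print([f.number for f in top_k(flights)])  # ['F1', 'F4', 'F2']
```

`heapq.nsmallest` avoids sorting all candidates. Inside an RDBMS the same ranking is commonly written with ORDER BY and LIMIT k.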
Query relaxation
Query for a 5-day holiday:
Q(f#, name, type, ticket, time) = ∃DT, AD, xTo (flight(f#, EDI, xTo, DT, 5/19/2012, AT, AD, Pr) ∧ POI(name, xTo, type, ticket, time) ∧ xTo = NYC)
There is no direct flight from EDI to NYC.
Relaxation: cities within 15 miles of EDI or NYC are acceptable; E = {EDI, NYC, 5/19/2012}, X = {xTo}
Q1(f#, name, type, ticket, time) = ∃DT, AD, uTo, wEdi, wNYC, wDD (flight(f#, wEdi, xTo, DT, wDD, AT, AD, Pr) ∧ xTo = wNYC ∧ POI(name, uTo, type, ticket, time) ∧ wDD = 5/19/2012 ∧ dist(wNYC, NYC) ≤ 15 ∧ dist(wEdi, EDI) ≤ 15 ∧ xTo = uTo)
Further relaxation: departure dates within 3 days of 5/19/2012 are acceptable, i.e., wDD = 5/19/2012 is replaced by dist(wDD, 5/19/2012) ≤ 3
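The effect of the two relaxation steps can be sketched on toy data: the exact query returns nothing, relaxing both endpoints to places within 15 miles returns a flight into a nearby airport, and also relaxing the date to within 3 days adds one more. Places, distances and flights are hypothetical, the distance tables stand in for the dist predicate, and `relaxed_matches` is a hypothetical helper rather than an algorithm from these slides.

```python
from datetime import date

# Hypothetical distances (miles) from each place to EDI and to NYC; they play the role of dist(...).
DIST_TO_EDI = {"EDI": 0, "GLA": 40}
DIST_TO_NYC = {"NYC": 0, "JFK": 12, "EWR": 14, "BOS": 190}

# (flight number, origin, destination, departure date): hypothetical data.
FLIGHTS = [
    ("F1", "EDI", "BOS", date(2012, 5, 19)),
    ("F2", "EDI", "EWR", date(2012, 5, 19)),
    ("F3", "GLA", "JFK", date(2012, 5, 19)),
    ("F4", "EDI", "JFK", date(2012, 5, 21)),
    ("F5", "EDI", "JFK", date(2012, 5, 25)),
]


def relaxed_matches(max_miles=0, max_days=0, wanted=date(2012, 5, 19)):
    """Flights whose endpoints and departure date fall within the relaxation bounds."""
    return [
        number
        for number, origin, dest, departure in FLIGHTS
        if DIST_TO_EDI.get(origin, float("inf")) <= max_miles
        and DIST_TO_NYC.get(dest, float("inf")) <= max_miles
        and abs((departure - wanted).days) <= max_days
    ]


print(relaxed_matches())                          # [] : no direct EDI-NYC flight
print(relaxed_matches(max_miles=15))              # ['F2']
print(relaxed_matches(max_miles=15, max_days=3))  # ['F2', 'F4']
```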
Topics
• Top-k query answering algorithm on top of an RDBMS
• Query relaxation approaches and query completion
Data quality • Detecting and correcting inconsistencies • Finding duplicates • Finding most up-to-date information
Semantic errors
Yahoo! Finance vs. Nasdaq quotes for the same stock:
• Day's Range: 93.80-95.71, 52 wk Range: 25.38-95.71
• Day's Range: 93.80-95.71, 52 Wk: 25.38-93.72 (the 52-week high is lower than the day's high)
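This kind of error is detectable with a simple semantic rule: a stock's day's range must lie within its 52-week range. A minimal sketch, using the two quotes above; `Quote` and `range_violations` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    day_low: float
    day_high: float
    week52_low: float
    week52_high: float


def range_violations(q: Quote):
    """Semantic checks: the day's range must lie within the 52-week range."""
    problems = []
    if q.day_high > q.week52_high:
        problems.append("day's high exceeds the 52-week high")
    if q.day_low < q.week52_low:
        problems.append("day's low is below the 52-week low")
    return problems


first = Quote(93.80, 95.71, 25.38, 95.71)
second = Quote(93.80, 95.71, 25.38, 93.72)
print(range_violations(first))   # []
print(range_violations(second))  # ["day's high exceeds the 52-week high"]
```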
Instance ambiguity
Out-of-Date Data: 4:05 pm vs. 3:57 pm
Unit errors: 76.82 B vs. 76,821,000
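Unit errors like this one often show up as two values that agree once one of them is rescaled by a power of 1000. A minimal heuristic sketch, assuming values are written with an optional K/M/B/T suffix; the suffix table, the 1% tolerance and the function names are hypothetical choices.

```python
import math

SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def parse_amount(text: str) -> float:
    """Parse '76.82 B' or '76,821,000' into a plain number."""
    text = text.strip().replace(",", "")
    if text and text[-1].upper() in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1].upper()]
    return float(text)


def unit_mismatch(a: str, b: str, tolerance: float = 0.01):
    """Return the power-of-1000 factor f with a ≈ f * b, or None if no such rescaling fits."""
    x, y = parse_amount(a), parse_amount(b)
    if x <= 0 or y <= 0:
        return None
    exponent = round(math.log(x / y, 1000))
    if exponent == 0:
        return None  # same order of magnitude: not a unit error
    factor = 1000 ** exponent
    if abs(y * factor - x) / x <= tolerance:
        return factor
    return None


print(unit_mismatch("76.82 B", "76,821,000"))  # 1000
```

A factor of 1000 suggests that the second figure is expressed in thousands.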
Topics
• Fast inconsistency detection
• Duplicate elimination algorithms
• Automated repairing algorithms
• Mining of "data quality rules"
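As a baseline for duplicate elimination, a minimal sketch compares every pair of normalized strings with Python's `difflib.SequenceMatcher` and reports the pairs above a similarity threshold. The records and the 0.8 threshold are hypothetical. Pairwise comparison is quadratic in the number of records, which is why practical algorithms add blocking or indexing.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lower-case, drop periods and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def likely_duplicates(records, threshold=0.8):
    """Return (i, j, similarity) for all record pairs whose similarity reaches the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


records = ["Univ. of Antwerp", "University of Antwerp", "university of antwerp", "Ghent University"]
print(likely_duplicates(records))  # [(0, 1, 0.83), (0, 2, 0.83), (1, 2, 1.0)]
```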