Extracting Academic Affiliations. Alicia Tribble, Einat Minkov, Andy Schlaikjer, Laura Kieras
The Problem • Determine the academic institutions with which a professor is or has been affiliated – Where degrees were earned – Previous affiliations, including post-docs – Current affiliation • Why would this be useful? – Studying social networks in academia – Person entity disambiguation
Knowledge We Will Learn • Example text rules to be learned: – If string = "<person> received his <degree> in <department> from <institution>", then: Affiliated(<person>, <institution>) – If string = "<degree>, <department>, <institution>" on <person>'s home page, then: Affiliated(<person>, <institution>) • Class of beliefs to be learned: – Affiliated(<person>, <institution>)
Sources of redundant information • URL of professor's personal home page (e.g., www.cmu.edu/~xxx) • Text found on multiple web pages, especially in the resume, CV, or biography section of personal home pages • Links incoming to and outgoing from personal home pages
Additional information • Dictionary of institution names • Dictionary of degrees – E.g., Ph.D., B.S., B.Tech., etc. • Map of domain names to institution names – E.g., cmu.edu -> Carnegie Mellon University – This could be learned, but we will leave that for another group!
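The domain-name map can be sketched as a plain dictionary lookup over the host part of a home-page URL. This is a minimal illustration, not the authors' implementation: the function name `institution_for_url` and the suffix-matching strategy are assumptions, and only the cmu.edu entry comes from the slides.

```python
from urllib.parse import urlparse

# Hand-built map; only the cmu.edu entry is taken from the slides.
DOMAIN_TO_INSTITUTION = {
    "cmu.edu": "Carnegie Mellon University",
}


def institution_for_url(url):
    """Return the institution whose domain is a suffix of the URL's host, or None.

    Hypothetical helper: the slides only say such a map exists.
    """
    # urlparse only recognises the host when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Try progressively shorter suffixes: www.cs.cmu.edu, cs.cmu.edu, cmu.edu.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_TO_INSTITUTION:
            return DOMAIN_TO_INSTITUTION[candidate]
    return None
```

Matching on suffixes lets department subdomains resolve to the parent institution without listing each one.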
Bootstrapping Logistics • Start with a few seed rules and seed facts • Use these rules to learn more facts, these facts to learn more rules, etc!
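The alternation between rules and facts can be sketched as a fixed-point loop. This is a toy illustration under stated assumptions: a "rule" here is just the literal text between a person and an institution, the names `bootstrap`, `learn_rules`, and `apply_rule` are hypothetical, and the single-letter sentences are placeholders rather than real data.

```python
def learn_rules(fact, sentence):
    """Toy rule learner: the text between a known person and institution."""
    person, institution = fact
    if (sentence.startswith(person) and sentence.endswith(institution)
            and len(sentence) > len(person) + len(institution)):
        return [sentence[len(person):len(sentence) - len(institution)]]
    return []


def apply_rule(rule, sentence):
    """Toy rule matcher: split the sentence around the rule's literal text."""
    if rule in sentence:
        person, institution = sentence.split(rule, 1)
        return [(person, institution)]
    return []


def bootstrap(sentences, seed_facts, seed_rules, max_rounds=10):
    """Alternate rules -> facts -> rules until nothing new is learned."""
    facts, rules = set(seed_facts), set(seed_rules)
    for _ in range(max_rounds):
        new_facts = {f for r in rules for s in sentences
                     for f in apply_rule(r, s)} - facts
        new_rules = {r for f in facts | new_facts for s in sentences
                     for r in learn_rules(f, s)} - rules
        if not new_facts and not new_rules:
            break  # fixed point reached
        facts |= new_facts
        rules |= new_rules
    return facts, rules


# Placeholder corpus: letters stand in for people and institutions.
corpus = ["A studied at B", "C studied at D", "C works for D", "F works for G"]
facts, rules = bootstrap(corpus, {("A", "B")}, set())
```

Starting from one seed fact, the loop learns " studied at ", uses it to find (C, D), learns " works for " from that new fact, and then finds (F, G). A real system would also need a way to score or filter rules, since a bad rule propagates errors in the same way.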
Our seed facts • Affiliated(<Tom M. Mitchell>, <Stanford University>) • Affiliated(<Tom Mitchell>, <Carnegie Mellon University>) • Affiliated(<William Cohen>, <Duke University>)
Our seed rules • If URL of personal web page is in the academic URL dictionary, then believe Affiliated(<person>, <institution>) • If looking at a resume or personal web page and any of the patterns below are found, then believe Affiliated(<person>, <institution>): – "<degree>. <department> <institution>." – "<degree>. <institution> <department>" – "<position>, <department> <institution>" – "<person> received <pronoun> <degree> from <institution>"
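The last seed pattern can be sketched as a regular expression built from the degree and institution dictionaries. This is a minimal sketch, not the authors' code: the pronoun list, the capitalized-word heuristic for `<person>`, the helper names, and the example person "Jane Q. Doe" are all assumptions made for illustration.

```python
import re

DEGREES = ["Ph.D.", "B.S.", "B.Tech."]  # examples listed in the slides
INSTITUTIONS = [                          # institutions named in the seed facts
    "Stanford University",
    "Carnegie Mellon University",
    "Duke University",
]
PRONOUNS = ["his", "her", "their"]        # assumption: not given in the slides


def alternation(items):
    """Escape dictionary entries and join them, longest first."""
    return "|".join(re.escape(s) for s in sorted(items, key=len, reverse=True))


# "<person> received <pronoun> <degree> from <institution>"
SEED_RULE = re.compile(
    r"(?P<person>[A-Z][\w.]*(?: [A-Z][\w.]*)+)"  # run of capitalized words
    r" received (?:" + alternation(PRONOUNS) + r")"
    r" (?P<degree>" + alternation(DEGREES) + r")"
    r" from (?P<institution>" + alternation(INSTITUTIONS) + r")"
)


def apply_seed_rule(text):
    """Return (person, institution) pairs for every match of the seed rule."""
    return [(m["person"], m["institution"]) for m in SEED_RULE.finditer(text)]


# Hypothetical sentence, not from the slides.
pairs = apply_seed_rule(
    "Jane Q. Doe received her Ph.D. from Stanford University in 1999."
)
```

Building the slots from dictionaries keeps the rule precise; the capitalized-word heuristic for names is the weakest part and would likely be replaced by a proper named-entity step.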
Algorithm walk-through 1) Start with known belief Affiliated(William Cohen, Duke University) 2) Extract sentences from William Cohen's web page that contain "William Cohen" and "Duke" a. Found pattern "William Cohen received his bachelor's degree in Computer Science from Duke University in 1984" b. Learned new pattern "received <pronoun> <degree> from <institution>"
Walk-through continued 3) Search for new web pages matching our pattern "received his degree from" a. Found example: "Adnan Darwiche is an Associate Professor of Computer Science at UCLA, having received his Ph.D and MS degrees in Computer Science from Stanford University" b. Extracted belief Affiliated(Adnan Darwiche, Stanford University)
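The walk-through can be reproduced with a deliberately crude generalization step. This sketch assumes the learner keeps only the first and last words between the person and the institution ("received" and "from") and wildcards the rest, and that the person is the leading run of capitalized words in a sentence. Both heuristics and the names `induce_pattern` and `extract` are illustrative assumptions, not the authors' method.

```python
import re

# Assumption: a person's name is the leading run of capitalized words.
PERSON = re.compile(r"[A-Z]\w*(?: [A-Z]\w*)+")


def induce_pattern(sentence, person, institution):
    """Generalize the text between a known person and institution into a regex."""
    start = sentence.index(person) + len(person)
    end = sentence.index(institution, start)
    words = sentence[start:end].split()
    if len(words) < 2:
        return None
    first, last = words[0], words[-1]
    # Keep the two anchor words, wildcard the middle, and capture a run of
    # capitalized words as the institution slot.
    return re.compile(
        re.escape(first) + r"\b.+?\b" + re.escape(last)
        + r" (?P<institution>[A-Z]\w*(?: [A-Z]\w*)*)"
    )


def extract(pattern, sentence):
    """Return (person, institution) if the sentence matches, else None."""
    who = PERSON.match(sentence)
    hit = pattern.search(sentence)
    if who and hit:
        return (who.group(0), hit.group("institution"))
    return None


# Steps 1-2: learn a pattern from the seed belief and its supporting sentence.
seed_sentence = ("William Cohen received his bachelor's degree in Computer "
                 "Science from Duke University in 1984")
pattern = induce_pattern(seed_sentence, "William Cohen", "Duke University")

# Step 3: apply the learned pattern to a new sentence.
new_sentence = ("Adnan Darwiche is an Associate Professor of Computer Science "
                "at UCLA, having received his Ph.D and MS degrees in Computer "
                "Science from Stanford University")
belief = extract(pattern, new_sentence)
print(belief)  # ('Adnan Darwiche', 'Stanford University')
```

The wildcard is what lets the pattern learned from "bachelor's degree in Computer Science" also cover "Ph.D and MS degrees in Computer Science"; the cost is lower precision, which the degree and institution dictionaries could help recover.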