Extracting Predicates from Semistructured and Unstructured Texts
Clint Tustison
BYU DEG
Funded in part by the NSF

Introduction
  • Vast amount of electronic data
  • Semi-structured
      • GEDCOM files (format for encoding genealogical information)
      • Clinical trials
  • Unstructured
      • Newspaper headlines
      • Thematic discourse (Wall Street Journal articles)

Questions
  • What current methods are employed for extracting electronic data?
  • What is a workable solution for the representation of the extracted information?

Why worry about representation?
  • Ambiguities abound
      • BYU panel discusses war with Iraq
      • Sisters reunited after 18 years in checkout counter
      • Everybody loves somebody
  • Differentiate meanings of an utterance
  [Figure: two mappings for "Everybody loves somebody": A → Mary, B → Fred, C → Mark, …; and A, B, C, D, E → Fred]
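
The scope ambiguity in "Everybody loves somebody" can be written out in first-order notation. The two formulas below are a sketch of the two readings; the predicate names person and loves are chosen here for illustration and do not come from the slides.

```latex
% Reading 1: each person loves some, possibly different, person
\[
\forall x\,\bigl(\mathit{person}(x) \rightarrow
  \exists y\,(\mathit{person}(y) \land \mathit{loves}(x, y))\bigr)
\]

% Reading 2: there is a single person whom every person loves
\[
\exists y\,\bigl(\mathit{person}(y) \land
  \forall x\,(\mathit{person}(x) \rightarrow \mathit{loves}(x, y))\bigr)
\]
```

The readings differ only in the order of the quantifiers, which is the kind of distinction a surface string cannot express but a predicate representation can.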

Approach
  • Tools
      • Link Grammar Parser
          • Provides a syntactic dependency parse
          • Semantics is interpretive (gets read from the syntax)
      • Predicate logic
          • Formal properties, allow for wide range of applications, usable crosslinguistically
          • Vocabulary, syntax, semantics
              • First-order: quantification over individuals (FOPC)
              • Higher-order: quantification over relations, etc.

Link Grammar Parser
  • Sleator, Lafferty, Temperley
  • Benefits
      • Written in C - very fast
      • Robust - ability to process (un)grammaticality / spelling errors
      • Free - http://www.link.cs.cmu.edu/link
      • Easily integrated

Link Grammar Parser

linkparser> the dog ate the food.

    +---------------Xp---------------+
    +-----Wd----+     +----Os----+   |
    |      +-Ds-+--Ss-+    +-D*u-+   |
    |      |    |     |    |     |   |
LEFT-WALL the dog.n ate.v the food.n .

ate(dog, food).
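
The step from the linkage to ate(dog, food) can be sketched in a few lines of Python. This is only an illustration: the (label, left, right) tuples are a hypothetical representation of the links in the diagram, not the Link Grammar Parser's actual API, and the rule used (S links give the subject and verb, O links give the verb and object) is a simplifying assumption that covers only simple transitive sentences.

```python
# Hypothetical link representation: (label, left_word, right_word).
# This is NOT the Link Grammar Parser's real API, only an illustration.
links = [
    ("Xp", "LEFT-WALL", "."),
    ("Wd", "LEFT-WALL", "dog.n"),
    ("Ds", "the", "dog.n"),
    ("Ss", "dog.n", "ate.v"),
    ("Os", "ate.v", "food.n"),
    ("D*u", "the", "food.n"),
]


def strip_tag(word):
    """Drop a part-of-speech suffix such as '.n' or '.v'."""
    return word.split(".")[0]


def links_to_predicate(links):
    """Build 'verb(subject, object).' from S (subject) and O (object) links."""
    subject = verb = obj = None
    for label, left, right in links:
        if label.startswith("S"):
            # Subject link: noun on the left, verb on the right.
            subject, verb = strip_tag(left), strip_tag(right)
        elif label.startswith("O"):
            # Object link: verb on the left, noun on the right.
            verb, obj = strip_tag(left), strip_tag(right)
    if verb is None:
        return None
    arguments = [a for a in (subject, obj) if a is not None]
    return f"{verb}({', '.join(arguments)})."


print(links_to_predicate(links))  # ate(dog, food).
```

Reading the predicate off the link labels reflects the "semantics is interpretive" idea: no separate semantic analysis is run, the arguments come straight from the syntactic links.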

Clinical Trials Extraction

Novel Adjuvants for Peptide-Based Melanoma Vaccines

INCLUSION CRITERIA:
  • Ages Eligible for Study: 18 Years and above, Genders Eligible for Study: Both
  • Diagnosis of stage III or IV cutaneous, mucosal, or ocular melanoma
  • Granulocyte count at least 1,500/mm³
  • Platelet count at least 100,000/mm³

EXCLUSION CRITERIA:
  • Steroid therapy or other immunosuppressive medication requirement
  • Allergic reaction to Montanide ISA 51 (incomplete Freund's adjuvant)
  • Positive for hepatitis B surface antigen, hepatitis C antibody, or HIV antibody

Predicates: Inclusion Criteria
  • Ages Eligible for Study: 18 Years and above, Genders Eligible for Study: Both
      age(Person, X) & X >= 18.
      gender(Person, X) & (female == X || male == X).
  • Diagnosis of stage III or IV cutaneous, mucosal, or ocular melanoma
      diagnosis(Person, X) & melanoma(X) & type(X, Y) & (cutaneous(Y) || mucosal(Y) || ocular(Y)) & stage(X, Z) & (Z == 3 || Z == 4).
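
Once criteria are in predicate form they can be evaluated against a patient record. The Python sketch below mirrors the three inclusion predicates; the record layout (the keys age, gender, diagnoses, disease, type, stage) and both sample patients are assumptions made up for this example, not data from the trial.

```python
def meets_inclusion(person):
    """Evaluate the age, gender, and diagnosis predicates on one record."""
    # age(Person, X) & X >= 18
    age_ok = person["age"] >= 18
    # gender(Person, X) & (female == X || male == X)
    gender_ok = person["gender"] in ("female", "male")
    # diagnosis(Person, X) & melanoma(X) & type(X, Y) & (...) & stage(X, Z) & (...)
    diagnosis_ok = any(
        d["disease"] == "melanoma"
        and d["type"] in ("cutaneous", "mucosal", "ocular")
        and d["stage"] in (3, 4)
        for d in person["diagnoses"]
    )
    return age_ok and gender_ok and diagnosis_ok


# Hypothetical records, for illustration only.
eligible = {
    "age": 52,
    "gender": "female",
    "diagnoses": [{"disease": "melanoma", "type": "ocular", "stage": 3}],
}
too_early = {
    "age": 47,
    "gender": "male",
    "diagnoses": [{"disease": "melanoma", "type": "cutaneous", "stage": 2}],
}

print(meets_inclusion(eligible))   # True
print(meets_inclusion(too_early))  # False
```

The existential variable X in the diagnosis predicate becomes the any(...) over the person's diagnoses: one matching diagnosis is enough.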

Predicates: Exclusion Criteria
  • Allergic reaction to Montanide ISA 51 (incomplete Freund's adjuvant)
      ¬(allergy(Person, X) & montanide(X)).
  • Steroid therapy or other immunosuppressive medication requirement
      ¬(therapy(Person, X) & steroid(X)).
  • Positive for hepatitis B surface antigen, hepatitis C antibody, or HIV antibody
      ¬(condition(Person, X) & (hepatitis_B(X) || hepatitis_C(X) || hiv(X))).
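
The hepatitis/HIV predicate combines & and ||, so the grouping of the disjunction matters if & binds more tightly than ||, as it does in most notations. The Python sketch below compares the two groupings; the list of candidate values for X and the sample patient are assumptions invented for this illustration.

```python
# Illustrative set of values that X can range over; not from the source.
UNIVERSE = ["hepatitis_B", "hepatitis_C", "hiv", "asthma"]


def has_condition(person, x):
    """condition(Person, X): the person has condition x on record."""
    return x in person["conditions"]


def matches_grouped(person):
    # condition(Person, X) & (hepatitis_B(X) || hepatitis_C(X) || hiv(X))
    return any(
        has_condition(person, x)
        and (x == "hepatitis_B" or x == "hepatitis_C" or x == "hiv")
        for x in UNIVERSE
    )


def matches_ungrouped(person):
    # (condition(Person, X) & hepatitis_B(X)) || hepatitis_C(X) || hiv(X)
    return any(
        (has_condition(person, x) and x == "hepatitis_B")
        or x == "hepatitis_C"
        or x == "hiv"
        for x in UNIVERSE
    )


# Hypothetical person with neither hepatitis nor HIV.
patient = {"conditions": {"asthma"}}

print(matches_grouped(patient))    # False
print(matches_ungrouped(patient))  # True
```

Because the exclusion criterion is the negation of this match, the ungrouped reading would wrongly exclude every person: hepatitis_C(X) is satisfiable on its own, whether or not the person has that condition.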

News Headlines Extraction
  • Bangladesh frees UK journalists
      frees(bangladesh, uk_journalists).
  • Lieberman mulls 2004 bid
      mulls(lieberman, 2004_bid).
  • Avalanche kills snowboarder in Nevada
      kills(avalanche, snowboarder, nevada).
  • Pope tackles US sex abuse
      tackles(pope, us_sex_abuse).
  • Hubble watches galactic dance
      watches(hubble, dance) & galactic(dance).
  • Mbeki bemoans racial divisions
      bemoans(mbeki, divisions) & racial(divisions).
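
The headline predicates follow a regular shape: a verb applied to its arguments, optionally conjoined with one-place predicates for adjectives. The Python sketch below formats such predicates from an analysis supplied by hand; the function names and the (adjective, noun) modifier pairs are assumptions for this example, and the parsing step that would produce the analysis is not shown.

```python
def atom(phrase):
    """Lower-case a phrase and join its words with underscores."""
    return "_".join(phrase.lower().split())


def headline_predicate(verb, arguments, modifiers=None):
    """Format verb(arg, ...) plus one adjective(noun) term per modifier."""
    terms = [f"{atom(verb)}({', '.join(atom(a) for a in arguments)})"]
    for adjective, noun in modifiers or []:
        terms.append(f"{atom(adjective)}({atom(noun)})")
    return " & ".join(terms) + "."


print(headline_predicate("frees", ["Bangladesh", "UK journalists"]))
# frees(bangladesh, uk_journalists).
print(headline_predicate("kills", ["Avalanche", "snowboarder", "Nevada"]))
# kills(avalanche, snowboarder, nevada).
print(headline_predicate("watches", ["Hubble", "dance"], [("galactic", "dance")]))
# watches(hubble, dance) & galactic(dance).
```

Whether a modifier is folded into the argument (uk_journalists) or split out as its own predicate (galactic(dance)) is a choice made per headline on the slide; the sketch leaves that choice to the caller.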

GEDCOM Extraction

individual(i1, name('Dovie MELLISSIA /STEVENSON/'), sex(f), parentin(f1), childin(f2),
    birthdate('18 Sep 1908'), baptismdate('10 Apr 1919'), endowdate('9 Mar 1976'), deathdate(''),
    birthplace('OKTAHA, MUSKOGEE, OK, USA'), deathplace(''), burialplace('')).

individual(i2, name('WILLIAM JAMES /STEVENSON/'), sex(m), parentin(f4), childin(f5),
    birthdate('5 Sep 1880'), baptismdate('13 Sep 1903'), endowdate('9 May 1969'), deathdate('22 Nov 1964'),
    birthplace('PENDLETON, WARREN, PA'), deathplace('TULARE, CA'), burialplace('VISALIA, TULARE, CA')).

individual(i3, name('/MAHLER/'), sex(m), parentin(f6), childin(f5),
    birthdate('5 Sep 1880'), baptismdate('13 Sep 1903'), endowdate('9 May 1969'), deathdate('22 Nov 1964'),
    birthplace('PENDLETON, WARREN, PA'), deathplace('TULARE, CA'), burialplace('VISALIA, TULARE, CA')).
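
A fact of this shape can be produced by a line-oriented pass over a GEDCOM file, where each line carries a level number, a tag, and an optional value. The Python sketch below is a minimal illustration under several assumptions: the sample input is reconstructed from the first fact above rather than taken from the original file, only the NAME, SEX, FAMS, FAMC, BIRT, DEAT, and BURI tags are handled, and baptism and endowment dates are simply left empty.

```python
# Hypothetical GEDCOM fragment reconstructed from the first fact above.
SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME Dovie MELLISSIA /STEVENSON/
1 SEX F
1 BIRT
2 DATE 18 Sep 1908
2 PLAC OKTAHA, MUSKOGEE, OK, USA
1 FAMS @F1@
1 FAMC @F2@
0 TRLR
"""

EVENTS = {"BIRT": "birth", "DEAT": "death", "BURI": "burial"}
FIELDS = ["birthdate", "baptismdate", "endowdate", "deathdate",
          "birthplace", "deathplace", "burialplace"]


def parse_individuals(gedcom_text):
    """Collect one dict per INDI record, keyed by a lower-cased id."""
    individuals = {}
    current = None  # record being filled, or None outside an INDI record
    event = None    # most recent level-1 event (birth, death, burial)
    for raw in gedcom_text.splitlines():
        if not raw.strip():
            continue
        parts = raw.strip().split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            # For records, the id sits in the tag position: 0 @I1@ INDI
            is_person = value == "INDI"
            current = individuals.setdefault(tag.strip("@").lower(), {}) if is_person else None
            event = None
        elif current is None:
            continue
        elif level == 1:
            event = EVENTS.get(tag)
            if tag == "NAME":
                current["name"] = value
            elif tag == "SEX":
                current["sex"] = value.lower()
            elif tag == "FAMS":  # family in which the person is a spouse
                current["parentin"] = value.strip("@").lower()
            elif tag == "FAMC":  # family in which the person is a child
                current["childin"] = value.strip("@").lower()
        elif level == 2 and event:
            if tag == "DATE":
                current[event + "date"] = value
            elif tag == "PLAC":
                current[event + "place"] = value
    return individuals


def to_fact(key, rec):
    """Format a record as an individual(...) fact; missing fields become ''."""
    quoted = ", ".join(f"{f}('{rec.get(f, '')}')" for f in FIELDS)
    return (f"individual({key}, name('{rec.get('name', '')}'), "
            f"sex({rec.get('sex', '')}), parentin({rec.get('parentin', '')}), "
            f"childin({rec.get('childin', '')}), {quoted}).")


individuals = parse_individuals(SAMPLE)
for key, rec in individuals.items():
    print(to_fact(key, rec))
```

Keeping every field in a fixed position, with an empty string for missing values, is what lets a later query match on positions without knowing which fields a given person has.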

Inferencing

/**************************************
 Which husband/wife combination was born
 on the same day in the same place?
**************************************/
husband_wife(HusbandName, HBirthdate, WifeName, WBirthdate, Place) :-
    individual(Husband, name(HusbandName), _, _, _, birthdate(HBirthdate), _, _, _, birthplace(Place), _, _),
    % Family is shared by both family/4 goals so the pair belongs to one family record.
    family(Family, husband(Husband), _, _),
    parse_date(HBirthdate, HDay, HMonth, HYear),
    individual(Wife, name(WifeName), _, _, _, birthdate(WBirthdate), _, _, _, birthplace(Place), _, _),
    family(Family, _, wife(Wife), _),
    parse_date(WBirthdate, WDay, WMonth, WYear),
    HYear == WYear,
    HMonth == WMonth,
    HDay == WDay.

HusbandName = Garland /Bailey/
HBirthdate = 16 Apr 1912
WifeName = Carolyn /Warren/
WBirthdate = 16 Apr 1912
Place = Gracemont, Caddo, Oklahoma

HusbandName = Charles Arthur /Goodpasture/
HBirthdate = 25 Dec 1894
WifeName = Betty Lucille /Rittga/
WBirthdate = 25 Dec 1894
Place = Gracemont, Caddo, Oklahoma
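
The same query can be sketched in Python to make each step explicit: look up both spouses of a family, parse their birth dates, and compare dates and places. Everything structural here is an assumption for illustration: the ids, the family links, and the dictionary layout are made up, and parse_date is a stand-in for the helper the rule calls, whose definition is not shown on the slide. The first couple's names, dates, and place are taken from the output above; the second couple is invented so that one pair fails the test.

```python
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_date(text):
    """Split a 'DD Mon YYYY' string into (day, month, year) integers."""
    day, month, year = text.split()
    return int(day), MONTHS[month], int(year)


def same_day_same_place(individuals, families):
    """Return couples whose birth dates and birthplaces are both equal."""
    matches = []
    for family in families:
        husband = individuals[family["husband"]]
        wife = individuals[family["wife"]]
        same_date = parse_date(husband["birthdate"]) == parse_date(wife["birthdate"])
        same_place = husband["birthplace"] == wife["birthplace"]
        if same_date and same_place:
            matches.append((husband["name"], wife["name"],
                            husband["birthdate"], husband["birthplace"]))
    return matches


# Illustrative data: ids and family links are hypothetical.
individuals = {
    "i10": {"name": "Garland /Bailey/", "birthdate": "16 Apr 1912",
            "birthplace": "Gracemont, Caddo, Oklahoma"},
    "i11": {"name": "Carolyn /Warren/", "birthdate": "16 Apr 1912",
            "birthplace": "Gracemont, Caddo, Oklahoma"},
    "i20": {"name": "Example /Husband/", "birthdate": "5 Sep 1880",
            "birthplace": "PENDLETON, WARREN, PA"},
    "i21": {"name": "Example /Wife/", "birthdate": "18 Sep 1908",
            "birthplace": "OKTAHA, MUSKOGEE, OK, USA"},
}
families = [
    {"husband": "i10", "wife": "i11"},
    {"husband": "i20", "wife": "i21"},
]

for row in same_day_same_place(individuals, families):
    print(row)
```

Iterating over families rather than over all pairs of individuals keeps the comparison restricted to actual couples, which is the role the family facts play in the rule.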

Contribution/Future Work
  • Contributions
      • Robustly extract predicates from natural language
          • Multiple domains
          • Various natural language syntactic constructions
      • Use applications to access predicates
          • Inferencing and querying
  • Future Work
      • Extract predicates from other domains
      • Integrate with external knowledge sources
          • WordNet
          • UMLS
      • Upgrade to higher-order predicate calculus to allow predication over relations and events, not just individuals