Named Entity Tagging with Conditional Random Fields
Ryan McDonald, Fernando Pereira and Fei Sha
Computer and Information Science
University of Pennsylvania

Goals
• Improve on the results of the current NE tagger used by UPenn ACE
  • Accomplish through Conditional Random Field Model (Lafferty et al. 2001)
• Compare MaxEnt and CRFs in a controlled environment

ACE Definition
• Find entities and classify them as Person, GPE, Organization, Location and/or Facility
• “Bush took over the White House from the Clinton Administration”
  • Bush: Person
  • White House: Facility, GPE
  • The Clinton Administration: Organization
  • Clinton: Person

MaxEnt vs. CRFs
• Ran an MEMM tagger and a CRF tagger with:
  • The exact same features
  • Exact same training algorithm (limited-memory quasi-Newton)
  • Exact same training data and test data
    • Have not used Sept. test data yet since more improvements are on the way
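With features, training algorithm and data held fixed, the remaining difference between the two taggers is how the model is normalized. The standard textbook forms are sketched below; the symbols w (weights), f (feature vector) and Z (normalizer) are notation introduced here for illustration and do not appear on the slides.

```latex
% MEMM: one normalizer per position (local normalization)
p_{\mathrm{MEMM}}(y \mid x)
  = \prod_{t=1}^{T}
    \frac{\exp\big(w \cdot f(y_{t-1}, y_t, x, t)\big)}{Z(y_{t-1}, x, t)},
\qquad
Z(y_{t-1}, x, t) = \sum_{y'} \exp\big(w \cdot f(y_{t-1}, y', x, t)\big)

% CRF: one normalizer per sentence (global normalization)
p_{\mathrm{CRF}}(y \mid x)
  = \frac{\exp\big(\sum_{t=1}^{T} w \cdot f(y_{t-1}, y_t, x, t)\big)}{Z(x)},
\qquad
Z(x) = \sum_{y'_{1:T}} \exp\Big(\sum_{t=1}^{T} w \cdot f(y'_{t-1}, y'_t, x, t)\Big)
```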

Features
• Word: Unigram*
• 1-suffix, 2-suffix, 3-suffix and 4-suffix: Unigram and bigram
• Word length bins: Unigram and bigram
• Word features defined by Tom's script: Caps, Numeric, etc.*
* used in original ACE system
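A minimal sketch of how the feature templates above might be computed per token. The length-bin edges, the feature-name strings and the caps/numeric tests are assumptions made for illustration; the slides do not give the actual bins or the contents of Tom's script.

```python
def length_bin(n):
    # Hypothetical bin edges; the slides do not state the real ones.
    if n <= 2:
        return "1-2"
    if n <= 5:
        return "3-5"
    if n <= 8:
        return "6-8"
    return "9+"


def unigram_features(word):
    """Features that look at a single token."""
    feats = {"word=" + word.lower()}
    # 1- to 4-character suffixes, skipped when the word is too short.
    for k in range(1, 5):
        if len(word) >= k:
            feats.add(f"suffix{k}=" + word[-k:].lower())
    feats.add("len=" + length_bin(len(word)))
    # Stand-ins for the "Caps, Numeric, etc." word-shape features.
    if word[:1].isupper():
        feats.add("caps")
    if any(c.isdigit() for c in word):
        feats.add("numeric")
    return feats


def token_features(tokens, i):
    """Unigram features plus suffix and length-bin bigrams with the previous token."""
    feats = unigram_features(tokens[i])
    if i > 0:
        prev, cur = tokens[i - 1], tokens[i]
        for k in range(1, 5):
            if len(prev) >= k and len(cur) >= k:
                feats.add(f"suffix{k}_bigram=" + prev[-k:].lower() + "|" + cur[-k:].lower())
        feats.add("len_bigram=" + length_bin(len(prev)) + "|" + length_bin(len(cur)))
    return feats
```

Features are kept as a set of strings so the same representation can feed either an MEMM or a CRF trainer.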

MEMM vs. CRF
• Same feature set
• Same training algorithm

ACE vs. CRF
• Different feature sets (CRF is richer)

Summary
• These results and (Sha 2002) show that CRFs perform slightly better than MEMMs
• Richer feature set leads to larger improvement
• Portable CRF, MEMM code
  • Conjugate Gradient, Limited-Memory Quasi-Newton, Perceptron

Future and Current Work
• “Person” and “Organization” recall
• Multilayer taggers
• Name lists
• Document class information

Multilayer Taggers
• If entity information known, can lead to a 10-20% increase in F-Score
• First layer of tagger attempts to find generic entities
  • Can achieve around F-Score of 0.87
• Second layer uses entity information as feature for each category classifier
  • Leads to about a 2-5% increase in F-Score
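The two-layer idea can be sketched as a small pipeline, assuming each layer is a function from per-token feature sets to per-token labels. The names `two_layer_tag`, `featurize`, `generic_tagger` and `category_tagger` are hypothetical and stand in for trained models that the slides do not describe.

```python
def two_layer_tag(tokens, featurize, generic_tagger, category_tagger):
    """Run a generic entity tagger, then feed its output to a category tagger.

    featurize(tokens, i)  -> set of feature strings for token i
    generic_tagger(feats) -> one generic label per token (e.g. "ENT" or "O")
    category_tagger(feats)-> one category label per token (e.g. "PER", "ORG", "O")
    """
    base = [featurize(tokens, i) for i in range(len(tokens))]

    # Layer 1: entity vs. non-entity, without committing to a category.
    generic = generic_tagger(base)

    # Layer 2: the generic label becomes one more feature on each token.
    enriched = [feats | {"generic=" + label} for feats, label in zip(base, generic)]
    return category_tagger(enriched)
```

In a real system both taggers would be trained sequence models; here any callables with the stated shapes will do.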

Name Lists
• Aim is to increase recall for the person and organization categories
  • Name list size: 80,000
  • Organization list size: 30,000
• Binary feature: is token in name list?
  • Increases Person F-Score to 0.793 (from 0.755)
• Binary feature: is token in organization list?
  • Increases Organization F-Score to 0.601 (from 0.569)
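A minimal sketch of the two binary list-membership features, assuming case-insensitive lookup of single tokens in in-memory sets. The function name, the feature keys and the toy lists in the usage note are illustrative only; the actual lists and matching rules are not given on the slides.

```python
def name_list_features(tokens, person_names, org_names):
    """Return, per token, whether it appears in the person list and in the org list."""
    # Lowercase once so lookups are case-insensitive (an assumption).
    person_names = {n.lower() for n in person_names}
    org_names = {n.lower() for n in org_names}
    return [
        {
            "in_name_list": tok.lower() in person_names,
            "in_org_list": tok.lower() in org_names,
        }
        for tok in tokens
    ]
```

For example, with a toy person list `{"Clinton", "Bush"}` and a toy organization list `{"Microsoft"}`, the token "Clinton" fires only the first feature and "Microsoft" only the second.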

Name Lists
• Small name lists can lead to a substantial improvement in F-Score
  • Even though the features were simplistic
• Investigating better name lists
  • MT name list of 500,000 names and 50,000 orgs
• Investigating more sophisticated features
  • Frequency

Document Class Features
• “Atlanta defeated Florida in extra innings...”
  • Atlanta and Florida should be tagged as organizations
  • Mistakenly tagged as GPE
  • If the document is classified as SPORTS, the NE classifier may recognize that things normally tagged GPE should be orgs
• Currently beginning to look at state-of-the-art document classification algorithms
  • Could provide a richer source of knowledge
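One way the document class could reach the NE classifier is as an extra feature conjoined with each token's existing features, so that a weight can be learned for, say, "atlanta in a SPORTS document". This is a sketch under that assumption; the slides do not say how the class would be encoded, and `add_document_class` is a hypothetical helper.

```python
def add_document_class(token_features, doc_class):
    """Add a document-class feature and its conjunctions to every token.

    token_features: list of sets of feature strings, one set per token
    doc_class: label from some document classifier, e.g. "SPORTS"
    """
    tag = "doc=" + doc_class
    enriched = []
    for feats in token_features:
        # Conjoin each existing feature with the class so the model can
        # learn class-specific weights (e.g. city names as teams in SPORTS).
        conjoined = {f + "&" + tag for f in feats}
        enriched.append(set(feats) | conjoined | {tag})
    return enriched
```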