
+ Detecting Genre Shift
  Mark Dredze, Tim Oates, Christine Piatko
  Paper to appear at EMNLP-10

+ Natural Language Processing and Machine Learning
  Extracting findings from scientific papers
  • Genetic epidemiology (development domain)
  • PubMed search produces thousands of papers
  • Manually reviewed to extract findings
  • Findings determine relevant papers/studies
  • Automate this process with ML/NLP methods
  • Create searchable database of findings
  • Allow machine inference over findings
  • Suggest new scientific hypotheses

+ Genre Shift in Statistical NLP
  … told that John Paul Stevens is retiring this summer …
  Named Entity Recognition
  … President Barack Obama is urging members to …

+ Supervised Machine Learning for Named Entity Recognition
  Today the Atlantic Ocean is in an uproar and North Carolina remains in a state of anxiety.

  Windowed Text                        Label
  Today the Atlantic Ocean is          B
  the Atlantic Ocean is in             I
  Atlantic Ocean is in an              O
  Ocean is in an uproar                O
  is in an uproar and                  O
  in an uproar and North               O
  an uproar and North Carolina         O
  uproar and North Carolina remains    B
  and North Carolina remains in        I
  North Carolina remains in a          O

+ Supervised Machine Learning for Named Entity Recognition

  Windowed Text                  Label
  Today the Atlantic Ocean is    B
  the Atlantic Ocean is in       I
  Atlantic Ocean is in an        O

  Feature Vector                                        Label
  [today, the, atlantic, ocean, is, U, L, U, U, L]      B
  [the, atlantic, ocean, is, in, L, U, U, L, L]         I
  [atlantic, ocean, is, in, an, U, U, L, L, L]          O
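The windowing on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' feature extractor: the function name `window_features` is hypothetical, and reading U/L as "first letter upper case / lower case" is inferred from the slide's example.

```python
def window_features(tokens, size=5):
    """Build one feature vector per full window of `size` tokens.

    Each vector holds the lowercased words followed by one capitalization
    flag per word ("U" = starts upper case, "L" = otherwise).
    """
    vectors = []
    for start in range(len(tokens) - size + 1):
        window = tokens[start:start + size]
        words = [tok.lower() for tok in window]
        caps = ["U" if tok[0].isupper() else "L" for tok in window]
        vectors.append(words + caps)
    return vectors


tokens = "Today the Atlantic Ocean is in an uproar".split()
vectors = window_features(tokens)
for vec in vectors[:3]:
    print(vec)
```

The label for each window (B, I, or O) belongs to the center word and would come from annotated training data, which this sketch does not model.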

+ Genre Shift in Statistical NLP
  … told that John Paul Stevens is retiring this summer …
  Named Entity Recognition
  … PRESIDENT BARACK OBAMA IS URGING MEMBERS TO …
  ? ? ?

+ This is a Pervasive Problem
  • Extracting regulatory pathways from online bioinformatics journals using a parser trained on the WSJ
  • Finding faces in images of disaster victims using a model trained on “mug shot” images
  • Identifying RNA sequences that regulate gene expression in a lab in Baltimore using a model trained on data gathered in a lab in Germany
  When things change in a way that’s harmful, we’d like to know!

+ Data Streams Change Over Time
  Sentiment classification from movie reviews
  • Natural drift
  • Users unaware of system limitations

+ Detecting Genre Shift
  Genre shift hurts system performance (accuracy)
  Two problems
  1) Detect changes in stream of numbers (A-distance)
  2) Convert document stream to stream of informative numbers (margin)

+ Detecting Genre Shift
  Genre shift hurts system performance (accuracy)
  • Measure accuracy directly
    • Requires labeled examples!
  • Look for changes in feature distributions
    • Words become more/less common
    • New words appear

+ Measuring Changes in Streams: The A-Distance
  A nonparametric, distribution-independent measure of changes in univariate, real-valued data streams (Kifer, Ben-David, and Gehrke, 2004)
  [Figure: two windows, P and P’]

+ Measuring Changes in Streams: The A-Distance
  [Figure sequence: windows P and P’; a change is flagged when the difference is > ε]
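The figures for these slides did not survive extraction, so here is a minimal sketch of the idea under stated assumptions: the A-distance is taken as twice the largest per-tile difference in empirical probability between two windows (which matches the |p_A – p’_A| > ε/2 test on a later slide), the tiles are fixed intervals given by `edges`, and a fixed reference window is compared against a sliding window. The names `tile_fractions`, `a_distance`, and `detect_change` are hypothetical, and the tiling is an illustrative choice rather than the one used in the paper.

```python
from bisect import bisect_right


def tile_fractions(values, edges):
    """Fraction of `values` in each tile [edges[i], edges[i+1]).

    The last tile also includes its right edge. Values outside the tiling
    land in no tile but still count toward the window size.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            i = min(bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    return [c / len(values) for c in counts]


def a_distance(window_p, window_q, edges):
    """Twice the largest per-tile difference between the two windows."""
    p = tile_fractions(window_p, edges)
    q = tile_fractions(window_q, edges)
    return 2 * max(abs(a - b) for a, b in zip(p, q))


def detect_change(stream, n, edges, epsilon):
    """Index where the sliding window first differs from the reference
    window (the first n values) by more than epsilon, or None."""
    if len(stream) < 2 * n:
        return None
    reference = stream[:n]
    for end in range(2 * n, len(stream) + 1):
        recent = stream[end - n:end]
        if a_distance(reference, recent, edges) > epsilon:
            return end - 1
    return None


# Toy stream: values jump from 0.2 to 0.8 at index 50.
stream = [0.2] * 50 + [0.8] * 50
print(detect_change(stream, n=10, edges=[0.0, 0.5, 1.0], epsilon=0.5))
```

On this toy stream the detector fires a few instances after the jump, once enough shifted values have entered the sliding window to move more than ε/2 of its mass into the other tile.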

+ Changes in Document Streams
  … President Barack Obama is urging members to …

  Feature    W      X
  Obama      1.6    4
  embassy    0.1    1

  1.6 * 4 + 0.1 * 1 + … = 3.7
  • WX = margin
  • sign of WX is class label (+/-)
  • magnitude of WX is “certainty” in label
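As a rough illustration of the margin on this slide, the sketch below computes W·X for a sparse document. Only the two weights and counts shown on the slide are used; the slide's total of 3.7 includes further terms hidden behind the "…", so the value here is a partial sum. The function name `margin` and the dict representation are assumptions made for the example.

```python
def margin(weights, features):
    """Dot product W.X over the features present in the document."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# The two features shown on the slide; the remaining terms are not given.
weights = {"obama": 1.6, "embassy": 0.1}
features = {"obama": 4, "embassy": 1}

partial = margin(weights, features)      # 1.6 * 4 + 0.1 * 1
label = "+" if partial >= 0 else "-"     # sign gives the class label
certainty = abs(partial)                 # magnitude gives the "certainty"
print(partial, label, certainty)
```

No labels are needed to compute this number, which is what makes it usable on an unlabeled document stream.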

+ Why Margins?
  • We have an easy way of producing them from unlabeled examples!
  • We want to track feature changes
    • Margins are linear combinations of feature values
    • Removing important features yields smaller margins
    • Only track features that matter; features with zero (small) weight don’t affect the margin (much)
  • Spoiler alert! Tracking margins works really well for unsupervised detection of genre shifts.
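A small sketch of the two sub-points above: dropping a heavily weighted feature shrinks the margin, while dropping a zero-weight feature leaves it unchanged. The weights for "obama" and "embassy" come from the earlier slide; the zero-weight feature "the" and its count are hypothetical, added only for illustration.

```python
def margin(weights, features):
    """Linear combination of feature values, weighted by the model."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def without(features, name):
    """Copy of the document with one feature removed."""
    return {k: v for k, v in features.items() if k != name}


weights = {"obama": 1.6, "embassy": 0.1, "the": 0.0}  # "the" is hypothetical
doc = {"obama": 4, "embassy": 1, "the": 7}

full = margin(weights, doc)
no_obama = margin(weights, without(doc, "obama"))  # important feature gone
no_the = margin(weights, without(doc, "the"))      # zero-weight feature gone
print(full, no_obama, no_the)
```

If a new genre stops using the words the model relies on, margins fall in the way `no_obama` does here, even though no label was ever consulted.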

+ Accuracy vs. Margins: DVD to Electronics
  [Plots: average in block; average over last 100 instances]

+ Confidence Weighted Margins
  • Margins can be viewed as measure of confidence
  • We detect when confidence in classifications drops
  • Confidence Weighted (CW) learning refines this idea
    • Gaussian distribution over weight vectors
    • Mean of weight vector: $\mu \in \mathbb{R}^{N}$
    • Diagonal covariance matrix: $\sigma \in \mathbb{R}^{N \times N}$
    • Low variance → high confidence
  • Normalized margin: $\mu \cdot x / (x^{\top} \sigma x)^{0.5}$
    • Called VARIANCE in slides that follow
  Example values: μ = 1.6 with σ = 0.02; μ = 0.1 with σ = 1.74
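The normalized margin can be sketched for the diagonal-covariance case as follows. This is an illustration of the formula only, not the CW learner itself. The pairing of each σ with each μ and the reuse of the counts 4 and 1 from the earlier slide are assumptions; the function name `normalized_margin` is hypothetical.

```python
import math


def normalized_margin(mu, sigma_diag, x):
    """mu.x / sqrt(x^T sigma x) for a diagonal covariance matrix.

    `sigma_diag` holds the diagonal entries (one variance per feature).
    Returns 0.0 when the variance term is zero (e.g. an empty document).
    """
    mean_margin = sum(m * v for m, v in zip(mu, x))
    variance = sum(s * v * v for s, v in zip(sigma_diag, x))
    if variance == 0:
        return 0.0
    return mean_margin / math.sqrt(variance)


mu = [1.6, 0.1]           # mean weights from the slide
sigma_diag = [0.02, 1.74]  # assumed to pair with mu in the same order
x = [4, 1]                 # feature counts reused from the earlier slide

print(normalized_margin(mu, sigma_diag, x))
```

Under this pairing, the low-variance weight dominates the numerator while contributing little to the denominator, so confident weights push the normalized margin up and uncertain ones pull it down.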

+ Experiments
  • Datasets
    • Sentiment classification between domains (Blitzer et al., 2007)
      • DVDs, electronics, books, kitchen appliances
    • Spam classification between users (Jiang and Zhai, 2007)
    • Named entity classification between genres (ACE 2005)
      • News articles, broadcast news, telephone, blogs, etc.
  • Algorithms
    • Baselines: SVM, MIRA, CW
    • Our method: VARIANCE

+ Experiments
  • Simulated domain shifts between each pair of genres
    • 38 pairs, 10 trials each with different random instance orderings
    • 500 source examples
    • 1500 target examples
  • False change
    • 11 datasets with no shift, 10 trials with different random instance orderings
  • If no shift found then detection recorded as end of target examples when computing averages
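One way to picture this setup is the sketch below, which builds a single simulated stream and scores a detector on it. It is an assumption-laden illustration: drawing the instances by random sampling with a seed, and the names `simulated_shift_stream` and `detection_delay`, are choices made here and are not taken from the slides.

```python
import random


def simulated_shift_stream(source, target, n_source=500, n_target=1500, seed=0):
    """Randomly ordered source instances followed by target instances.

    Returns the stream and the index at which the shift occurs.
    """
    rng = random.Random(seed)          # a different seed gives a different trial
    head = rng.sample(source, n_source)
    tail = rng.sample(target, n_target)
    return head + tail, n_source


def detection_delay(detected_at, shift_index, stream_length):
    """Instances between the shift and its detection.

    A missed shift is scored as detected at the end of the stream,
    as described on the slide.
    """
    if detected_at is None:
        detected_at = stream_length
    return detected_at - shift_index


# Toy instances: integers below 1000 stand in for source-domain examples.
source = list(range(1000))
target = list(range(1000, 3000))
stream, shift_index = simulated_shift_stream(source, target)
print(len(stream), shift_index, detection_delay(None, shift_index, len(stream)))
```

Scoring misses as the end of the stream keeps the averages defined for every trial while still penalizing a detector that never fires.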

+ Comparing Algorithms
  [Plot; axes: instances from point of shift. Annotations: “Good for our approach!” and “Good for a baseline”]

+ SVM vs. VARIANCE


+ Summary of Results Thus Far
  VARIANCE detected shifts faster than …
  • SVM: 34 times out of 38
  • MIRA: 26 times out of 38
  • CW: 27 times out of 38

+ Gradual Shifts


+ What if you have labels?
  • STEPD: a Statistical Test of Equal Proportions to Detect concept drift (Nishida and Yamauchi, 2007)
  • Monitors accuracy of classifier from stream of labeled examples
  • Parameters: window size, W, and threshold, α
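The slide names the test but does not spell it out. The sketch below is one plausible reading, assuming a two-proportion z-test with continuity correction that compares accuracy in the most recent W labeled examples against accuracy on everything before them. The published STEPD algorithm may differ in its details, and the defaults `window=30` and `alpha=0.003` are illustrative choices, not values from the slides.

```python
import math


def equal_proportions_p_value(correct_old, n_old, correct_recent, n_recent):
    """Two-sided p-value for equal accuracy in the old and recent samples."""
    pooled = (correct_old + correct_recent) / (n_old + n_recent)
    spread = pooled * (1 - pooled) * (1 / n_old + 1 / n_recent)
    if spread == 0:
        return 1.0  # all correct or all wrong: no evidence of a difference
    diff = abs(correct_old / n_old - correct_recent / n_recent)
    corrected = max(diff - 0.5 * (1 / n_old + 1 / n_recent), 0.0)
    z = corrected / math.sqrt(spread)
    return math.erfc(z / math.sqrt(2))  # P(|Z| > z) for a standard normal


def accuracy_dropped(outcomes, window=30, alpha=0.003):
    """True if recent accuracy is significantly below earlier accuracy.

    `outcomes` is a list of 1 (correct) / 0 (wrong) per labeled example.
    """
    if len(outcomes) < 2 * window:
        return False
    old, recent = outcomes[:-window], outcomes[-window:]
    if sum(recent) / len(recent) >= sum(old) / len(old):
        return False
    p = equal_proportions_p_value(sum(old), len(old), sum(recent), len(recent))
    return p < alpha


print(accuracy_dropped([1] * 100 + [0] * 30))
```

Unlike the margin-based detector, this needs a correctness label for every instance, which is the cost the talk is trying to avoid.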

+ Comparison to STEPD


+ What about false positives?


+ The A-Distance: Choosing Parameters
  [Figure sequence; labels: P, A, n, > ε]

+ The A-Distance: Choosing Parameters
  • A-distance paper gives bounds on FPs and FNs
  • Bounds depend on n and ε
  • Bounds do not depend on tiling!
  • So loose as to be meaningless
  • No guidance on how to choose tiling
  • What if tiles lie outside support of data?

+ Better Bounds
  • $P_A$ = true probability of a point falling in tile A
  • $h$ = number of points that actually fell in A
  • $p_A = h/n$ = ML estimate of $P_A$
  • Define $P'_A$, $h'$, and $p'_A$ for second window
  • Suppose $P_A = P'_A$, then any change detected is a false positive
  What is the probability that $|p_A - p'_A| > \varepsilon/2$?
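For a single tile, the question on this slide can be answered by direct enumeration if one assumes that P_A is known and that each window holds n independent points, so that h and h’ are independent Binomial(n, P_A) counts. That is a simplification: the following slides place a posterior over P_A instead. The function names below are hypothetical.

```python
import math


def binomial_pmf(n, p):
    """Probabilities of 0..n points landing in the tile."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def false_positive_probability(n, p_tile, epsilon):
    """P(|h/n - h'/n| > epsilon/2) when both windows share the same P_A."""
    pmf = binomial_pmf(n, p_tile)
    total = 0.0
    for h, prob_h in enumerate(pmf):
        for h2, prob_h2 in enumerate(pmf):
            if abs(h - h2) / n > epsilon / 2:
                total += prob_h * prob_h2
    return total


# A larger threshold should make a spurious detection less likely.
loose = false_positive_probability(n=50, p_tile=0.3, epsilon=0.1)
strict = false_positive_probability(n=50, p_tile=0.3, epsilon=0.4)
print(loose, strict)
```

Because the answer depends on how much mass each tile holds, this kind of calculation reflects the tiling, which the bounds criticized on the previous slide do not.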

+ Posterior Over $P_A$
  • B(a, b) is the Beta function over a + b Bernoulli trials
    • a trials have one outcome (point lands in tile A)
    • b trials have the other (point lands in some other tile)
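The posterior formula itself was an image and is not recoverable from this text. As a hedged sketch only, the code below assumes a uniform prior over P_A, under which a successes and b failures give a Beta(a + 1, b + 1) posterior. The prior, the parameterization, and the function names are assumptions, not the slide's stated choice.

```python
import math


def beta_function(a, b):
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b), via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def posterior_density(p, a, b):
    """Density of P_A = p after a hits and b misses, assuming a uniform prior."""
    return p**a * (1 - p) ** b / beta_function(a + 1, b + 1)


# 20 of 200 points fell in tile A: the density is highest near 20/200.
print(posterior_density(0.10, 20, 180), posterior_density(0.30, 20, 180))
```

With more points in the window, the density concentrates around h/n, which is what tightens the false-positive and false-negative estimates on the slides that follow.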

+ False Positives: Two Cases


+ Don’t worry, I’m not going to explain this (much)


+ Probability of a FP (n = 200)


+ Probability of FN


+ Minimizing Expected Loss


+ Moving Forward
  [Diagram labels: Twitter, Transcribed Broadcast News, Genre Classifier, Newswire]

+ Genre Shift “Fix”
  … told that John Paul Stevens is retiring this summer …
  Named Entity Recognition
  … PRESIDENT BARACK OBAMA IS URGING MEMBERS TO …
  … President Barack Obama is urging members to …

+ Conclusion
  • Changes in margins convey useful information about changes in classification accuracy
    • No need for labeled examples!
  • The A-distance applied to margin streams finds genre shifts with few false positives/negatives
  • Confidence weighted margins normalized by variance detect shifts faster than SVM, MIRA, or (non-normalized) CW margins
  • Our approach even works with gradual shifts and compares favorably to shift detectors that use labeled examples

+ Thank you!
