Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter • Eiji ARAMAKI*, Sachiko MASKAWA*, Mizuki MORITA** • * The University of Tokyo, ** National Institute of Biomedical Innovation • EMNLP 2011

Why did we develop this system? Let me show you several existing systems.

Centers for Disease Control and Prevention (CDC)

Infectious Disease Surveillance Center (IDSC)

European Influenza Surveillance Network (EISN)

Why does each country have its own surveillance system? • Influenza epidemics are a major public health concern because they cause tens of millions of illnesses each year. • To reduce the number of victims, early detection of influenza epidemics is a national mission in every country. • BUT: These surveillance systems basically rely on hospital reports (written manually).

Two Problems & Recent Approach • (1) Small Scale – For example, IDSC gathers influenza patient data from 5,000 clinics, but it does not cover all cities (especially local cities). • (2) Time Delay (Time Lag) – For example, the data gathering process typically has a 1–2 week reporting lag. • To deal with these problems – Recently, various approaches that directly capture people’s behavior have been proposed.

Recent Approach • Using phone call data – Espino et al. (2003) used data from a telephone triage service, a public service that gives advice to users by telephone. They reported that the number of telephone calls correlates with influenza epidemics. • Using drug sales data – Magruder (2003) used the amount of drug sales. Among various approaches…

The State-of-the-Art Web-Based Approach • Ginsberg et al. (Nature 2009) used Google web search queries that correlate with an influenza epidemic, such as “flu” and “fever”. • Polgreen et al. (2008) used a Yahoo! query log. • Hulth et al. (2009) used a query log of a Swedish medical web site.

This Study • Web search queries are an extremely large-scale, real-time data resource. • BUT: The query data is closed (not freely available); it is available only to a few companies, such as Google, Yahoo, or Microsoft. → This study examines Twitter data, which is widely available.

OUTLINE • Background • Objective • Method • Experiment • Discussion • Conclusion Detailed Task Definition

Simple Word Frequency in Twitter: “Cold”, “Fever” & “Influenza” • [Figure: frequency (%) of the words “cold”, “fever”, and “influenza” in Twitter from 2008-11-16 to 2009-09-16; y-axis ticks at 0.00010%, 0.00100%, and 0.01000%; winter and summer periods are marked.] • The actual influenza curve is smoother • Simple word frequency contains various kinds of noise. Because…

The word “influenza” does not always indicate an influenza patient • Positive Influenza Tweet • Negative Influenza Tweet

Two Types of Influenza Tweets • Positive influenza tweet: indicates an influenza patient • Negative influenza tweet: includes a mention of “influenza” but does not indicate that an influenza patient is present • Not only general news but also various other phenomena generate negative influenza tweets…

Various Negative Influenza Tweets (1/2) • Prevention – You need to get an influenza shot sometime soon. • Modality (just suspicion) – @John might be suffering from influenza • Question – Did you catch the influenza?

Various Negative Influenza Tweets (2/2) • Influenza of a cat or dog – Today, I couldn't go home late. My cat caught the influenza... • Influenza of a TV character – In the last episode of that TV series, Ritsu-chan caught the flu

Research Questions • In total, half of influenza-related tweets are negative, which motivates automatic filtering. • RQ 1: Could an NLP system filter out negative influenza tweets? • RQ 2: Could this filtering contribute to surveillance accuracy?

OUTLINE • Background • Method • Experiment • Discussion • Conclusion

Basic Idea: Binary Classification • We regard this task as a binary classification task, similar to spam mail filtering • [Diagram: an input tweet is classified as Positive or Negative by a classifier built from a training corpus] • (1) What kind of corpus? • (2) What kind of features? • (3) What kind of machine learning method?

Corpus (5k Sentences with Labels) • See the proceedings for details • Average annotator agreement ratio = 0.85

What kind of Feature? • Twitter contains many ungrammatical expressions • Surrounding words (BOW, no stemming, no POS) – Example: I (L3) think (L2) the (L1) influenza is (R1) going (R2) around (R3) • Among various settings, window size = 6 achieved the highest accuracy
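The surrounding-word feature above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `window_features`, the whitespace tokenization, and the reading of "window size = 6" as three words on each side of the keyword (as in the L3 to R3 example) are all assumptions.

```python
def window_features(tokens, keyword="influenza", per_side=3):
    """Return the bag of words surrounding `keyword` (hypothetical helper).

    Assumption: a window of 6 means `per_side` = 3 words on each side.
    """
    lowered = [t.lower() for t in tokens]
    if keyword not in lowered:
        return set()
    i = lowered.index(keyword)  # first occurrence of the keyword
    left = lowered[max(0, i - per_side):i]
    right = lowered[i + 1:i + 1 + per_side]
    # Bag of words: word order and position are ignored; no stemming, no POS tags.
    return set(left) | set(right)


features = window_features("I think the influenza is going around".split())
print(sorted(features))
```

Each tweet then becomes a set of context words that a binary classifier can consume; whether left and right contexts should be kept apart is a design choice this sketch leaves out.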

What kind of Machine Learning Method?
Classifier               F-Measure   Time
AdaBoost                 0.592        40.192
Bagging                  0.739       530.310
Decision Tree            0.698       239.446
Logistic Regression      0.729       696.704
Nearest Neighbor         0.695        22.441
Random Forest            0.729        38.683
SVM (polynomial; d=2)    0.738        92.723
• Among various settings, SVM achieved nearly the highest F-measure (0.738, against 0.739 for Bagging) at a feasible time cost
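For readers unfamiliar with the F-Measure column, here is a minimal sketch of how such a score can be computed from gold labels and classifier output. The slide does not define the measure, so the balanced F1 (harmonic mean of precision and recall on the positive class) is an assumption, and the labels below are made-up toy data, not the corpus.

```python
def f_measure(gold, predicted, positive="positive"):
    """Balanced F-measure (F1) for the positive class (assumed definition)."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for five tweets (hypothetical data).
gold = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]
score = f_measure(gold, predicted)
print(round(score, 3))
```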

OUTLINE • Background • Method • Experiment • Discussion • Objective

Twitter Data (2008-2010) • [Figure labels: Season III, Season IV] • The first month is used as the training corpus • We divide the remaining data into 4 seasons – The Twitter API specification sometimes changes, leading to dropout periods.

Method Comparison & Evaluation • (1) TWEET-SVM (the proposed method) • (2) TWEET-RAW – Based on the simple word frequency of “influenza” • (3) GOOGLE [Ginsberg 2009] – Based on Google web search queries – The previous estimation data is available at the Google Flu Trends website. • (4) DRUG-SALE [Magruder 2003] • Evaluation is based on – Average correlation with the GOLD_STANDARD DATA, that is, the actual number of influenza patients reported by the Infectious Disease Surveillance Center (IDSC)
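The evaluation above compares each method's estimated series with the gold-standard patient counts. A minimal sketch follows, assuming the correlation in question is the Pearson correlation coefficient (the slides only say "correlation"); the function name `pearson` and the numbers are illustrative, not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly values: positive-tweet counts vs. reported patients.
estimated = [12, 30, 55, 41, 20]
gold_standard = [150, 420, 800, 610, 260]
r = pearson(estimated, gold_standard)
print(round(r, 3))
```

A value near 1 means the estimate rises and falls together with the reported patient numbers; a value near 0 means it does not track them.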

Result: Correlation Ratio
             TWEET-RAW   TWEET-SVM   GOOGLE    DRUG
Season I       0.683       0.816      0.817   -0.208
Season II     -0.009      -0.018      0.232    0.406
Season III     0.382       0.474      0.881    0.684
Season IV      0.390       0.957      0.976    0.130
(TWEET-SVM = TWEET-RAW + SVM filtering. On the original slide, bold indicates a correlation above the statistical significance level.)
• In most seasons, the proposed method achieved a higher correlation than the simple word-frequency-based method, demonstrating the advantage of SVM-based filtering.

Result: Correlation Ratio (same table as on the previous slide) • Except for Season II, the proposed method achieved almost the same accuracy as GOOGLE.

Why does Twitter suffer in Season II? • Because it includes the pandemic! The WHO declared a pandemic in July 2009 (Season II). • This suggests that Twitter might be biased by the news media.
                  TWEET-RAW   TWEET-SVM   GOOGLE   DRUG
Normal Season       0.831       0.890      0.847   0.308
Pandemic Season     0.001       0.060      0.918   0.844

[Figure: Season I, relative number] TWEET-SVM ≒ GOOGLE

[Figure: Season II, relative number] TWEET-SVM << GOOGLE

OUTLINE • Background • Method • Experiment • Discussion • Conclusion Extra Experiment

Frequent Question • Could an influenza patient REALLY use Twitter or Google Search? • That seems to be an unnatural situation! (“I’d like to sleep...”) • Because of that, we modified the system under the following assumption: people use Twitter or Google at the first sign of influenza.

Implemented using an Infectious Model [Kermack 1927] (≒ Markov model) • States: S (Susceptible, BEFORE FLU) → catch the flu → I (Infectious, UNDER FLU) → recover (0.38) → R (Recovered, AFTER FLU); a patient stays in I with probability 0.62 • The S-to-I transition is observed by Twitter / Google • 38% of influenza patients recover each day
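One way to read the model above is as a daily update of the number of currently infectious people. The sketch below is an interpretation under stated assumptions, not the authors' code: it treats each day's observed signal (for example, positive tweets) as the number of new S-to-I transitions, applies the 0.38 daily recovery rate before adding the new cases, and uses the hypothetical name `infectious_counts`.

```python
def infectious_counts(new_cases, recovery_rate=0.38):
    """Estimate how many people are infectious on each day.

    Assumptions: `new_cases` holds daily S-to-I transitions observed from
    Twitter or Google, and a fraction `recovery_rate` of the infectious
    population recovers each day (0.38 on the slide, so 0.62 remain).
    """
    infectious = 0.0
    series = []
    for new in new_cases:
        # Yesterday's patients who have not yet recovered, plus today's new cases.
        infectious = (1.0 - recovery_rate) * infectious + new
        series.append(infectious)
    return series


# Hypothetical input: 100 new cases on day 1, none afterwards.
curve = infectious_counts([100, 0, 0])
print([round(v, 2) for v in curve])
```

With this input the estimate decays by a factor of 0.62 per day (roughly 100, 62, 38.44), so a short burst of first-sign tweets is spread over the following days instead of vanishing at once.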

BUT: It ALSO Improves the Google-Based Approach • This model improves the correlation of BOTH Twitter & GOOGLE. • This result suggests that there is room for collaboration between medical studies and web/NLP studies.

OUTLINE • Background • Method • Experiment • Discussion • Conclusion

Answer to Research Questions • This study proposed a new influenza surveillance system using Twitter • RQ 1: Could a system filter out negative influenza tweets? – Yes, but NOT perfectly • RQ 2: Could this accuracy contribute to surveillance performance? – YES. It increases the correlation (except for the pandemic period). • We could achieve almost the same accuracy as GOOGLE using freely available data.

Conclusion • Even now, more than 100 (sometimes over 1,000) people die from influenza in Japan • We hope that this study might help people

Thank you • NLP could save a life! • Eiji ARAMAKI, Ph.D., University of Tokyo • http://mednlp.jp