TwitterMonitor: Trend Detection over the Twitter Stream

TwitterMonitor: Trend Detection over the Twitter Stream
EvenTweet: Online Localized Event Detection from Twitter
Presenters: Liu, Ya; Tian, Yujia; Pham, Anh

TwitterMonitor: Trend Detection over the Twitter Stream
Michael Mathioudakis, Nick Koudas

INTRODUCTION
• TwitterMonitor is a system that performs trend detection over the Twitter stream.
• It identifies emerging topics on Twitter in real time and provides analytics that synthesize an accurate description of each topic.
Michael Mathioudakis, Nick Koudas, TwitterMonitor: trend detection over the Twitter stream. In: SIGMOD Conference, pp. 1155-1158, 2010

TREND DETECTION AND ANALYSIS
• Trends are detected in two steps and analyzed in a third step:
  1. Identify ‘bursty’ keywords.
  2. Group bursty keywords into trends.
  3. Extract additional information to discover interesting aspects of each trend.

Detecting Bursty Keywords
• Bursty keyword: a keyword that appears at an unusually high rate in the stream.
• A burst suggests that a new topic has emerged and is worth exploring further.
• Algorithm: QueueBurst
  1) One-pass.
  2) Real-time.
  3) Adjustable against ‘spurious’ bursts.
  4) Adjustable against spam.
  5) Theoretically sound.

From Bursty Keywords to Trends

Trend Analysis
• Compose a more accurate description of each trend:
• Identify more keywords associated with the trend. Context extraction algorithms (PCA, SVD, etc.) search the recent history and report the most correlated keywords. Grapevine’s entity extractor identifies the entities.
• Frequently cited sources are added to the trend description.
• Frequent geographical origins are identified.

Architecture
[Figure: architecture diagram, including the Index module. Source: Michael Mathioudakis, Nick Koudas, TwitterMonitor: trend detection over the Twitter stream. In: SIGMOD Conference, p. 1156, 2010]

Architecture: Back-End
• The StreamListener module receives a sample consisting of 10 M out of 50 M tweets per day via the Twitter API.
• It then separates tweet information into fields and exports two feeds:
  - Tweets with all their fields, reported to an Index module.
  - Only the text and timestamp of tweets, reported to the Bursty Keywords Detection module.
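
The two-feed split described on this slide can be sketched as follows. This is a minimal illustration, not the StreamListener implementation: the function name `split_feeds` and the field names `text` and `timestamp` are assumptions made for the example.

```python
def split_feeds(raw_tweets):
    """Split parsed tweets into the two feeds described on the slide.

    Each tweet is assumed to be a dict with at least the (hypothetical)
    keys "text" and "timestamp".
    """
    index_feed = []   # full records, destined for the Index module
    burst_feed = []   # (text, timestamp) pairs, destined for burst detection
    for tweet in raw_tweets:
        index_feed.append(dict(tweet))  # copy all fields
        burst_feed.append((tweet["text"], tweet["timestamp"]))
    return index_feed, burst_feed
```

Keeping the second feed to text and timestamp only means the burst detector never has to touch the remaining fields.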

Architecture: Back-End (Cont.)
• After bursty keywords are identified and grouped into trends, the Trend Analysis module contacts the Index to retrieve information on the tweets that belong to each trend.

Architecture: Front-End
[Figure. Source: Michael Mathioudakis, Nick Koudas, TwitterMonitor: trend detection over the Twitter stream. In: SIGMOD Conference, p. 1157, 2010]

Architecture: Front-End (Cont.)
• A webpage reports recent trends in real time.
• An interface allows users to rank trends by recency or current activity rate and to submit their own short descriptions for trends.
• An additional tab displays daily trends.

Demonstration
• Every trend will be represented by the entities and by the related bursty keywords.
• The audience will have the option to use the interface to acquire more information:
  ❶ They will be shown additional keywords and can skim through representative tweets.
  ❷ They will be able to track a trend’s popularity over time and spot its origin.
  ❸ They will interact with the system by tracking the displayed trends according to different criteria and submitting descriptions.

EvenTweet: Online Localized Event Detection from Twitter
Hamed Abdelhaq, Christian Sengstock, and Michael Gertz

1. Introduction
2. Localized Event Detection
  • Temporal Keyword Extraction
  • Spatial Keyword Identification
  • Keyword Clustering
  • Cluster Scoring
3. System overview
4. Demonstration

INTRODUCTION
• EvenTweet is a system that detects localized events from a stream of tweets in real time.
• Only about 1% of tweets are georeferenced.
• It continuously analyzes the most recent tweets within a time-based sliding window.
• Each detected event is described by 1) related keywords and 2) an estimate of its start time and geographic location.
Hamed Abdelhaq, Christian Sengstock, Michael Gertz: EvenTweet: Online Localized Event Detection from Twitter. PVLDB 6(12): pp. 1326-1329 (2013)

INTRODUCTION
• Tracks the evolution of events over time at a fine-grained temporal resolution: a scoring scheme gives each event a score over time.
• Does not estimate geo-coordinates for non-geo-tagged tweets, yet can identify localized events using a possibly small amount of geo-tagged tweets:
  - Both geo- and non-geo-tagged tweets are used to identify the words that best describe events.
  - Only geo-tagged tweets are used to estimate the spatial […]

Localized Event Detection: Basic Definitions
• Event: a phenomenon that stimulates people to post messages for a certain period of time.
• Localized event: an event that happens within a small region, i.e., one with a small spatial extent (e.g., concerts, soccer matches, road works).

Localized Event
• A localized event is described as a tuple: le = (e_l, e_t, K)
• e_l is the event location, represented as a small set of connected rectangles.
• e_t is the start time.
• K is a set of words frequently published during the event time and at that location.

Online Detection: Basic Notation
• Each tweet is a tuple tw = (W, uid, l, t):
  - W: a set of words
  - uid: a user id
  - l = (lon, lat): a geographic location
  - t: a timestamp
• The timeline is divided into a sequence of equal-length time frames (…, f_{c-1}, f_c), where f_c denotes the current time frame.
• Each time frame represents a short time interval during which tweets are posted.
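
As an illustration of this notation, the tweet tuple and the mapping from a timestamp to its time frame could be written as below. The class and function names, the use of seconds as the time unit, the parameters `t0` and `frame_len`, and the choice of `None` for tweets without a location are assumptions of this sketch, not details given in the slides.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Tweet:
    words: FrozenSet[str]               # W: set of words
    uid: str                            # uid: user id
    loc: Optional[Tuple[float, float]]  # l = (lon, lat); None if not geo-tagged (assumption)
    t: float                            # t: timestamp, in seconds (assumption)


def frame_index(t: float, t0: float, frame_len: float) -> int:
    """Index of the equal-length time frame that contains timestamp t.

    t0 is the (hypothetical) start of the timeline and frame_len the
    frame length, both in the same unit as t.
    """
    return int((t - t0) // frame_len)
```

With 10-minute frames (`frame_len=600`), timestamps 0 to 599 fall into frame 0 and timestamp 600 opens frame 1.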

Basic Notation (cont.)
• We use a time-based sliding window win^k_{f_c} composed of k time frames, with f_c as its end point.
• The detection procedure of EvenTweet is triggered every time a new time frame elapses.
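
A window of the k most recent frames that triggers detection whenever a frame closes can be sketched with a bounded deque. The class name `SlidingWindow` and the callback `on_frame_closed` are hypothetical names chosen for this example.

```python
from collections import deque


class SlidingWindow:
    """Keeps the k most recent time frames; the newest one plays the role of f_c."""

    def __init__(self, k, on_frame_closed):
        self.frames = deque(maxlen=k)           # the oldest frame is dropped automatically
        self.on_frame_closed = on_frame_closed  # detection procedure (hypothetical callback)

    def close_frame(self, tweets):
        """Call once a time frame has elapsed, passing the tweets posted in it."""
        self.frames.append(list(tweets))
        self.on_frame_closed(list(self.frames))
```

Because the deque is created with `maxlen=k`, appending the (k+1)-th frame silently evicts the oldest one, which matches a window that slides forward one frame at a time.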

Temporal Keyword Extraction
• Extract the words that show a bursty frequency in the current time frame (these words are called keywords, Y_c).
• Given the set of words W_c from the tweets published during the current time frame f_c, extract a subset Y_c ⊆ W_c of words likely to describe localized events.

Temporal Keyword Extraction (cont.)
• Use the discrepancy paradigm to extract keywords based on their burstiness.
• Assume, during time frame f_c:
  - u(w, c): normalized by the number of users publishing tweets containing word w

Temporal Keyword Extraction (cont.)
• In addition, hist_w = (u(w, 1), u(w, 2), …, u(w, m)) is a fixed historical sequence of usage values for w collected before the current time frame f_c, such that m < c.
• It is used when the system needs to describe the normal behavior of word w over previous time frames.

Temporal Keyword Extraction (cont.)
• The discrepancy paradigm measures the deviation between the word usage value u(w, c) in the current time frame and an expected word usage baseline b(w). The baseline is estimated from hist_w, whose values are assumed to be drawn from a Gaussian distribution with mean b(w).μ and standard deviation b(w).σ.
• The higher the deviation, the higher the burstiness degree.

Temporal Keyword Extraction (cont.)
• The burstiness degree of a word w is the z-score:
  b_degree(w, c) := (u(w, c) − b(w).μ) / b(w).σ
• Words whose usage is more than two standard deviations above the mean (b_degree(w, c) > 2) are chosen as keywords.
• Keywords observed for the first time have μ = 0 and σ = 0.
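
The z-score rule can be sketched as follows. The slides do not say how the division by σ = 0 is handled for first-time words or which standard-deviation estimator is used; here a word with σ = 0 is treated as bursty whenever its current usage exceeds the mean, and the population standard deviation is used. Both choices, and the function names, are assumptions of this sketch.

```python
from statistics import mean, pstdev


def burstiness_degree(usage_now, history):
    """z-score of the current usage u(w, c) against the baseline from hist_w."""
    mu = mean(history) if history else 0.0       # b(w).mu; 0 for unseen words
    sigma = pstdev(history) if history else 0.0  # b(w).sigma; population std (assumption)
    if sigma == 0.0:
        # z-score is undefined; this policy is an assumption, not from the slides
        return float("inf") if usage_now > mu else 0.0
    return (usage_now - mu) / sigma


def extract_keywords(usage, hist, threshold=2.0):
    """Return Y_c: words whose burstiness degree exceeds the threshold."""
    return {
        w for w, u in usage.items()
        if burstiness_degree(u, hist.get(w, [])) > threshold
    }
```

For example, a word with history (1, 2, 3, 4, 5) and current usage 6 has a z-score of 3/√2 ≈ 2.12 and is kept, while the same word at usage 3 scores 0 and is dropped.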

Spatial Keyword Identification
• Find keywords that are highly localized.
• Only georeferenced tweets are used.
[Figure: grid G and cell g]

Spatial Keyword Identification
• Only georeferenced tweets are used.
  - Calculate the entropy H(S_i).
  - Discard all keywords with entropy larger than a threshold ρ. Why?
  - The result is Y_c, the set of filtered keywords.
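
One way to read the entropy filter: if S_i is taken to be the distribution of a keyword's geo-tagged occurrences over grid cells (an assumption of this sketch), then a keyword concentrated in a few cells has low entropy, while one spread evenly over the grid has high entropy and is unlikely to describe a localized event. The function names and the cell encoding as (row, column) pairs are hypothetical.

```python
import math
from collections import Counter


def spatial_entropy(cells):
    """Shannon entropy (in bits) of a keyword's distribution over grid cells.

    cells holds one grid-cell id per geo-tagged occurrence of the keyword.
    """
    counts = Counter(cells)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def filter_localized(keyword_cells, rho):
    """Keep keywords with entropy at most rho; skip keywords with no geo-tagged use."""
    return {
        w for w, cells in keyword_cells.items()
        if cells and spatial_entropy(cells) <= rho
    }
```

A keyword seen four times in a single cell has entropy 0; one seen once in each of four cells has entropy 2 bits and would be discarded for any ρ below 2.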

Keyword Clustering
• Each S_i is a vector.
• Event keywords are clustered using their S_i.
• Similarity calculation: cosine similarity
Cosine Similarity, Wikipedia, http://en.wikipedia.org/wiki/Cosine_similarity
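
For reference, cosine similarity between two such vectors can be computed as below. Returning 0.0 for an all-zero vector is a convention chosen for this sketch, since the measure is undefined in that case.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; 0.0 is a convention of this sketch
    return dot / (norm_a * norm_b)
```

Two keywords whose occurrences fall in the same cells in the same proportions get a similarity close to 1, regardless of how often each keyword occurs overall.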

Keyword Clustering
- There is a distance threshold T.
- If a new keyword falls outside the threshold, it forms a new cluster by itself.
Saed Sayad, K-means clustering, http://www.saedsayad.com/clustering_kmeans.htm
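
The threshold rule can be illustrated with a simple single-pass clustering. This is a sketch under stated assumptions rather than the authors' procedure: distance is taken as 1 minus cosine similarity, each keyword is compared against cluster centroids, and the names `cluster_keywords` and `tau` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def cluster_keywords(signatures, tau):
    """Assign each keyword to the nearest cluster, or open a new one.

    signatures maps keyword -> vector; tau is the distance threshold.
    """
    clusters = []  # each cluster: {"centroid": [...], "members": [...]}
    for word, vec in signatures.items():
        best, best_d = None, None
        for c in clusters:
            d = cosine_distance(vec, c["centroid"])
            if best_d is None or d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= tau:
            best["members"].append(word)
            n = len(best["members"])
            # incremental mean keeps the centroid up to date
            best["centroid"] = [(m * (n - 1) + x) / n
                                for m, x in zip(best["centroid"], vec)]
        else:
            # keyword falls outside the threshold: it forms its own cluster
            clusters.append({"centroid": list(vec), "members": [word]})
    return [c["members"] for c in clusters]
```

The result depends on the order in which keywords arrive, which is the usual trade-off of a one-pass scheme.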

Cluster Scoring
• Goal: determine which clusters of keywords are more likely to refer to localized events, and filter out spurious clusters.
• To score a cluster:
  1. Score each keyword.
  2. Sum up all scores.

System Overview

Demonstration