Time Series Algorithm Tutorial
Adapted from Andrew Moore’s slides

RODS: http://www.health.pitt.edu/rods
Auton Lab: http://www.autonlab.org
Copyright © 2002, 2003, 2004 Andrew Moore
Biosurveillance Detection Algorithms: Slide 1

The Basic Task: Analyze a time series data stream to find outbreaks without sounding too many false alarms.
[Plot: Signal vs. Time]

Many Methods!
• Time-weighted averaging
• Serfling
• ARIMA
• SARIMA + External Factors
• Univariate HMM
• Kalman Filter
• Recursive Least Squares
• Support Vector Machine
• Neural Nets
• Randomization
• Spatial Scan Statistics
• Bayesian Networks
• Contingency Tables
• Scalar Outlier (SQC)
• Multivariate Anomalies
• Change-point statistics
• FDR Tests
• WSARE (Recent patterns)
• PANDA (Causal Model)
• FLUMOD (space/Time HMM)
[Table: columns are Method, Has Pitt/CMU tried it?, Multivariate signal tracking?, Spatial?. Entries include Yes, Tried but little used, Tried and used, Under development, and Yes (w/ Howard Burkom); their row alignment is not recoverable.]
Details of these methods and a bibliography are available from “Summary of Biosurveillance-relevant statistical and data mining technologies” by Moore, Cooper, Tsui and Wagner. Downloadable (PDF format) from www.cs.cmu.edu/~awm/biosurv-methods.pdf

What you’ll learn about
• Noticing events in bioevent time series
• Tracking many series at once
These are all powerful statistical methods, which means they all have to have one thing in common… Boring Names:
• Univariate Anomaly Detection
• Multivariate Anomaly Detection

Univariate Time Series
[Plot: Signal vs. Time]
Example Signals:
• Number of ED visits today
• Number of ED visits this hour
• Number of Respiratory Cases Today
• School absenteeism today
• Nyquil Sales today

(When) is there an anomaly?
This is a time series of counts of primary-physician visits in data from Norfolk in December 2001. I added a fake outbreak, starting at a certain date. Can you guess when?
[Plot annotations: “Here (much too high for a Friday)”; “(injected outbreak)”]

An easy case: Control Charts
Dealt with by Statistical Quality Control.
Record the mean and standard deviation up to the current time. Signal an alarm if we go outside 3 sigmas.
[Plot: Signal vs. Time, with the Mean and the Upper Safe Range marked]
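
The control-chart rule above can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function name, the `n_sigmas` and `min_history` parameters, and the sample counts are all assumptions made for the example.

```python
from statistics import mean, stdev

def control_chart_alarms(signal, n_sigmas=3.0, min_history=2):
    """Return the indices t whose value lies outside
    mean +/- n_sigmas * std, both computed from signal[:t] only."""
    alarms = []
    for t in range(min_history, len(signal)):
        history = signal[:t]          # everything seen before today
        mu = mean(history)
        sigma = stdev(history)        # sample standard deviation
        if sigma > 0 and abs(signal[t] - mu) > n_sigmas * sigma:
            alarms.append(t)
    return alarms

# Made-up daily counts with one obvious spike at index 9.
counts = [50, 52, 48, 51, 49, 50, 53, 47, 50, 90, 51]
print(control_chart_alarms(counts))  # prints [9]
```

Because the mean and standard deviation use the whole history, a past spike stays in the baseline and widens the safe range for later days.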

Control Charts on the Norfolk Data
[Plot: Norfolk counts with the Alarm Level and the injected outbreak marked]

Looking at changes from yesterday
[Plot: day-over-day changes with the Alarm Level marked]
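
One plausible reading of this detector, sketched in Python: alarm whenever today's count exceeds yesterday's by more than a fixed alarm level. The function name, the `alarm_level` parameter, and the data are hypothetical; the slides do not say how the alarm level was chosen.

```python
def day_over_day_alarms(signal, alarm_level):
    """Return the indices t where signal[t] - signal[t-1] > alarm_level."""
    return [t for t in range(1, len(signal))
            if signal[t] - signal[t - 1] > alarm_level]

# Made-up counts: the level jumps at index 5 and then stays high.
counts = [50, 52, 48, 51, 49, 70, 72, 71]
print(day_over_day_alarms(counts, alarm_level=10))  # prints [5]
```

Only the day of the jump is flagged; the following days look normal relative to their own yesterday, even though the level is still elevated.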

We need a happy medium:
• Control Chart: Too insensitive to recent changes
• Change from yesterday: Too sensitive to recent changes

Moving Average
[Plots: moving-average detector]
Looks better. But how can we be quantitative about this?
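
A moving-average detector can be sketched as a control chart whose baseline is restricted to a recent window. The window length of 7 days, the 3-sigma rule, and the names below are assumptions for illustration; the slides do not give the parameters used.

```python
from statistics import mean, stdev

def moving_average_alarms(signal, window=7, n_sigmas=3.0):
    """Return the indices t whose value is more than n_sigmas standard
    deviations from the mean of the previous `window` values."""
    alarms = []
    for t in range(window, len(signal)):
        recent = signal[t - window:t]   # only the last `window` days
        mu, sigma = mean(recent), stdev(recent)
        if sigma > 0 and abs(signal[t] - mu) > n_sigmas * sigma:
            alarms.append(t)
    return alarms

# Same made-up counts as a control-chart example would use.
counts = [50, 52, 48, 51, 49, 50, 53, 47, 50, 90, 51]
print(moving_average_alarms(counts))  # prints [9]
```

The window size sets the trade-off: a long window behaves like the control chart, while a window of one day behaves like the change-from-yesterday rule.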

Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack, allowing one False Alarm per TWO weeks and allowing one False Alarm per SIX weeks. Cell values are not recoverable.]

Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack, allowing one False Alarm per TWO weeks and allowing one False Alarm per SIX weeks. Cell values are not recoverable.]

Seasonal Effects
[Plot: Signal vs. Time]
Fit a periodic function (e.g. sine wave) to previous data. Predict today’s signal and 3-sigma confidence intervals. Signal an alarm if we’re off.
Reduces false alarms from natural outbreaks.
Different times of year deserve different thresholds.
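
As a rough sketch of the seasonal idea, the code below fits a + b·sin(ωt) + c·cos(ωt) to past data by ordinary least squares and alarms when today's value is more than 3 residual standard deviations from the prediction. The single-sinusoid model, the 365-day period, the helper names, and the synthetic data are all assumptions for illustration, not the method used in the slides.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_seasonal(signal, period):
    """Least-squares fit of a + b*sin(w*t) + c*cos(w*t); returns (a, b, c)."""
    w = 2 * math.pi / period
    rows = [(1.0, math.sin(w * t), math.cos(w * t)) for t in range(len(signal))]
    # Normal equations (X^T X) beta = X^T y, solved with Cramer's rule.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, signal)) for i in range(3)]
    d = det3(A)
    coeffs = []
    for k in range(3):
        Ak = [[rhs[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]
        coeffs.append(det3(Ak) / d)
    return tuple(coeffs)

def predict(coeffs, t, period):
    a, b, c = coeffs
    w = 2 * math.pi / period
    return a + b * math.sin(w * t) + c * math.cos(w * t)

def seasonal_alarm(history, today, period, n_sigmas=3.0):
    """True if `today` (the value at t = len(history)) is more than
    n_sigmas residual standard deviations from the seasonal prediction."""
    coeffs = fit_seasonal(history, period)
    residuals = [y - predict(coeffs, t, period) for t, y in enumerate(history)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (len(history) - 3))
    expected = predict(coeffs, len(history), period)
    return abs(today - expected) > n_sigmas * sigma

# Synthetic two-year history: yearly sine wave plus a small deterministic wiggle.
history = [100 + 20 * math.sin(2 * math.pi * t / 365) + ((7 * t) % 5 - 2)
           for t in range(730)]
print(seasonal_alarm(history, 101, 365), seasonal_alarm(history, 130, 365))
```

In this synthetic series a value near 120 is ordinary at the seasonal peak but would raise an alarm at the point where the fitted curve is near 100, which is the sense in which different times of year get different thresholds.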

Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack, allowing one False Alarm per TWO weeks and allowing one False Alarm per SIX weeks. Cell values are not recoverable.]

Day-of-week effects
Fit a day-of-week component:
$E[\mathrm{Signal}] = a + \delta_{\mathrm{day}}$
E.g.: $\delta_{\mathrm{mon}} = +5.42$, $\delta_{\mathrm{tue}} = +2.20$, $\delta_{\mathrm{wed}} = +3.33$, $\delta_{\mathrm{thu}} = +3.10$, $\delta_{\mathrm{fri}} = +4.02$, $\delta_{\mathrm{sat}} = -12.2$, $\delta_{\mathrm{sun}} = -23.42$
A simple form of ANOVA.
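
A minimal sketch of fitting such a day-of-week component in Python. Here `a` is taken to be the overall mean and each delta is that weekday's mean minus `a`; this is one of several equivalent parameterizations and is an assumption, as are the function name and the made-up counts (the example deltas on the slide evidently use a different baseline).

```python
from collections import defaultdict
from statistics import mean

def fit_day_of_week(counts, weekdays):
    """Return (a, deltas) so that the expected count on day d is a + deltas[d]."""
    a = mean(counts)                      # overall baseline
    by_day = defaultdict(list)
    for c, d in zip(counts, weekdays):
        by_day[d].append(c)
    deltas = {d: mean(v) - a for d, v in by_day.items()}
    return a, deltas

# Two made-up weeks of counts.
weekdays = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"] * 2
counts = [105, 101, 104, 102, 103, 91, 89,
          107, 103, 102, 104, 105, 93, 91]
a, deltas = fit_day_of_week(counts, weekdays)
print(a, deltas["mon"], deltas["sun"])

# The "surprise" for a new Sunday count of 95 is measured against a + delta_sun.
residual = 95 - (a + deltas["sun"])
print(residual)
```

A detector would then apply a control-chart or CUSUM rule to the residuals rather than to the raw counts.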

Regression using Hours-in-day & IsMonday
[Plots]

Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack, allowing one False Alarm per TWO weeks and allowing one False Alarm per SIX weeks. Cell values are not recoverable.]

Regression using Mon-Tue
[Plot]

Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack, allowing one False Alarm per TWO weeks and allowing one False Alarm per SIX weeks. Cell values are not recoverable.]

CUSUM
• CUmulative SUM Statistics
• Keep a running sum of “surprises”: a sum of excesses each day over the prediction
• When this sum exceeds threshold, signal alarm and reset sum
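
The CUSUM bullets can be sketched as follows. The floor at zero and the `slack` allowance are conventions of the standard one-sided CUSUM and are assumptions here, since the slide does not mention them; the predictions would come from whichever baseline model is in use, and the numbers are made up.

```python
def cusum_alarms(signal, predictions, threshold, slack=0.0):
    """One-sided CUSUM: accumulate daily excesses over the prediction
    (minus a slack allowance), never letting the sum drop below zero.
    When the sum exceeds `threshold`, record an alarm and reset the sum."""
    alarms = []
    s = 0.0
    for t, (x, pred) in enumerate(zip(signal, predictions)):
        s = max(0.0, s + (x - pred - slack))
        if s > threshold:
            alarms.append(t)
            s = 0.0                      # reset after signalling
    return alarms

# Made-up data: a slow ramp in which no single day is 10 above the prediction.
signal = [50, 51, 49, 53, 54, 55, 56, 50]
predictions = [50] * len(signal)
print(cusum_alarms(signal, predictions, threshold=10, slack=1.0))  # prints [6]
```

The running sum reaches 2, 5, 9, and then 14 over days 3 to 6, so the alarm fires on day 6 even though each day's excess is small.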

Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack, allowing one False Alarm per TWO weeks and allowing one False Alarm per SIX weeks. Cell values are not recoverable.]

The Sickness/Availability Model
Counts = sickness * availability
Plot this: Sickness = counts / availability
Sick people may seek care more often on certain days due to availability of medical services or time in their schedules, so adjust for that phenomenon.
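
A small sketch of the adjustment, under the assumption that availability depends only on the day of the week and can be estimated as each weekday's mean count divided by the overall mean count. The slides do not say how availability was estimated, so the estimator, the function names, and the data below are illustrative only.

```python
from collections import defaultdict
from statistics import mean

def estimate_availability(counts, weekdays):
    """Per-weekday availability factor: weekday mean / overall mean."""
    overall = mean(counts)
    by_day = defaultdict(list)
    for c, d in zip(counts, weekdays):
        by_day[d].append(c)
    return {d: mean(v) / overall for d, v in by_day.items()}

def sickness(counts, weekdays, availability):
    """Sickness = counts / availability, day by day."""
    return [c / availability[d] for c, d in zip(counts, weekdays)]

# Made-up history: weekends have half the visits of weekdays.
weekdays = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"] * 2
counts = [100, 100, 100, 100, 100, 50, 50] * 2
availability = estimate_availability(counts, weekdays)
adjusted = sickness(counts, weekdays, availability)
print(round(min(adjusted), 2), round(max(adjusted), 2))

# A Saturday count of 80 is low in raw terms but high after adjustment.
print(round(80 / availability["sat"], 1))  # prints 137.1
```

After the adjustment the weekly dip disappears from the baseline, so a detector run on the sickness series can use a single threshold for all days.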

Algorithm Performance
[Table: fraction of spikes detected and days to detect a ramp attack, allowing one False Alarm per TWO weeks and allowing one False Alarm per SIX weeks. Cell values are not recoverable.]

Other state-of-the-art methods
• Wavelets
• Change-point detection
• Kalman filters
• Hidden Markov Models
• Many others

A generalized anomaly detector model based on time series algorithms
[Diagram: for example, Historical Average; the remaining labels (“1 0 s.”, “Thld”, “7 5”) are not legible]
2021/3/2

Open-sourced Libraries for Time Series Algorithms
• 2017/02 Facebook Prophet (R/Python)
• 2016 Yahoo! egads (Java)
• 2016 Twitter anomaly detection (R)
• 2015 Netflix Surus (Pig, based on PCA)
• 2013 Etsy skyline (Python)
• 2013 Numenta NuPIC (Python, based on HTM)
• 1997 RRDtool HWPREDICT (C, based on Holt-Winters)

What you’ll learn about
• Noticing events in bioevent time series
• Tracking many series at once
Boring Names:
• Univariate Anomaly Detection
• Multivariate Anomaly Detection

Multiple Signals
[Plot]

Multivariate Signals (relevant to inhalational diseases)
[Plot]

Multi Source Signals
[Plot: signals over weeks. Series: Lab, Flu, WebMD, School, Cough & Cold, Throat, Resp, Viral, Death]

What if you’ve got multiple signals?
[Plot: Signal vs. Time. Red: Cough Sales. Blue: ED Respiratory Visits]
Idea One: Simply treat it as two separate alarm-from-signal problems.
…Question: why might that not be the best we can do?

Another View
[Scatter plot: Cough Sales vs. ED Respiratory Visits, with one point annotated “This should be an anomaly”]
Question: why might that not be the best we can do?

N-dimensional Gaussian
Good Practical Idea: Model the joint with a Gaussian.
[Scatter plot: Cough Sales vs. ED Respiratory Visits, with One Sigma and 2 Sigma contours]
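
To make the joint-Gaussian idea concrete, here is a two-signal sketch that fits a mean and covariance to historical pairs and scores a new day by its Mahalanobis distance (the number of "sigmas" from the center, taking correlation into account). The function names and the data are made up for illustration; a real system would handle N signals with a linear-algebra library.

```python
import math
from statistics import mean

def fit_gaussian_2d(xs, ys):
    """Return the mean (mx, my) and sample covariance (sxx, sxy, syy)."""
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis(point, mu, cov):
    """Distance of `point` from `mu` in units of joint standard deviations."""
    dx, dy = point[0] - mu[0], point[1] - mu[1]
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    # Quadratic form d^T C^{-1} d, using the closed-form 2x2 inverse.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Made-up history: cough sales rise roughly in step with ED respiratory visits.
ed_visits = [10, 12, 14, 16, 18, 20, 22, 24]
cough_sales = [21, 23, 29, 31, 37, 39, 45, 47]
mu, cov = fit_gaussian_2d(ed_visits, cough_sales)

print(round(mahalanobis((24, 47), mu, cov), 2))  # both high together: typical
print(round(mahalanobis((24, 21), mu, cov), 2))  # high ED, low sales: far out
```

The second point is within about 1.5 standard deviations on each axis separately, so two independent univariate detectors would pass it, yet it lies far outside the joint distribution because it breaks the usual correlation.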