Time series analysis mini-project. Danny Hendler (hendlerd@post.bgu.ac.il), Amir Rubin (amirrub@post.bgu.ac.il)
Agenda • Introduction to time series analysis • Euclidean distance • Dynamic time warping • Mini project TSA
What is a time series?
• A set of measurements arranged chronologically
  • Each one represents a statistic measured in a specific time unit
• Examples:
  • Monthly revenues
  • Daily temperatures
  • Number of flu cases
  • Stock prices
  • …
Why are time series important?
• Identify trends
• Identify abnormal activity
• Linear regression tools
• Compare two time series
• …
Euclidean distance: shifting
Euclidean distance - issues: shifting [Figure: series V1 and V2; time axis 0 to 20, values 0 to 6]
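To make the shifting issue concrete, here is a minimal sketch of the point-by-point Euclidean distance between two equal-length series. The data values are hypothetical (they are not read from the slide's figure): `v2` is the same pulse as `v1`, delayed by three time steps.

```python
import math

def euclidean_distance(a, b):
    """Point-by-point Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical data: one pulse, and the same pulse delayed by three steps.
v1 = [0, 1, 3, 5, 3, 1, 0, 0, 0, 0]
v2 = [0, 0, 0, 0, 1, 3, 5, 3, 1, 0]

print(euclidean_distance(v1, v1))  # identical series: 0.0
print(euclidean_distance(v1, v2))  # same shape, shifted: clearly non-zero
```

Because samples are compared index by index, two series with the same shape can look far apart as soon as one of them is shifted in time.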
Euclidean distance – possible solution: bring the center of mass to 0 [Figure: series V1 and V2 after shifting; time axis -9 to 9, values 0 to 6]
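One possible reading of "bring center of mass to 0" is to move the origin of each series' time axis to its value-weighted mean time index before comparing. The sketch below follows that reading; the rounding of the center of mass to a whole time step, the zero-filling of missing samples, and all function names are assumptions made for illustration.

```python
import math

def center_of_mass(series):
    """Value-weighted mean time index of a non-negative series."""
    total = sum(series)
    if total == 0:
        raise ValueError("center of mass is undefined for an all-zero series")
    return sum(t * v for t, v in enumerate(series)) / total

def recenter(series):
    """Map samples to a time axis whose origin is the rounded center of mass."""
    shift = round(center_of_mass(series))
    return {t - shift: v for t, v in enumerate(series)}

def centered_euclidean_distance(a, b):
    """Euclidean distance after re-centering; missing samples count as 0."""
    ca, cb = recenter(a), recenter(b)
    times = set(ca) | set(cb)
    return math.sqrt(sum((ca.get(t, 0) - cb.get(t, 0)) ** 2 for t in times))

# Hypothetical data: the same pulse, with v2 delayed by three steps.
v1 = [0, 1, 3, 5, 3, 1, 0, 0, 0, 0]
v2 = [0, 0, 0, 0, 1, 3, 5, 3, 1, 0]
print(centered_euclidean_distance(v1, v2))  # 0.0: the shift is removed
```

This handles a pure shift, but it does not help when one series is a compressed or stretched version of the other.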
Euclidean distance - issues
Euclidean distance - issues
We would like to enable:
1. Shifting
2. Compression
Euclidean distance - issues: shifting
Euclidean distance - issues: compression
Dynamic Time Warping: given two time series A and B of lengths n and m, respectively:
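The definition from the slide is not preserved in this text. As a hedged sketch, the commonly used dynamic-programming formulation is shown below, assuming a local cost of |a_i - b_j| and the three usual moves (advance in A, advance in B, or advance in both). The function name `dtw_distance` is chosen here for illustration.

```python
def dtw_distance(a, b):
    """DTW distance between series a (length n) and b (length m).

    Assumed recurrence:
    D[i][j] = |a_i - b_j| + min(D[i-1][j], D[i][j-1], D[i-1][j-1])
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] is the cheapest alignment of the first i samples of a
    # with the first j samples of b.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Shifting: the same pulse at different positions aligns at zero cost.
print(dtw_distance([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0]))  # 0.0
# Compression: a series and its stretched version also align at zero cost.
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))  # 0.0
```

The two nested loops fill an (n+1) by (m+1) table, which is where the quadratic running time comes from.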
Dynamic Time Warping
DTW – time complexity
• Computing DTW(m, n) takes Θ(mn) time
  • Can be improved using several heuristics
DTW heuristics – locality constraint
• It is often unlikely for a_i and b_j to be matched if |i-j| > w
• Only compute the w-width diagonal band, i.e. cells with |i-j| ≤ w
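A minimal sketch of the locality constraint, assuming the same |a_i - b_j| local cost and three-move recurrence as standard DTW. Only cells with |i-j| ≤ w are filled; widening w to at least |n-m| so that the final cell stays reachable is an assumption of this sketch, and `dtw_banded` is an illustrative name.

```python
def dtw_banded(a, b, w):
    """DTW restricted to the diagonal band |i - j| <= w."""
    n, m = len(a), len(b)
    w = max(w, abs(n - m))  # assumption: keep cell (n, m) inside the band
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # Cells outside the band keep the value inf and are never visited.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Hypothetical data: the same pulse, with b delayed by three steps.
a = [0, 1, 3, 5, 3, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 1, 3, 5, 3, 1, 0]
print(dtw_banded(a, b, 3))  # 0.0: the band is wide enough for the shift
print(dtw_banded(a, b, 1))  # non-zero: the band is narrower than the shift
```

The work drops from roughly n·m cells to roughly n·(2w+1) cells, at the price of missing alignments that need a shift larger than w.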
Locality constraint – problematic scenario: shifting [Figure: series V1 and V2; time axis 0 to 20, values 0 to 6]
TSA mini projects
• Build time series per file
  • Count machines? Number of downloads?
  • Window size: day? Week?
  • Step size: hour? Day?
[Figure: "Malware downloads per day"; series Malware 1, Malware 2, Malware 3; x-axis: day number (1 to 14); y-axis: number of downloads]
[Figure: "Legitimate files downloads per day"; series Legitimate 1, Legitimate 2, Legitimate 3; x-axis: day number (1 to 14); y-axis: number of downloads]
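A minimal sketch of building one series per file, assuming the raw data is a list of (file_id, time) download events with times given as integer hours since the start of the observation period. The event format, the `horizon` parameter, and the name `build_series` are hypothetical; counting distinct machines instead of downloads would need a machine identifier in each event.

```python
from collections import defaultdict

def build_series(events, window, step, horizon):
    """Count downloads per file in sliding windows.

    events:  iterable of (file_id, time) pairs, time in hours (assumed format)
    window:  window size in hours (24 = a day, 168 = a week)
    step:    distance between consecutive window starts, in hours
    horizon: total length of the observation period, in hours
    """
    times = defaultdict(list)
    for file_id, t in events:
        times[file_id].append(t)
    starts = range(0, horizon - window + 1, step)
    return {
        file_id: [sum(1 for t in ts if s <= t < s + window) for s in starts]
        for file_id, ts in times.items()
    }

# Hypothetical events for two files over three days.
events = [("a.exe", 1), ("a.exe", 5), ("a.exe", 30), ("b.exe", 50)]
daily = build_series(events, window=24, step=24, horizon=72)
print(daily["a.exe"])  # [2, 1, 0]
print(daily["b.exe"])  # [0, 0, 1]
```

With step smaller than window the windows overlap, which smooths the series at the cost of more points per file.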
TSA mini projects
• Compute the DTW distance between any two files
  • Can be expensive!
  • Maybe only the distance from malicious/clean files?
• Look at the nearest k files
  • Compute the number of malware files among them
  • Compute the average/median/min/max distance
[Figure: nearest neighbors for k=10, k=5, k=1]
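The nearest-neighbor step can be sketched as follows, assuming each known file is given as a (series, is_malware) pair and assuming the standard DTW recurrence with an |a_i - b_j| local cost. The names `knn_features` and `labeled` and the data values are hypothetical.

```python
from statistics import mean, median

def dtw_distance(a, b):
    """Standard DTW with local cost |a_i - b_j| (assumed formulation)."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def knn_features(query, labeled, k):
    """Statistics over the k labeled series nearest to query under DTW."""
    scored = [(dtw_distance(query, s), is_malware) for s, is_malware in labeled]
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    dists = [d for d, _ in nearest]
    return {
        "malware_count": sum(1 for _, is_malware in nearest if is_malware),
        "mean": mean(dists),
        "median": median(dists),
        "min": min(dists),
        "max": max(dists),
    }

# Hypothetical labeled series: two bursty malware files, two flat clean files.
labeled = [
    ([0, 5, 9, 5, 0], True),
    ([0, 0, 5, 9, 5], True),
    ([1, 1, 1, 1, 1], False),
    ([2, 2, 2, 2, 2], False),
]
query = [0, 5, 9, 5, 0, 0]
print(knn_features(query, labeled, k=2))
```

The returned values could serve as features for a classifier or be thresholded directly; comparing the query only against labeled files keeps the number of DTW computations linear in the size of the labeled set.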