Indexing of Time Series by Major Minima and Maxima
Eugene Fink, Kevin B. Pratt, Harith S. Gandhi
Time series
A time series is a sequence of real values measured at equal intervals.
Example: 0, 3, 1, 2, 0, 1, 1, 3, 0, 2, 1, 4, 0, 1, 0
[Figure: plot of the example series.]
Results
• Compression of a time series by extracting its major minima and maxima
• Indexing of compressed time series
• Retrieval of series similar to a given pattern
• Experiments with stock and weather series
Outline
• Compression
• Indexing
• Retrieval
• Experiments
Compression
We select the major minima and maxima, along with the start and end points, and discard all other points. A positive parameter R controls the compression rate: increasing R can only reduce the number of selected points.
Major minima
A point $a[m]$ in $a[1..n]$ is a major minimum if there are indices $i$ and $j$, where $i < m < j$, such that:
• $a[m]$ is a minimum among $a[i..j]$, and
• $a[i] - a[m] \ge R$ and $a[j] - a[m] \ge R$.
[Figure: a[m] lies at least R below both a[i] and a[j].]
Major maxima
A point $a[m]$ in $a[1..n]$ is a major maximum if there are indices $i$ and $j$, where $i < m < j$, such that:
• $a[m]$ is a maximum among $a[i..j]$, and
• $a[m] - a[i] \ge R$ and $a[m] - a[j] \ge R$.
[Figure: a[m] lies at least R above both a[i] and a[j].]
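Both definitions can be checked point by point. The Python sketch below is a brute-force reference of our own, not the authors' code, and the function names are hypothetical. It uses 0-based indices and relies on one observation: a valid i must lie in the stretch left of m where every value is at least a[m], and likewise for j on the right, so the two sides can be checked independently. Maxima reuse the minimum test on the negated series.

```python
from typing import Sequence


def _has_swing(a: Sequence[float], m: int, r: float, step: int) -> bool:
    """Walk from m in direction `step` (-1 = left, +1 = right) while the
    values stay >= a[m]; succeed once some value is at least r above a[m]."""
    k = m + step
    while 0 <= k < len(a) and a[k] >= a[m]:
        if a[k] - a[m] >= r:
            return True
        k += step
    return False


def is_major_minimum(a: Sequence[float], m: int, r: float) -> bool:
    # a[m] must be a minimum of some a[i..j] with a rise of at least r on both sides.
    return _has_swing(a, m, r, -1) and _has_swing(a, m, r, +1)


def is_major_maximum(a: Sequence[float], m: int, r: float) -> bool:
    # Symmetric case: a major maximum of a is a major minimum of -a.
    return is_major_minimum([-v for v in a], m, r)


series = [0, 3, 1, 2, 0, 1, 1, 3, 0, 2, 1, 4, 0, 1, 0]
print([m for m in range(len(series)) if is_major_minimum(series, m, 3)])  # [4, 8]
print([m for m in range(len(series)) if is_major_maximum(series, m, 3)])  # [1, 7, 11]
```

Each call may scan the whole series, so testing every point is quadratic; the sketch is only meant as a reference for checking a faster procedure.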
Compression procedure
The procedure performs one pass through a given series. It takes linear time and constant memory, so it can compress a live series without storing it in memory.
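A single pass with constant working memory can be sketched as a zigzag (swing) filter, shown below in Python. This is our own illustration under stated assumptions, not the authors' procedure; `compress` and the (index, value) output format are hypothetical, and indices are 0-based. The function tracks one candidate extremum and confirms it once the series moves at least R the other way. Each interior point it reports satisfies the definition of a major minimum or maximum, but we do not claim it returns exactly the set the original implementation selects.

```python
from typing import Iterable, List, Optional, Tuple

Point = Tuple[int, float]  # (index in the series, value)


def compress(series: Iterable[float], r: float) -> List[Point]:
    """Hypothetical zigzag-style sketch: keep the start point, each confirmed
    turning point, and the end point. Apart from the output list, the state
    is a few variables, so the input may be a stream."""
    if r <= 0:
        raise ValueError("r must be positive")
    out: List[Point] = []
    trend = 0                     # 0: undecided, +1: seeking a maximum, -1: seeking a minimum
    lo: Optional[Point] = None    # running minimum while the trend is undecided
    hi: Optional[Point] = None    # running maximum while the trend is undecided
    cand: Optional[Point] = None  # current extremum candidate once a trend exists
    last: Optional[Point] = None
    for i, x in enumerate(series):
        last = (i, x)
        if i == 0:
            out.append(last)          # always keep the start point
            lo = hi = last
        elif trend == 0:
            if x < lo[1]:
                lo = last
            if x > hi[1]:
                hi = last
            if x - lo[1] >= r:        # rose by r from the lowest point so far
                trend, cand = 1, last
            elif hi[1] - x >= r:      # fell by r from the highest point so far
                trend, cand = -1, last
        elif trend == 1:
            if x > cand[1]:
                cand = last
            elif cand[1] - x >= r:    # fell r below the candidate: it is a major maximum
                out.append(cand)
                trend, cand = -1, last
        else:
            if x < cand[1]:
                cand = last
            elif x - cand[1] >= r:    # rose r above the candidate: it is a major minimum
                out.append(cand)
                trend, cand = 1, last
    if last is not None and last[0] > 0:
        out.append(last)              # always keep the end point
    return out


example = [0, 3, 1, 2, 0, 1, 1, 3, 0, 2, 1, 4, 0, 1, 0]
print(compress(example, 3))  # [(0, 0), (1, 3), (4, 0), (7, 3), (8, 0), (11, 4), (14, 0)]
```

Because the function only iterates over its input, it also accepts a generator that yields the values of a live series.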
Indexing of series
We index the series in a database by their major inclines, which are upward and downward segments of the series.
Major inclines
A segment $a[i..j]$ is a major upward incline if:
• $a[i]$ is a major minimum;
• $a[j]$ is a major maximum;
• $a[i] < a[m] < a[j]$ for every $m$ with $i < m < j$.
The definition of a major downward incline is symmetric.
[Figure: an upward incline rising from a[i] to a[j].]
Identification of inclines
The procedure performs two passes through a list of major minima and maxima. Its time is linear in the number of inclines.
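The slides do not detail the two passes. As a simpler stand-in, the Python sketch below finds major upward inclines in an ordered list of major minima and maxima with a nested scan; its worst case is quadratic in the number of extrema, so it does not match the linear bound stated above. The names and the (index, value, is_max) layout are hypothetical, and the sketch assumes that every discarded point between two consecutive extrema lies between their values, so only the extrema need to be checked. The sample list holds the major extrema of the example series for R = 3 (0-based indices).

```python
from typing import List, Tuple

Extremum = Tuple[int, float, bool]  # (index in the series, value, True if a maximum)


def upward_inclines(ext: List[Extremum]) -> List[Tuple[int, int]]:
    """Pairs (i, j): a minimum at i, a maximum at j, and every extremum
    strictly between them has a value strictly between a[i] and a[j]."""
    found: List[Tuple[int, int]] = []
    for p, (i, low, p_is_max) in enumerate(ext):
        if p_is_max:
            continue
        inner_max = float("-inf")   # highest value strictly between p and the current q
        for j, v, q_is_max in ext[p + 1:]:
            if v <= low:
                break               # no later maximum can pair with this minimum
            if q_is_max and v > inner_max:
                found.append((i, j))
            inner_max = max(inner_max, v)
    return found


def downward_inclines(ext: List[Extremum]) -> List[Tuple[int, int]]:
    # Symmetric case: negate the values and swap the roles of minima and maxima.
    return upward_inclines([(i, -v, not is_max) for i, v, is_max in ext])


extrema = [(1, 3, True), (4, 0, False), (7, 3, True), (8, 0, False), (11, 4, True)]
print(upward_inclines(extrema))    # [(4, 7), (8, 11)]
print(downward_inclines(extrema))  # [(1, 4), (7, 8)]
```

One minimum can start several inclines: for values 0, 5, 3, 8 the sketch reports 0 to 5, 0 to 8, and 3 to 8.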
Indexing of inclines
We index the major inclines of the series in a database by their lengths and heights. We use a range tree, which supports indexing of points by two coordinates.
[Figure: an incline with its height and length marked.]
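For a static database, a two-dimensional range tree can be sketched compactly in Python: a balanced tree over the inclines sorted by length, in which every node also keeps its inclines sorted by height. A rectangle query then visits O(log n) nodes and binary-searches each, for roughly O(log² n) time plus the cost of reporting the matches. The class name, the (length, height, id) layout, and the static bulk build are assumptions; the slides do not say how the authors' tree handles insertions.

```python
from bisect import bisect_left, bisect_right
from heapq import merge
from typing import List, Tuple

Entry = Tuple[float, float, int]  # (length, height, incline id): hypothetical layout


class InclineRangeTree:
    """Static 2-D range tree over (length, height) points."""

    def __init__(self, entries: List[Entry]) -> None:
        pts = sorted(entries)                      # order by length
        self.lengths = [p[0] for p in pts]
        self.size = 1
        while self.size < len(pts):
            self.size *= 2
        # Node k (heap layout, root = 1) lists (height, length, id) of its block,
        # sorted by height; leaves occupy positions size .. size + len(pts) - 1.
        self.nodes: List[List[Tuple[float, float, int]]] = [[] for _ in range(2 * self.size)]
        for k, (length, height, pid) in enumerate(pts):
            self.nodes[self.size + k] = [(height, length, pid)]
        for k in range(self.size - 1, 0, -1):
            self.nodes[k] = list(merge(self.nodes[2 * k], self.nodes[2 * k + 1]))

    def query(self, len_lo: float, len_hi: float, h_lo: float, h_hi: float) -> List[int]:
        """Ids of inclines with len_lo <= length <= len_hi and h_lo <= height <= h_hi."""
        lo = self.size + bisect_left(self.lengths, len_lo)
        hi = self.size + bisect_right(self.lengths, len_hi)  # half-open [lo, hi)
        found: List[int] = []
        while lo < hi:                     # bottom-up decomposition into canonical nodes
            if lo & 1:
                found += self._in_height(self.nodes[lo], h_lo, h_hi)
                lo += 1
            if hi & 1:
                hi -= 1
                found += self._in_height(self.nodes[hi], h_lo, h_hi)
            lo //= 2
            hi //= 2
        return sorted(found)

    @staticmethod
    def _in_height(node: List[Tuple[float, float, int]], h_lo: float, h_hi: float) -> List[int]:
        # Binary-search the height-sorted list of one canonical node.
        a = bisect_left(node, (h_lo, float("-inf"), -1))
        b = bisect_right(node, (h_hi, float("inf"), float("inf")))
        return [pid for _, _, pid in node[a:b]]


tree = InclineRangeTree([(10, 5, 0), (20, 8, 1), (15, 3, 2), (40, 9, 3), (12, 7, 4)])
print(tree.query(10, 20, 4, 8))  # [0, 1, 4]
```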
Retrieval
The procedure inputs a pattern series and searches for similar segments in a database.
[Figure: example pattern and database.]
Retrieval
Main steps:
• Find the pattern’s inclines with the greatest height
• Retrieve all segments that have similar inclines
• Compare each of these segments with the pattern
Highest inclines
First, the retrieval procedure identifies the important inclines in the pattern and selects the highest ones.
[Figure: two inclines of the pattern with their heights and lengths marked.]
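Choosing the highest inclines is a top-k selection by height. In the Python sketch below, the `Incline` record, the helper name, and the number k of inclines kept are assumptions, since the slides only say that the procedure keeps the highest ones. The endpoint pairs are the major inclines of the example series from the Time series slide for R = 3, using 0-based indices.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Incline(NamedTuple):
    start: int      # index of the incline's first point in the pattern
    end: int        # index of its last point
    height: float   # absolute value difference between the endpoints

    @property
    def length(self) -> int:
        return self.end - self.start


def highest_inclines(inclines: Iterable[Incline], k: int = 2) -> List[Incline]:
    """Return the k highest inclines, highest first (k is a hypothetical knob)."""
    return heapq.nlargest(k, inclines, key=lambda inc: inc.height)


pattern = [0, 3, 1, 2, 0, 1, 1, 3, 0, 2, 1, 4, 0, 1, 0]
pairs = [(1, 4), (4, 7), (7, 8), (8, 11)]  # major inclines of this series for R = 3
inclines = [Incline(i, j, abs(pattern[j] - pattern[i])) for i, j in pairs]
print(highest_inclines(inclines, k=1))  # [Incline(start=8, end=11, height=4)]
```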
Candidate segments
Second, the procedure retrieves segments with similar inclines from the database, using the range tree to find the similar inclines. Given a pattern incline with some height and length, a database incline is considered similar if:
• its height is between height / C and height · C;
• its length is between length / D and length · D.
[Figure: the similarity window around an incline.]
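The similarity window is a rectangle in the (length, height) plane, which is why an index over two coordinates fits this query. The Python sketch below computes the window and, for clarity, filters the inclines with a linear scan; a range tree returns the same set without scanning every incline. It assumes positive lengths and heights and C, D ≥ 1, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Window = Tuple[Tuple[float, float], Tuple[float, float]]


def similarity_window(length: float, height: float, c: float, d: float) -> Window:
    """(length range, height range) that a similar incline must fall into."""
    if length <= 0 or height <= 0 or c < 1 or d < 1:
        raise ValueError("need positive length and height, and C, D >= 1")
    return (length / d, length * d), (height / c, height * c)


def candidate_inclines(db: Iterable[Tuple[float, float, int]],
                       length: float, height: float, c: float, d: float) -> List[int]:
    """Ids of database inclines (length, height, id) inside the window.
    Linear scan for illustration; a range tree answers the same rectangle query."""
    (l_lo, l_hi), (h_lo, h_hi) = similarity_window(length, height, c, d)
    return [pid for l, h, pid in db if l_lo <= l <= l_hi and h_lo <= h <= h_hi]


db = [(10, 4, 0), (15, 8, 1), (16, 4, 2), (10, 1.9, 3), (7, 2, 4)]
print(candidate_inclines(db, length=10, height=4, c=2, d=1.5))  # [0, 1, 4]
```

Smaller C and D shrink the window, so fewer candidates reach the similarity test.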
Similarity test
Third, the procedure compares each retrieved segment with the pattern, using a given similarity test.
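The slides leave the similarity test to the user. The Python sketch below assumes one simple choice, the mean absolute difference between the pattern and an equally long segment, and ranks candidate segments by it; the names and the distance are assumptions. A retrieved incline fixes where a candidate segment starts: if the pattern's incline begins at offset p and the matching database incline begins at index s, the segment starts at s - p.

```python
from typing import Iterable, List, Sequence, Tuple


def mean_abs_distance(pattern: Sequence[float], segment: Sequence[float]) -> float:
    """Assumed similarity test: average absolute point-by-point difference."""
    if len(pattern) != len(segment) or not pattern:
        raise ValueError("pattern and segment must be non-empty and equally long")
    return sum(abs(p - s) for p, s in zip(pattern, segment)) / len(pattern)


def rank_segments(series: Sequence[float], pattern: Sequence[float],
                  starts: Iterable[int]) -> List[Tuple[float, int]]:
    """Score each candidate start (deduplicated, out-of-range skipped); best first."""
    m = len(pattern)
    scored = [(mean_abs_distance(pattern, series[s:s + m]), s)
              for s in set(starts) if 0 <= s and s + m <= len(series)]
    return sorted(scored)


series = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
pattern = [1, 2, 3]
# Candidate starts s - p from matched inclines; -1 and 8 do not fit in the series.
print(rank_segments(series, pattern, [0, 1, 7, 8, -1]))  # [(0.0, 1), (0.0, 7), (1.0, 0)]
```

Deduplicating the starts matters because several pattern inclines can point to the same segment.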
Experiments
We have tested a Visual Basic implementation on a 2.4-GHz Pentium computer.
Data sets:
• Stock prices: 98 series, 60,000 points
• Air and sea temperatures: 136 series, 450,000 points
Stock prices (60,000 points): search for 100-point patterns
The x-axes show the ranks of matches retrieved by the developed procedure, and the y-axes show the ranks assigned by a slow exhaustive search. For comparison, the plots also mark the perfect ranking.
[Figure: rank-comparison plots for three settings of C and D.]
Times of the fast ranking:
• C = D = 5: 0.05 sec
• C = D = 2: 0.02 sec
• C = D = 1.5: 0.01 sec
Stock prices (60,000 points): search for 500-point patterns
The x-axes show the ranks of matches retrieved by the developed procedure, and the y-axes show the ranks assigned by a slow exhaustive search. For comparison, the plots also mark the perfect ranking.
[Figure: rank-comparison plots for three settings of C and D.]
Times of the fast ranking:
• C = D = 5: 0.31 sec
• C = D = 2: 0.12 sec
• C = D = 1.5: 0.09 sec
Temperatures (450,000 points): search for 200-point patterns
The x-axes show the ranks of matches retrieved by the developed procedure, and the y-axes show the ranks assigned by a slow exhaustive search. For comparison, the plots also mark the perfect ranking.
[Figure: rank-comparison plots for three settings of C and D.]
Times of the fast ranking:
• C = D = 5: 1.18 sec
• C = D = 2: 0.27 sec
• C = D = 1.5: 0.14 sec
Conclusions
Main results: compression and indexing of time series by major minima and maxima.
Current work: hierarchical indexing by importance levels of minima and maxima.
[Figure: minima and maxima of a series labeled with their importance levels.]