Event Stream Processing with Out-of-Order Data Arrival

Ming Li and Mo Liu
Department of Computer Science
Worcester Polytechnic Institute, Worcester, Massachusetts
CS 525 Project Final Presentation, 12.14.2006

Outline
• Introduction
• Preliminary
• Problem with Out-of-Order Event Arrival
• Solution
• Implementation
• Experiment
• Conclusion
• Related Work

Outline
• Introduction
  1. Event Stream Processing
  2. SASE System
  3. Limitation of SASE
  4. Goal and Contribution
• Preliminary
• Problem with Out-of-Order Event Arrival
• Solution
• Implementation
• Experiment
• Conclusion
• Related Work

Introduction: Event Stream Processing
• Rising interest in the database community
• Wide-ranging and growing applications
  • Retail management (sensor is not moving): (Shelf, !CheckoutCounter, Exit)
  • Traffic control (sensor is moving): (police vehicle, ambulance, reporter vehicle)

Introduction: SASE System
• Event stream processing engine
  • A stream engine specific to event stream queries: generic support for detecting and extracting expected pattern sequences
  • Performance gain over stream systems that use joins to handle event sequence queries
[Figure: SASE approach vs. TelegraphCQ approach]

Introduction: SASE System (Cont.)
• Query: EVENT SEQ(A, B, !C, D) WITHIN 10 seconds
• Event stream (type and timestamp): b1 a3 c5 b6 a7 d10 b11 f12 c13 d15 a16 c17 a18 …
• Query plan, bottom to top
  • SSC: SS (A, B, D) followed by SC (A, B, D)
  • WD in SS: W = 10; a3 b6 d15 and a3 b11 d15 will not be selected
  • WD in SC: D.time - A.time < 10 secs; output: a3 b6 d10, a7 b11 d15
  • NG: !C (B.time < C.time < D.time); output: a3 b6 d10
  • TF: sequence to composite event

Introduction: SASE System (Cont.)
• Query: EVENT SEQ(A, B, !C, D) WHERE [attr_1] WITHIN 10 seconds
• Event stream, written as type(attr_1): b(1) a(2) c(2) b(2) a(3) d(2) b(3) f(3) c(3) d(3) a(4) c(4) a(5) …
• Timestamps, in the same order: 1 3 5 6 7 10 11 12 13 15 16 17 18 …
• Query plan, bottom to top
  • SSC: SS (A, B, D) followed by SC (A, B, D); WD in SS: W = 10
  • WD in SC: D.time - A.time < 10 secs; output: a(2) b(2) d(2), a(3) b(3) d(3)
  • SL: [attr]; output: a(2) b(2) d(2), a(3) b(3) d(3)
  • NG: !C (B.time < C.time < D.time ∧ B.attr_1 = C.attr_1); output: a(2) b(2) d(2)
  • TF: sequence to composite event; output: <a(2) b(2) d(2)>

Introduction: Limitation in SASE
• Total order assumption in event arrivals
  • The order in which events are received by the query system is the same as their timestamp order
  • Under this assumption, "later arrival" means "larger timestamp"
  • Example
    • e1.timestamp = 5:15 pm, e1.received_time = 5:17 pm
    • e2.timestamp = 5:19 pm, e2.received_time = 5:20 pm
    • e2 is received later than e1, and e2's timestamp is larger than e1's
• In the case of out-of-order event arrival
  • Missing results
  • Spurious results
  • Unbounded memory requirement

Introduction: Goal and Contributions
• Goal
  • Propose a solution for sequence query processing with out-of-order event arrival
• Contributions
  • Study the problems caused by OOO event arrival
  • A solution framework covering all of these problems
    • Solution for Sequence Scan
    • Solution for Negation
    • Solution for Window in SS

Outline
• Introduction
• Preliminary
  1. Event, Event Stream and Query
  2. SASE Evaluation - SSC
  3. SASE Evaluation - Negation
  4. SASE Evaluation - Window
• Problem with Out-of-Order Event Arrival
• Solution
• Implementation
• Experiment
• Conclusion
• Related Work

Preliminary: Event, Event Stream and Event Sequence Query Language
• Event and event stream
  • An event is defined to be an instantaneous, atomic (happens completely or not at all) occurrence of interest at a point in time
  • Each event, denoted by a lower-case letter (e.g., "a"), consists of the name of its type, denoted by an upper-case letter, and a set of values corresponding to the attributes of the type
  • Each event carries a timestamp, under the total order assumption
  • Event stream: contains events of different types
  • Example: a(attr_1 = 2, timestamp = 4), c(attr_1 = 1, timestamp = 5) …
• SASE query language
  • EVENT <event pattern> [WHERE <qualification>] [WITHIN <window>]
  • Example: EVENT SEQ(A, B, !C, D) WHERE [attr_1] WITHIN 10 seconds

Preliminary: SASE Evaluation – SSC
• SSC: SS (Sequence Scan) and SC (Sequence Construction)
• NFA with AIS (Active Instance Stack)
• RIP (most Recent Instance in Previous stack) field
• Example: EVENT SEQ(A, B, D) WITHIN 10 seconds
  • NFA: state 0 -A-> state 1 -B-> state 2 -D-> state 3, with * self-loops
  • Event stream (type and timestamp): b1 a3 c5 b6 a7 d10 b11 f12 c13 d15 a16 c17 a18 …
  • Stack S1: [] a3, [] a7, [] a16
  • Stack S2: [a3] b6, [a7] b11
  • Stack S3: [b6] d10, [b11] d15
  • Constructed sequences: a3 b6 d10, a3 b6 d15, a3 b11 d15, a7 b11 d15
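The stack-based evaluation on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the SASE implementation: the names `Instance`, `construct`, and `sequence_scan` are hypothetical, the event types in the pattern are assumed to be distinct, events are assumed to arrive in timestamp order, and the window check is applied during construction only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    etype: str
    ts: int
    rip: Optional[int]  # index of the most recent instance in the previous stack


def construct(stacks, level, idx):
    """Sequence construction: follow RIP pointers backwards from one instance."""
    inst = stacks[level][idx]
    if level == 0:
        return [[(inst.etype, inst.ts)]]
    out = []
    # Every instance at or below the RIP in the previous stack is a predecessor.
    for j in range(inst.rip + 1):
        for prefix in construct(stacks, level - 1, j):
            out.append(prefix + [(inst.etype, inst.ts)])
    return out


def sequence_scan(stream, pattern, window):
    """Sequence scan over an in-order stream; returns sequences inside the window."""
    stacks = [[] for _ in pattern]
    results = []
    for etype, ts in stream:
        if etype not in pattern:
            continue
        level = pattern.index(etype)
        if level == 0:
            stacks[0].append(Instance(etype, ts, None))
        elif stacks[level - 1]:
            stacks[level].append(Instance(etype, ts, len(stacks[level - 1]) - 1))
        else:
            continue  # previous stack is empty, so the event is skipped
        if level == len(pattern) - 1:
            for seq in construct(stacks, level, len(stacks[level]) - 1):
                if seq[-1][1] - seq[0][1] < window:
                    results.append(tuple(seq))
    return results


# The event stream from the slide, written as (type, timestamp) pairs.
STREAM = [("B", 1), ("A", 3), ("C", 5), ("B", 6), ("A", 7), ("D", 10), ("B", 11),
          ("F", 12), ("C", 13), ("D", 15), ("A", 16), ("C", 17), ("A", 18)]
RESULT = sequence_scan(STREAM, ["A", "B", "D"], 10)
```

Note that b1 is skipped because no A instance precedes it, which is exactly the behaviour that loses results once events may arrive out of order.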

Preliminary: SASE Evaluation – NG
• Negation (NG)
• Example: EVENT SEQ(A, B, !C, D) WITHIN 10 seconds
  • Event stream (type and timestamp): b1 a3 c5 b6 a7 d10 b11 f12 c13 d15 a16 c17 a18 …
  • a3 b6 d10, [3, 10]: √
  • a7 b11 d15, [10, 15]: Χ
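A negation check of this kind can be sketched as a filter over candidate sequences. The function name `passes_negation` and its parameters are hypothetical; the sketch assumes the predicate B.time < C.time < D.time stated earlier in the deck and an in-order stream, so that every relevant C event has already been seen when the check runs.

```python
def passes_negation(seq, stream, negated_type, left_type, right_type):
    """Return True if no negated event lies strictly between the two bounds.

    seq    : candidate sequence as (type, timestamp) pairs
    stream : all events seen so far as (type, timestamp) pairs
    """
    ts = {etype: t for etype, t in seq}
    lo, hi = ts[left_type], ts[right_type]
    return not any(e == negated_type and lo < t < hi for e, t in stream)


# C events of the example stream, plus the two candidate sequences.
SEEN = [("C", 5), ("C", 13), ("C", 17)]
FIRST = [("A", 3), ("B", 6), ("D", 10)]
SECOND = [("A", 7), ("B", 11), ("D", 15)]
```

With these inputs the first candidate is kept and the second is rejected because c13 falls between b11 and d15.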

Preliminary: SASE Evaluation – Purge
• Purge
  • Purge in SSC
  • Purge in NG
• Example: EVENT SEQ(A, B, D) WITHIN 10 seconds
  • PG in SS: once d15 is seen, a3 is purged, and so on
  • A similar mechanism cleans c5, and so on
  • Event stream (type and timestamp): b1 a3 c5 b6 a7 d10 b11 f12 c13 d15 a16 c17 a18 …
  • Stack S1: () a3, () a7
  • Stack S2: (a3) b6, (a7) b11
  • Stack S3: (b6) d10, (b11) d15
  • Constructed sequences: a3 b6 d10, a3 b6 d15, a3 b11 d15, a7 b11 d15

Outline
• Introduction
• Preliminary
• Problem with Out-of-Order Event Arrival
  1. Sequence Scan
  2. Negation
  3. Window in Sequence Scan
  4. Problem Analysis
• Solution
• Implementation
• Experiment
• Conclusion
• Related Work

Problem with OOO: Sequence Scan
• SS: missing result
• Query: EVENT SEQ(A, B, D) WITHIN 10 seconds
• Events in timestamp order: a0 b1 d2 a3 c5 b6 a7 d10 b11 f12 c13 d15 a16 c17 a18 …
• Stacks after the scan: S1: () a3, () a7; S2: (a3) b6, (a7) b11; S3: (b6) d10, (b11) d15
• Produced result: a3 b6 d10, a7 b11 d15
• Correct result: a0 b1 d2, a3 b6 d10, a7 b11 d15
• a0 b1 d2 is missing!
[Figure: NFA over the event stream in arrival order]

Problem with OOO: Negation
• NG: incorrect result
• Query: EVENT SEQ(A, B, !C, D) WITHIN 10 seconds
• Stacks: () a3; (a3) b6; (b6) d10
• Produced result: a3 b6 d10, a7 b11 d15 (incorrect!)
[Figure: NFA over the event stream in arrival order, timestamps 1 3 5 6 7 9 10]

Problem with OOO: Purge in SSC
• Purge in SS: once d15 is seen, a3 is purged, and so on
• After that, the OOO event d8 arrives: missing result!
• A similar case: purging and then producing an incorrect result
  (1) No data can be purged if missing or spurious results are to be avoided
  (2) The buffer requirement is unbounded in that case
• Example: EVENT SEQ(A, B, D) WITHIN 10 seconds
  • Events in timestamp order: b1 a3 c5 b6 a7 d8 d10 b11 f12 c13 d15 a16 c17 a18 …
  • Stack S1: () a3, () a7
  • Stack S2: (a3) b6, (a7) b11
  • Stack S3: (b6) d10, (b11) d15
  • Constructed sequences: a3 b6 d10, a3 b6 d15, a3 b11 d15, a7 b11 d15
• If precise query results are required and memory resources are limited, WD in SS is not sufficient for handling out-of-order event arrival!

Problem with OOO: Purge in NG

Outline
• Introduction
• Preliminary
• Problem with Out-of-Order Event Arrival
• Solution
  1. Sequence Scan
  2. Window in Sequence Scan
  3. Negation (skipped)
• Implementation
• Experiment
• Conclusion
• Related Work

Solution in SS: Using Sort Semantics
• Initially, every stack is active
• Search for the proper place in the stack for each new event
• RIP pointers might be reset
• Example: EVENT SEQ(A, B, D) WITHIN 10 seconds
  • Events in timestamp order: a0 b1 d2 a3 c5 b6 a7 d10 b11 f12 c13 d15 a16 c17 a18 …
  • The OOO events a0 and d2 both arrive after d15
  • a0 and d2 are inserted into the right spots in their stacks, and the affected RIP pointers are reset
[Figure: NFA and Active Instance Stacks before and after the sorted insertion]
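The sort semantics can be sketched as follows. This is a simplified illustration, not the authors' implementation: the class `SortedStacks` is hypothetical, each stack is kept sorted by timestamp with `bisect`, and the RIP is derived by binary search at construction time instead of being stored and reset, which is a substitution made here to keep the sketch short. Every event of a pattern type is kept, since an out-of-order predecessor may still arrive.

```python
import bisect


class SortedStacks:
    def __init__(self, pattern, window):
        self.pattern = list(pattern)
        self.window = window
        # One timestamp-sorted list per event type in the pattern.
        self.stacks = {etype: [] for etype in self.pattern}

    def insert(self, etype, ts):
        """Place the event at its timestamp position, whatever its arrival order."""
        if etype in self.stacks:
            bisect.insort(self.stacks[etype], ts)

    def rip(self, level, ts):
        """Index of the most recent earlier instance in the previous stack, or -1."""
        prev = self.stacks[self.pattern[level - 1]]
        return bisect.bisect_left(prev, ts) - 1

    def sequences(self):
        """Enumerate all sequences inside the window from the current stacks."""
        out = []
        last = len(self.pattern) - 1

        def walk(level, ts, suffix):
            seq = [(self.pattern[level], ts)] + suffix
            if level == 0:
                if seq[-1][1] - ts < self.window:
                    out.append(tuple(seq))
                return
            prev = self.stacks[self.pattern[level - 1]]
            for j in range(self.rip(level, ts) + 1):
                walk(level - 1, prev[j], seq)

        for ts in self.stacks[self.pattern[last]]:
            walk(last, ts, [])
        return sorted(out)


# Arrival order assumed for illustration: a0 and d2 arrive after d15.
ARRIVALS = [("B", 1), ("A", 3), ("B", 6), ("A", 7), ("D", 10),
            ("B", 11), ("D", 15), ("A", 0), ("D", 2)]
STACKS = SortedStacks(["A", "B", "D"], 10)
for etype, ts in ARRIVALS:
    STACKS.insert(etype, ts)
```

Nothing is ever purged in this sketch, so memory grows without bound; that is the problem the K-slack and punctuation slides address.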

Solution in NG / PSSC / PNG: Possible Solutions
• Using K-Slack
  • Pros: simple
  • Cons: strong assumption about the input stream
• Punctuation
  • Pros: general, with more optimization opportunities
  • Cons: might incur overhead

Solution in NG / PSSC / PNG by K-Slack
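The slide body did not survive extraction, so the following is only a generic sketch of K-slack buffering as it is commonly described, not the authors' design. The class `KSlackBuffer` is hypothetical, and the sketch assumes that no event is delayed by more than K time units relative to the largest timestamp seen so far.

```python
import heapq


class KSlackBuffer:
    def __init__(self, k):
        self.k = k
        self.heap = []                 # min-heap of (timestamp, event)
        self.max_ts = float("-inf")    # largest timestamp seen so far

    def push(self, ts, event):
        """Buffer one event and return the events that are now safe to release."""
        self.max_ts = max(self.max_ts, ts)
        heapq.heappush(self.heap, (ts, event))
        released = []
        # Under the K-slack assumption nothing older than max_ts - k can arrive.
        while self.heap and self.heap[0][0] <= self.max_ts - self.k:
            released.append(heapq.heappop(self.heap))
        return released

    def flush(self):
        """Release everything that is still buffered, in timestamp order."""
        released = []
        while self.heap:
            released.append(heapq.heappop(self.heap))
        return released


def reorder(arrivals, k):
    """Run a whole arrival sequence through the buffer."""
    buf = KSlackBuffer(k)
    out = []
    for ts, event in arrivals:
        out.extend(buf.push(ts, event))
    out.extend(buf.flush())
    return out
```

Downstream operators then see events in timestamp order and can purge as in the in-order case, at the cost of a delay of up to K and of wrong results whenever the assumption on K is violated.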

Solution in NG / PSSC / PNG by Punctuation

Punctuation: Range-Out-of-Order-Free Punctuation
• Range-Out-of-Order-Free (ROOF) punctuation P<E, t>
  • Event type E
  • Timestamp t
• Properties
  • Total order among in-order events (we can simply use the timestamp and ignore the received time)
  • No contradiction among the punctuations: they get stronger and stronger
• Example: after D_p, no more outdated D events will come
[Figure: d events and the punctuation D_p on the timestamp axis]

Punct in PSSC: Back 2 Front Singleton Purge
• Query: EVENT SEQ(A, B, D, E) WITHIN 10 seconds
• Is distance(a, D_p) > w? (D_p.Timestamp - a.Timestamp > w)
• If yes, are there any d's inside the back window?
[Figure: a, d, and D_p on the timestamp axis with windows of length w]
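One possible reading of this two-step check is sketched below. Both the function `can_purge_a` and the interpretation are assumptions: a ROOF punctuation on D with timestamp t is taken to mean that no D event with a timestamp below t will still arrive, and the "back window" is taken to be the window of length w that starts at a.

```python
def can_purge_a(a_ts, d_punct_ts, stored_d_ts, w):
    """Decide whether a stored A event can be purged (hypothetical reading).

    a_ts        : timestamp of the stored A event
    d_punct_ts  : timestamp of the latest ROOF punctuation on D
    stored_d_ts : timestamps of the D events currently stored
    w           : window length of the query
    """
    # Step 1: the punctuation must lie more than w beyond a, so every D event
    # that can still arrive is outside a's window.
    if d_punct_ts - a_ts <= w:
        return False
    # Step 2: no stored D event may fall inside a's window; otherwise a could
    # still form a sequence with it once a late B event arrives.
    return not any(a_ts < d < a_ts + w for d in stored_d_ts)
```

For example, with w = 10 an A event at timestamp 1 can be purged after a D punctuation at 23 as long as no stored D event lies between 1 and 11.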

Punct in PSSC: Front 2 Back Singleton Purge
• Query: EVENT SEQ(A, B, D, E) WITHIN 10 seconds
• Does d appear in front of A_p? (d.Timestamp < A_p.Timestamp?)
• If yes, are there any a's inside the front window?
[Figure: a, d, and A_p on the timestamp axis with a window of length w]

Punct in PSSC: Lazy Purge Algorithm
• Query: EVENT SEQ(E1, E2, …, En) WITHIN 10 seconds
• Algorithm: Lazy Purging. On receiving an event e or a ROOF punctuation rp:
  (1) Event e: update the stored event sequence and periodically run ALG purging_event_seq(ROOF_Set, stored event sequence)
  (2) ROOF rp: update the ROOF_Set and periodically run ALG purging_event_seq(ROOF_Set, stored event sequence)
[Figure: stored events e and punctuations P; purging the event sequence]

Punct in PSSC: Lazy Purge Algorithm (Cont.)
• Query: EVENT SEQ(E1, E2, …, En) WITHIN 10 seconds
• Algorithm: purging_single_event
  • Input: a single event e and a ROOF_Set
  • ALG purging_single_event(ROOF_Set, stored event sequence) // sequential checking + dependency checking
[Figure: a stored event e and punctuations P]

Punct in PSSC: Lazy Purge Algorithm (Cont.)
• Query: EVENT SEQ(E1, E2, …, En) WITHIN 10 seconds
• Algorithm: purging_event_sequence
  • Input: an event sequence and a ROOF punctuation rp
  • ALG purging_event_seq(ROOF_Set, stored event sequence) // in event order, apply purging_single_event
[Figure: stored events e and punctuations P]

Punct in PSSC: Aggressive Purge Algorithm
• Query: EVENT SEQ(E1, E2, …, En) WITHIN 10 seconds
• Algorithm: Aggressive Purging. On receiving an event e or a ROOF punctuation rp:
  (1) Event e: update the stored event sequence and periodically run ALG purging_single_event(ROOF_Set, stored event sequence)
  (2) ROOF rp: update the ROOF_Set and periodically run ALG purging_single_event(ROOF_Set, stored event sequence)
• An event can be dropped directly
[Figure: stored events e and punctuations P; purging the old sequence]

Punct in PSSC: Optimization (Cont.)
• Keeping the purging complete, but smarter
  • Under construction
• Making the purging "incomplete"
  • Singleton purging
  • Total purging
  • Density-based purging

Punct in PSSC: Optimization, Singleton Batch Purging 1
• Query: EVENT SEQ(A, B, D, E) WITHIN 10 seconds
• Find the furthest D event outside the window of D_p
• Every A and B event falling in this range can be purged
[Figure: d, D_p, and a window of length w on the timestamp axis]

Punct in PSSC: Optimization, Singleton Batch Purging 2
• Query: EVENT SEQ(A, B, D, E) WITHIN 10 seconds
• Find the furthest B event outside the window of B_p
• Every D and E event falling in this range can be purged
[Figure: b and B_p on the timestamp axis]

Outline
• Introduction
• Preliminary
• Problem with Out-of-Order Event Arrival
• Solution
• Implementation
  1. Sequence Scan
  2. Window in Sequence Scan
• Experiment
• Conclusion
• Related Work

Implementation: Design
• Basic event processing
  • Event and event generator, query plan and plan generator, basic operators
• Out-of-order handler
  • New functionality in SS: two modes, append and sort (leaving room for possible later use of punctuation)
  • New functionality in the NG and WD operators

Implementation: Design (Cont.)
[Figure: query plan produced by the Query Plan Generator. The event stream generator feeds SS; SS and SC form SSC, where NFA stacks maintain state and pointers and PSSC uses window W; SSC passes tuples to Window, which passes tuples to NG with PNG]

Existing Classes
• Stack (stores events of the same type)
• Event stream generator (in order / out of order)
• Window operator
• Negation buffer, which includes: negationinstances, negationoutput, SSCoutput, negationtypes
Need to add
• Negation operator: NG and negation buffer purge (PNG with the K-slack method)

To Revise
• SSC and SSC Purge (PSSC) are both in the event class now, so we need to separate them into two or three operators
• Add a Query Plan Generator: given a query, we need to find the SSC stack types and the negation instances
• In the design graph, the output of each operator should be the input of another one

Outline
• Introduction
• Preliminary
• Problem with Out-of-Order Event Arrival
• Solution
• Implementation
• Experiment
  1. Design and Setup
  2. Result Analysis
• Conclusion
• Related Work

Experiment: Design and Setup

Experiment: Result Analysis

Conclusion
• We study the problems caused by OOO event arrival
• We propose a solution framework for sequence query processing with out-of-order data arrival

Related Work
• Event stream processing (SASE system)
• Regular stream processing systems (TelegraphCQ, Eddy, etc.)
• Basic event processing (Amit system)
• Luping Ding's comprehensive exam talk
• K-slack and punctuation

Happy Holiday Season!

[Figure: query plan for Q: EVENT SEQ(A, B, D) WITHIN 10 seconds (ts: timestamp). The input event stream feeds SSC, which consists of SS: (A, B, D) with PSSC: W = 10 secs followed by SC: (A, B, D); then WD: D.ts - A.ts < 10 secs; then TF: sequence to composite event]

[Figure, four panels]
(a) Automata: state 0 -A-> state 1 -B-> state 2 -D-> state 3, with * self-loops
(b) SSC using Active Instance Stacks: S1: [ ] a3, [ ] a7; S2: [a3] b6, [a7] b11; S3: [b6] d10, [b11] d15
(c) Input event stream in receiving order: b1 a3 c5 b6 a7 d10 b11 f12 c13 d15 f16 f17 …
(d) Producing result tuples: SSC outputs a3 b6 d10, a3 b6 d15, a3 b11 d15, a7 b11 d15; WD keeps a3 b6 d10 and a7 b11 d15; TF emits <a3 b6 d10> and <a7 b11 d15> as tuples holding event sequences

Incomplete Retrieval & Event Misplacement
• SSC (1) mistakenly omits events that should be put into the AIS because they can be coupled with out-of-order events arriving in the future, and (2) misplaces out-of-order event instances in the AIS
Unauthorized AIS Purge
• PSSC mistakenly purges events from the AIS (these events might be used to form out-of-order sequences in the future)
[Figure: query plan. The input event stream feeds SS: (E1, E2, …, Em) with Active Instance Stacks (AIS) and PSSC: window W; then SC: (E1, E2, …, Em); then WD: Em.ts - E1.ts < W; then TF: sequence to composite event]

[Figure: event stream in received order containing out-of-order arrivals]

[Figure, two panels]
(a) Incorrect AIS Appending when b8 Arrives: S1: [] a3, [] a7; S2: [a3] b6, [a7] b11, [a7] b8; S3: [b6] d10, [b11] d15
(b) Incorrect AIS Appending when d2 Arrives: S1: [] a3, [] a7; S2: [a3] b6, [a7] b11, [a7] b8; S3: [b6] d10, [b11] d15, [b8] d2

[Figure: three event streams in received order]
(a) Out-of-Order Event Arrival Example 1
(b) Out-of-Order Event Arrival Example 2
(c) Out-of-Order Event Arrival Example 3

[Figure: Active Instance Stacks. S1 holds a1 and a9; S2 holds b0, b3, and b7; S3 holds d2, d4, d5, d6, and d8, each with pointers to neighbouring stacks]

[Figure: Out-of-Order Event Arrival Example 1, two panels. First panel, received order: a1 c5 d6 a7 d11 with punctuation Pb. Second panel, received order: a1 c5 d6 a7 d11 b3 with punctuation Pb]

[Figure: Out-of-Order Event Arrival Example. Received order: a1 c5 d6 a7 d11]

[Figure: events b, d, and a with punctuations Pd and Pa]

[Figure: events a, b, c, and d on the timestamp axis with punctuations Pd and Pa; the distances w1 and w2 are compared against the window w (w1 > w? w2 > w?)]

[Figure: events a, b, and c with punctuations Pd and Pb]

[Figure: Sources 1 … N send data over the network to the Query Processor. A Punctuation Generator uses metadata to produce punctuation t; the Query Processor purges operator state with t and unblocks answers with t]

[Figure: Sources 1 … N send data over the network to the Query Processor, which keeps operator state and emits Answers<t> together with revision tuples <+, t> and <-, t>]

[Figure: (d) Out-of-Order Event Arrival Example 4, event stream in received order]