An Example Data Stream Management System: TelegraphCQ

An Example Data Stream Management System: TelegraphCQ
INF 5100, Autumn 2007
Jarle Søberg
INF 5100, Autumn 2007 © Jarle Søberg

TelegraphCQ
• Introduction and overview
• Description of concepts
  – Wrappers
  – Fjords
  – Eddies
  – SteMs
  – CACQ
• Other features
• A practical overview
• Limitations

TelegraphCQ: Introduction
• Developed at Berkeley
• Written in C
  – Open source GNU license
• Based on the PostgreSQL DBMS
• Current version: 2.1 on PostgreSQL 7.3.2 code base
  – Each group has a running copy on dmms-lab 107
• Project closed down Summer 2006
  – Still, many interesting and important features to discuss

TelegraphCQ: Overview
[Figure: TelegraphCQ architecture. Labels: Client; Postmaster; Server; Front end (Listener, Parser, Planner); Back end (Fjords, Eddies, SteMs, CACQ); Shared memory queues; Shared memory buffer pool; Wrapper clearing house; Disk]

TelegraphCQ: Overview
• Based on modules
  – Query processing
  – Adaptive routing
  – Ingress and caching
• Communicate via Fjords
  – Push and pull data in pipeline fashion
  – Reduce overhead by non-blocking behavior
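
The non-blocking behavior of a Fjord-style connection can be illustrated with a small sketch. This is not TelegraphCQ code: the names FjordQueue and run_operator are hypothetical, and the sketch assumes that None is never a valid data item, so that it can signal "no data right now".

```python
from collections import deque


class FjordQueue:
    """Bounded queue whose push and pull never block (hypothetical name)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def push(self, item):
        # Non-blocking: report failure instead of waiting for free space.
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True

    def pull(self):
        # Non-blocking: return None instead of waiting for data.
        if not self.items:
            return None
        return self.items.popleft()


def run_operator(inbox, outbox, fn):
    """Process at most one item, then hand control back to the caller."""
    if len(outbox.items) >= outbox.capacity:
        return False  # downstream is full; try again later
    item = inbox.pull()
    if item is None:
        return False  # nothing to do right now
    outbox.push(fn(item))
    return True
```

Because neither call waits, a scheduler can keep cycling over the modules and give time to whichever one has work, which is the overhead reduction the slide refers to.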

Wrappers
• Transform data to Datum items
• Push or pull
• Several formats
  – Comma separated format (CSV) is used by TelegraphCQ
• Contacted via TCP
• Wrapper clearing house (WCH)
  – Many connections possible
• Store streams to database if needed

Wrappers
• Shed tuples, Data Triage
  – Support for dropping tuples
    • Look at Morten’s presentation about methods
  – Periodically summarize information about the shed tuples
  – Runs “shadow” queries on shed tuples
    • The queries run in parallel with the real queries
[Figure label: Shared Memory Buffer Pool]
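
A minimal sketch of the shedding idea follows. The slides do not say which summary structure Data Triage uses, so a simple per-key counter stands in for it here; TriageQueue and its methods are hypothetical names.

```python
from collections import Counter, deque


class TriageQueue:
    """Bounded input queue that summarizes what it has to drop."""

    def __init__(self, capacity, key):
        self.capacity = capacity
        self.key = key            # attribute used to summarize shed tuples
        self.queue = deque()
        self.summary = Counter()  # assumption: a count per key value

    def offer(self, tup):
        if len(self.queue) < self.capacity:
            self.queue.append(tup)
            return True
        # Queue is full: shed the tuple but remember it in the summary.
        self.summary[tup[self.key]] += 1
        return False

    def take_summary(self):
        # Hand the current summary to a shadow query and start a new one.
        summary, self.summary = self.summary, Counter()
        return summary
```

A shadow query would then run over the result of take_summary() while the real query runs over the tuples that were kept.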

Eddies
• DBMSs
  – Query plan created once
  – E.g. relations joined (we use “⋈” to show a join) on some attributes may give this plan: [Figure: query plan]
  – OK, as long as the data set is finite and pulled

Eddies
• How about pushed data?
  – Blocking or throwing away tuples is unavoidable!

Eddies
• A reconfiguration is necessary
  – There may be many changes in the different streams
  – Reconfiguration may take a long time
  – Not dynamic enough

Eddies
• An alternative is to use an eddy:
  – Dynamic on a tuple-by-tuple basis
  – Adaptive to changes in the stream
[Figure: eddy]

Eddies: Details
• Bitmap per tuple represents each operator
  – ready and done bits
  – The ready bits specify the operators the tuple should visit
  – Tuple is ready for output when all done bits are set
• Manipulate bits to set a route for a tuple
• On creation of new tuples due to e.g. joins: OR the bitmaps
[Figure: tuples with example bitmaps]
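
The ready and done bits can be sketched with plain integers used as bitmaps. This is an illustration only: EddyTuple and the helper functions are hypothetical names, and the sketch assumes there are no ordering constraints among operators, so a joined tuple is ready for every operator that either input was ready for and that is not already done.

```python
from dataclasses import dataclass


@dataclass
class EddyTuple:
    values: dict
    ready: int      # bit i set: operator i should still be visited
    done: int = 0   # bit i set: operator i has handled the tuple


def visit(tup, op):
    """Record that operator `op` has handled the tuple."""
    bit = 1 << op
    tup.done |= bit
    tup.ready &= ~bit


def eligible(tup, num_ops):
    """Operators the eddy may route the tuple to next."""
    return [op for op in range(num_ops) if (tup.ready >> op) & 1]


def finished(tup, num_ops):
    """True when all done bits are set, i.e. the tuple can be output."""
    return tup.done == (1 << num_ops) - 1


def concatenate(a, b, join_op):
    """Join result: OR the bitmaps of both inputs and mark the join as done."""
    done = a.done | b.done | (1 << join_op)
    ready = (a.ready | b.ready) & ~done
    return EddyTuple({**a.values, **b.values}, ready, done)
```

With three operators (0: filter on R, 1: filter on S, 2: the join), an R tuple starts with ready = 0b101, an S tuple with ready = 0b110, and only the joined tuple can ever have all three done bits set.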

Eddies: Routing policy
• Priority scheme
  – Tuples coming from an operator = high priority
    • Prevents starvation
• Originally: Back-pressure
  – Self regulating due to queuing
  – Naïve, hence not optimal
• Extended to lottery scheduling

Eddies: Lottery scheduling
• Each operator has ticket account
  – Credited for each arriving tuple
  – Debited for each leaving tuple
• Lottery among available operators
  – Empty in-queue: Fast operators
  – High number of tickets: Low selectivity operators
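
The ticket accounts can be sketched as follows. LotteryEddy is a hypothetical name, and giving every available operator one free ticket (so that an operator with an empty account can still win) is an assumption of this sketch, not something the slides state.

```python
import random


class LotteryEddy:
    """Ticket bookkeeping for eddy routing (hypothetical class name)."""

    def __init__(self, operators, rng=None):
        self.tickets = {name: 0 for name in operators}
        self.rng = rng or random.Random()

    def credit(self, name):
        # A tuple was routed to the operator.
        self.tickets[name] += 1

    def debit(self, name):
        # The operator returned a tuple to the eddy.
        self.tickets[name] = max(0, self.tickets[name] - 1)

    def choose(self, available):
        # `available` lists the operators that can accept a tuple right now,
        # e.g. those with room in their in-queue.
        # Assumption: one free ticket each, so nobody is locked out.
        weights = [self.tickets[name] + 1 for name in available]
        return self.rng.choices(available, weights=weights, k=1)[0]
```

An operator that consumes many tuples but returns few keeps most of its tickets and therefore wins the lottery more often, so tuples tend to meet it early.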

Eddies: Lottery scheduling
• Low selectivity operators
  – Win even if the operator is slowing down
  – Expand with a window scheme
    • Banked tickets
    • Escrow tickets
[Figure: ticket counts for an operator over a window]

Eddies
• Works for single query environments
• Simple and adaptive
• May still not be optimal with respect to dynamic changes over e.g. a single join
• Extend the eddy’s strength by introducing state modules (SteMs)

SteMs
• Split joins in two
  – Dynamic
  – Send build tuples
    • Build hash tables
  – Send probe tuples
    • Look for matches
[Figure: eddy with R, S, and T]

SteMs
• Any possible problems?
  – Two equal intermediate tuples!
• Solved by globally unique sequence number
  – Only youngest tuples allowed to match
[Figure: R and S]
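
Build, probe, and the sequence-number rule can be sketched together. The class and function names are hypothetical, and the rule is implemented under one reading of "only youngest tuples allowed to match": a probe tuple matches only build tuples with a smaller sequence number, so each pair is produced once, by its younger member.

```python
from collections import defaultdict
from itertools import count

_seq = count(1)  # source of globally unique, increasing sequence numbers


def stamp(values):
    """Attach a sequence number to a new tuple (represented as a dict)."""
    return {"seq": next(_seq), **values}


class SteM:
    """One half of a join: a hash table over the tuples of a single source."""

    def __init__(self, key):
        self.key = key
        self.table = defaultdict(list)

    def build(self, tup):
        self.table[tup[self.key]].append(tup)

    def probe(self, tup, probe_key):
        # Match only older build tuples, so a pair is never emitted twice.
        candidates = self.table.get(tup[probe_key], [])
        return [(b, tup) for b in candidates if b["seq"] < tup["seq"]]
```

If r and s are both built before either of them probes, r probing the SteM on S finds nothing (s is younger), while s probing the SteM on R finds r, so the joined tuple appears exactly once.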

SteMs: Issues
• SteMs are implemented using hash tables
  – Only equi-joins work properly
• Alternatively, use B-trees
  – Can correctly express more: “<>”, “>>”, “<=”, …
  – Is this consistent with the data stream concept?

Eddies and SteMs
• Still single-query environment
• DSMSs aim to support many concurrent queries
• This feature needs to be adaptive and manage creation and deletion of queries in real time
• Optimization is proven NP-hard

Introducing CACQ
• Continuously adaptive continuous queries
• Heuristics
  – Adding more information to the tuples
  – Creating even more meta information
• Avoid sending same singleton and intermediate tuples to same operators
• First of all: Use grouped filters!

CACQ: Grouped Filters
• Module for early filtering of selection predicates
  – For example: SELECT * FROM stream WHERE stream.a = 7
  – All tuples without stream.a = 7 are not sent to the eddy
  – Includes “>”, “<”, and “ ”, as well
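
One way to picture a grouped filter is as an index over the predicates that many queries place on the same attribute, so that each tuple is checked once for all of them. The sketch below covers only "=", ">", and "<"; GroupedFilter and its methods are hypothetical names, not the CACQ implementation.

```python
import bisect
from collections import defaultdict


class GroupedFilter:
    """Index of single-attribute predicates from many queries."""

    def __init__(self):
        self.eq = defaultdict(set)  # constant -> ids of "attr = constant"
        self.gt = ([], [])          # sorted constants, parallel query ids
        self.lt = ([], [])

    def add(self, query_id, op, constant):
        if op == "=":
            self.eq[constant].add(query_id)
        elif op in (">", "<"):
            consts, ids = self.gt if op == ">" else self.lt
            i = bisect.bisect_right(consts, constant)
            consts.insert(i, constant)
            ids.insert(i, query_id)
        else:
            raise ValueError(f"unsupported operator: {op}")

    def matches(self, value):
        """Return the ids of all queries whose predicate accepts `value`."""
        hits = set(self.eq.get(value, ()))
        consts, ids = self.gt
        hits.update(ids[:bisect.bisect_left(consts, value)])   # constant < value
        consts, ids = self.lt
        hits.update(ids[bisect.bisect_right(consts, value):])  # constant > value
        return hits
```

A tuple for which matches() returns an empty set satisfies no query and does not need to be sent to the eddy at all.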

The CACQ Tuple
• Extended the eddy tuple to include bitmaps for queriesCompleted and sourceId
• The queriesCompleted bitmap
  – Represents the queries
  – Shows a lineage of the tuple
• The sourceId bitmap
  – Source when queries do not share tuples

Eddies, SteMs, and CACQ: Issues
• Bitmap statically configured
  – Faster, but not dynamic
• Much overhead experienced by the developers
  – Tuple-by-tuple processing takes time
  – Batching tuples is suggested
    • Static for shorter periods

Continuous Queries in TelegraphCQ
• Windowing supports sliding, hopping, and jumping behavior
  – Aggregations are important for correct results
  – Output does not start until window is reached when aggregations are used

  SELECT stream.color, COUNT(*)
  FROM stream [RANGE BY '9' SLIDE BY '1']
  GROUP BY stream.color

[Figure: window moving over the stream, with the label "START OUTPUT!"]
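
The behavior of the windowed COUNT(*) query can be imitated in a few lines. This is a sketch under stated assumptions, not TelegraphCQ's exact semantics: timestamps are plain numbers in the same unit as RANGE BY and SLIDE BY, tuples arrive in timestamp order, a window covers the half-open interval [end - range_by, end), and a window is emitted as soon as a tuple arrives at or after its end. The function name windowed_counts is hypothetical.

```python
from collections import Counter, deque


def windowed_counts(stream, range_by, slide_by):
    """Yield (window_end, {color: count}) for each completed window.

    `stream` is an iterable of (timestamp, color) pairs in timestamp order.
    """
    window = deque()
    next_emit = None
    for ts, color in stream:
        if next_emit is None:
            # No output before one full range has passed since the first tuple.
            next_emit = ts + range_by
        while ts >= next_emit:
            start = next_emit - range_by
            while window and window[0][0] < start:
                window.popleft()  # tuple has slid out of the window
            yield next_emit, dict(Counter(c for _, c in window))
            next_emit += slide_by
        window.append((ts, color))
```

With range_by=9 and slide_by=1, nothing is produced until a tuple with a timestamp at least 9 units after the first one arrives; from then on one result per slide step is emitted.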

Other Information
• Pros:
  – Introspective streams
  – Sub-queries, to some extent
  – Shadow queries for Data Triage tuples
• Cons:
  – OR is not understood
  – Only istreams, and not dstreams
  – Only six ANDs between SteMs
  – TelegraphCQ is very unstable at high pressure