An Example Data Stream Management System: TelegraphCQ
INF 5100, Autumn 2007, Jarle Søberg
© Jarle Søberg

TelegraphCQ
- Introduction and overview
- Description of concepts
  - Wrappers
  - Fjords
  - Eddies
  - SteMs
  - CACQ
- Other features
- A practical overview
- Limitations

TelegraphCQ: Introduction
- Developed at Berkeley
- Written in C
  - Open source GNU license
- Based on the PostgreSQL DBMS
- Current version: 2.1 on PostgreSQL 7.3.2 code base
  - Each group has a running copy on dmms-lab 107
- Project closed down Summer 2006
  - Still, many interesting and important features to discuss

TelegraphCQ: Overview
[Architecture diagram. Labels shown: Client; Postmaster; Server; Front end (Listener, Parser, Planner); Back end (Fjords, Eddies, SteMs, CACQ); Wrapper clearing house; Shared memory queues; Shared memory buffer pool; Disk]

TelegraphCQ: Overview
- Based on modules
  - Query processing
  - Adaptive routing
  - Ingress and caching
- Communicate via Fjords
  - Push and pull data in pipeline fashion
  - Reduce overhead by non-blocking behavior
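
The non-blocking push/pull idea behind Fjords can be illustrated with a small queue sketch. This is not TelegraphCQ's C implementation; `FjordQueue`, `run_filter_once`, and the `capacity` parameter are hypothetical names chosen for illustration. The point is only that neither side ever waits: a full queue rejects the push, and an empty queue returns "no data" so the module can hand control back.

```python
from collections import deque


class FjordQueue:
    """Hypothetical bounded queue between two modules; never blocks."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.items = deque()

    def push(self, item):
        """Producer side: returns False instead of blocking when full."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True

    def pull(self):
        """Consumer side: returns None instead of blocking when empty."""
        return self.items.popleft() if self.items else None


def run_filter_once(inbox, outbox, predicate):
    """One scheduling step of a filter module.

    Returns True if a tuple was consumed, False if there was nothing to do.
    """
    item = inbox.pull()
    if item is None:
        return False              # no data: give control back to the scheduler
    if predicate(item):
        outbox.push(item)         # a full outbox drops the tuple in this sketch
    return True
```

Because `None` is used as the "no data" marker, tuples themselves must not be `None` in this sketch.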

Wrappers
- Transform data to Datum items
- Push or pull
- Several formats
  - Comma separated format (CSV) is used by TelegraphCQ
- Contacted via TCP
- Wrapper clearing house (WCH)
  - Many connections possible
- Store streams to database if needed

Wrappers
- Shed tuples, Data Triage
  - Support for dropping tuples
    - Look at Morten’s presentation about methods
  - Periodically summarize information about the shed tuples
  - Runs “shadow” queries on the shed tuples
    - The queries run in parallel with the real queries
[Figure label: Shared memory buffer pool]

Eddies
- DBMSs
  - Query plan created once
  - E.g., relations joined (we use “⋈” to show a join) on some attributes may give this plan: [Figure: query plan]
  - OK, as long as the data set is finite and pulled

Eddies
- How about pushed data?
  - Blocking or throwing away tuples is unavoidable!

Eddies
- A reconfiguration is necessary
  - There may be many changes in the different streams
  - Reconfiguration may take a long time
  - Not dynamic enough

Eddies
- An alternative is to use an eddy:
  - Dynamic on a tuple-by-tuple basis in the stream
  - Adaptive to changes
[Figure: eddy]

Eddies: Details
- Bitmap per tuple represents each operator
  - ready and done bits
  - The ready bits specify the operators the tuple should visit
  - Tuple is ready for output when all done bits are set
- Manipulate bits to set a route for a tuple
- On creation of new tuples due to e.g. joins: OR the bitmaps
[Figure: three tuples with example bitmaps]
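
The bitmap handling above can be sketched in a few lines. This is only an illustration under assumptions: `EddyTuple`, `new_tuple`, `mark_done`, `ready_for_output`, and `join_tuples` are hypothetical names, every operator is assumed to be eligible for every tuple, and the ready bits of a joined tuple are recomputed here as "everything not yet done", which may differ from the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class EddyTuple:
    values: dict
    ready: int      # bit i set: operator i may still be visited
    done: int = 0   # bit i set: operator i has handled the tuple


def new_tuple(values, n_ops):
    """A fresh tuple may visit all n_ops operators (assumption)."""
    return EddyTuple(values, ready=(1 << n_ops) - 1)


def mark_done(tup, op):
    """Set the done bit for operator op and clear its ready bit."""
    tup.done |= 1 << op
    tup.ready &= ~(1 << op)


def ready_for_output(tup, n_ops):
    """A tuple leaves the eddy once all done bits are set."""
    return tup.done == (1 << n_ops) - 1


def join_tuples(left, right, n_ops):
    """OR the done bitmaps of the inputs; ready = operators not yet done."""
    done = left.done | right.done
    ready = ((1 << n_ops) - 1) & ~done
    return EddyTuple({**left.values, **right.values}, ready=ready, done=done)
```

Routing a tuple then amounts to picking any operator whose ready bit is set, which is what the routing policy decides.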

Eddies: Routing policy
- Priority scheme
  - Tuples coming from an operator = high priority
    - Prevents starvation
- Originally: Back-pressure
  - Self-regulating due to queuing
  - Naïve, hence not optimal
- Extended to lottery scheduling

Eddies: Lottery scheduling
- Each operator has a ticket account
  - Credited for each arriving tuple
  - Debited for each leaving tuple
- Lottery among available operators
  - Empty in-queue: Fast operators
  - High number of tickets: Low selectivity operators

Eddies: Lottery scheduling
- Low selectivity operators
  - Win even if the operator is slowing down
  - Expand with a window scheme
    - Banked tickets
    - Escrow tickets
[Figure: ticket counts of an operator over a window]
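
The ticket accounting can be sketched as follows. This is a simplified illustration, not the eddy's actual code: `Operator`, `choose_operator`, and `route` are hypothetical names, the `+ 1` weight is an assumption added so that an operator with zero tickets can still win, and the window scheme with banked and escrow tickets is left out.

```python
import random


class Operator:
    def __init__(self, name, passes):
        self.name = name
        self.passes = passes    # predicate: does the tuple survive?
        self.tickets = 0


def choose_operator(candidates, rng):
    """Hold a lottery weighted by ticket count among eligible operators."""
    weights = [op.tickets + 1 for op in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def route(tup, operators, rng):
    """Send one tuple through all operators in lottery order.

    Returns True if the tuple survives every operator.
    """
    remaining = list(operators)
    while remaining:
        op = choose_operator(remaining, rng)
        remaining.remove(op)
        op.tickets += 1         # credit: a tuple arrives at the operator
        if not op.passes(tup):
            return False        # tuple dropped, the operator keeps its ticket
        op.tickets -= 1         # debit: the tuple leaves the operator
    return True
```

An operator that drops most of its input accumulates tickets and therefore tends to be visited first, which is the desired effect. The flip side noted on the slide is visible too: tickets never expire here, so such an operator keeps winning even if its behavior changes, which is what the window scheme addresses.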

Eddies
- Works for single-query environments
- Simple and adaptive
- May still not be optimal with respect to dynamic changes over, e.g., a single join
- Extend the eddy’s strength by introducing state modules (SteMs)

SteMs
- Split joins in two
  - Dynamic
  - Send build tuples
    - Build hash tables
  - Send probe tuples
    - Look for matches
[Figure: eddy with R, S, and T]

SteMs
- Any possible problems?
  - Two equal intermediate tuples!
- Solved by globally unique sequence number
  - Only youngest tuples allowed to match
[Figure: R and S]
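
The duplicate problem and the sequence-number fix can be reproduced with a minimal sketch. The names `SteM`, `stem_join`, and `check_seq` are hypothetical, and the matching rule is an assumed reading of the slide: a probing tuple may only match stored tuples that arrived before it, so of the two tuples in a pair only the younger one produces the result.

```python
from itertools import count


class SteM:
    """State module: a hash table over one stream, keyed on the join attribute."""

    def __init__(self):
        self.table = {}

    def build(self, key, seq, row):
        self.table.setdefault(key, []).append((seq, row))

    def probe(self, key, seq, check_seq=True):
        # Assumed rule: only tuples with a smaller sequence number match.
        return [row for s, row in self.table.get(key, [])
                if not check_seq or s < seq]


def stem_join(arrivals, check_seq=True):
    """arrivals: list of (source, key, row) with source "R" or "S"."""
    seq = count()
    stamped = [(next(seq), src, key, row) for src, key, row in arrivals]
    stems = {"R": SteM(), "S": SteM()}
    # Build and probe are separate steps, so all builds may happen first.
    for n, src, key, row in stamped:
        stems[src].build(key, n, row)
    results = []
    for n, src, key, row in stamped:
        other = "S" if src == "R" else "R"
        for match in stems[other].probe(key, n, check_seq):
            results.append((row, match) if src == "R" else (match, row))
    return results
```

With `check_seq=False`, the pair `r1`/`s1` is produced twice (once from each probe); with the check enabled it is produced once.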

SteMs: Issues
- SteMs are implemented using hash tables
  - Only equi-joins work properly
- Alternatively, use B-trees
  - Can correctly express more: “<>”, “>>”, “<=”, …
  - Is this consistent with the data stream concept?

Eddies and SteMs
- Still single-query environment
- DSMSs aim to support many concurrent queries
- This feature needs to be adaptive and manage creation and deletion of queries in real time
- Optimization is proven NP-hard

Introducing CACQ
- Continuously adaptive continuous queries
- Heuristics
  - Adding more information to the tuples
  - Creating even more meta information
- Avoid sending the same singleton and intermediate tuples to the same operators
- First of all: Use grouped filters!

CACQ: Grouped Filters
- Module for early filtering of selection predicates
  - For example: SELECT * FROM stream WHERE stream.a = 7
  - All tuples without stream.a = 7 are not sent to the eddy
  - Includes “>”, “<”, and “≠”, as well
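
A grouped filter can be pictured as an index over the predicates of many queries on one attribute, so that a single lookup per tuple tells which queries still care about it. The sketch below is an assumption-laden illustration, not CACQ's data structure: `GroupedFilter`, `add`, and `matching_queries` are hypothetical names, `!=` stands in for “≠”, and sorted lists replace whatever tree or hash structures the real system uses.

```python
from bisect import bisect_left, bisect_right


class GroupedFilter:
    """Indexes the predicates of many queries over one attribute."""

    def __init__(self):
        self.eq = {}                            # constant -> ids of "attr = c"
        self.ne = {}                            # constant -> ids of "attr != c"
        self.gt_consts, self.gt_ids = [], []    # "attr > c", sorted by c
        self.lt_consts, self.lt_ids = [], []    # "attr < c", sorted by c

    def add(self, query_id, op, constant):
        if op == "=":
            self.eq.setdefault(constant, set()).add(query_id)
        elif op == "!=":
            self.ne.setdefault(constant, set()).add(query_id)
        elif op in (">", "<"):
            consts, ids = ((self.gt_consts, self.gt_ids) if op == ">"
                           else (self.lt_consts, self.lt_ids))
            i = bisect_right(consts, constant)
            consts.insert(i, constant)
            ids.insert(i, query_id)
        else:
            raise ValueError(f"unsupported operator: {op}")

    def matching_queries(self, value):
        """Return the ids of all queries whose predicate accepts value."""
        hits = set(self.eq.get(value, ()))
        for constant, ids in self.ne.items():
            if constant != value:
                hits |= ids
        # "attr > c" holds for every constant strictly below value.
        hits.update(self.gt_ids[:bisect_left(self.gt_consts, value)])
        # "attr < c" holds for every constant strictly above value.
        hits.update(self.lt_ids[bisect_right(self.lt_consts, value):])
        return hits
```

A tuple for which `matching_queries` returns an empty set never needs to reach the eddy.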

The CACQ Tuple
- Extended the eddy tuple to include bitmaps for queriesCompleted and sourceId
- The queriesCompleted bitmap
  - Represents the queries
  - Shows a lineage of the tuple
- The sourceId bitmap
  - Source when queries do not share tuples

Eddies, SteMs, and CACQ: Issues
- Bitmap statically configured
  - Faster, but not dynamic
- Much overhead experienced by the developers
  - Tuple-by-tuple processing takes time
  - Batching tuples is suggested
    - Static for shorter periods

Continuous Queries in TelegraphCQ
- Windowing supports sliding, hopping, and jumping behavior
  - Aggregations are important for correct results
  - Output does not start until the window is reached when aggregations are used
  SELECT stream.color, COUNT(*) FROM stream [RANGE BY '9' SLIDE BY '1'] GROUP BY stream.color
[Figure: window moving over the stream; label “START OUTPUT!”]
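
What such a windowed aggregate computes can be sketched outside the DSMS. The function below is an illustration under assumptions, not TelegraphCQ's evaluation strategy: `windowed_counts` is a hypothetical name, timestamps are taken to be non-negative integers in arrival order, a window covers the half-open interval `[end - range_by, end)`, and the first result is emitted only once a full range has elapsed.

```python
from collections import Counter


def windowed_counts(tuples, range_by=9, slide_by=1):
    """tuples: list of (timestamp, color) in arrival order.

    Returns a list of (window_end, {color: count}) results.
    """
    if not tuples:
        return []
    results = []
    last = tuples[-1][0]
    end = range_by                      # no output before the first full window
    while end <= last + 1:
        start = end - range_by
        counts = Counter(color for ts, color in tuples if start <= ts < end)
        results.append((end, dict(counts)))
        end += slide_by
    return results
```

With fewer than `range_by` time units of input the function returns an empty list, mirroring the point that output does not start until the window is reached.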

Other Information
- Pros:
  - Introspective streams
  - Sub-queries, to some extent
  - Shadow queries for Data Triage tuples
- Cons:
  - OR is not understood
  - Only istreams, and not dstreams
  - Only six ANDs between SteMs
  - TelegraphCQ is very unstable at high pressure