Telegraph: An Adaptive Global-Scale Query Engine Joe Hellerstein
Scenarios • Ubiquitous computing: more than clients! – sensors and their data feeds are key • smart dust, biomedical (MEMS sensors) • each consumer good records (mis)use – disposable computing • video from surveillance cameras, broadcasts, etc. • Global Data Federation – all the data is online – what are we waiting for? – The plumbing is coming • XML/HTTP, etc. give lowest-common-denominator (LCD) communication • but how do you query robustly over many sites in the wide area?
There’s a Data Flood Coming • What does it look like? – Never ends: interactivity required – Big: data reduction/aggregation is key – Unpredictable: this scale of devices and nets will not behave nicely
The Telegraph Query Engine • Key technologies – Interactive Control • interactivity with early answers • online aggregation for data reduction – Continuously adaptive flow optimization • massively parallel, adaptive dataflow via Rivers and Eddies
CONTROL Continuous Output, Navigation & Transformation with Refinement On Line • Data-intensive jobs are long-running. How to give early answers and interactivity? – online interactivity over feeds: data “juggle” – online query processing algs: ripple joins – statistical estimators, and their performance implications • Appreciate interplay of massive data processing, stats, and UIs
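The "early answers" idea behind online aggregation can be sketched as a running average that reports a confidence interval while data is still arriving. This is a minimal illustration, not CONTROL's implementation: the function name `online_average`, the `report_every` parameter, and the normal-approximation interval are assumptions made for the example, and the interval is only meaningful if rows arrive in roughly random order.

```python
import math
import random


def online_average(stream, report_every=1000, z=1.96):
    """Yield (rows_seen, running_mean, half_width) while rows are still arriving.

    Hypothetical helper for illustration. Uses Welford's update, so only
    O(1) state is kept. The interval is a normal approximation.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            variance = m2 / (n - 1)  # sample variance
            half_width = z * math.sqrt(variance / n)
            yield n, mean, half_width


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated sensor readings; the distribution is an arbitrary choice.
    readings = (rng.gauss(20.0, 5.0) for _ in range(10_000))
    for n, mean, hw in online_average(readings, report_every=2000):
        print(f"{n:>6} rows: avg = {mean:.2f} +/- {hw:.2f}")
```

A user watching these estimates can stop the query as soon as the interval is tight enough, which is the interactivity the slide asks for.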
CONTROL Continuous Output and Navigation Technology with Refinement On Line
River • We built the world’s fastest sorting machine – On the “NOW”: 100 Sun workstations + SAN – But it only beat the record under ideal conditions! • River: performance adaptivity for data flows on clusters – simplifies management and programming – perfect for sensor-based streams
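Why a record set "under ideal conditions" is fragile can be shown with a toy model. The sketch below compares a static, equal partitioning of records against consumers that pull from a shared queue whenever they are free. It is not River's actual mechanism, which these slides do not describe; the function names, rates, and the pull-based policy are all assumptions for illustration.

```python
import heapq


def static_partition_time(n_records, rates):
    """Finish time when every consumer gets an equal share up front.

    rates[i] is consumer i's throughput in records per second.
    """
    share = n_records / len(rates)
    return max(share / r for r in rates)


def shared_queue_time(n_records, rates):
    """Finish time when each record goes to whichever consumer frees up first."""
    # Heap of (time at which the consumer becomes free, consumer index).
    free_at = [(0.0, i) for i in range(len(rates))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_records):
        t, i = heapq.heappop(free_at)
        done = t + 1.0 / rates[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


if __name__ == "__main__":
    # Three healthy nodes and one perturbed node (hypothetical numbers).
    rates = [100.0, 100.0, 100.0, 25.0]
    print("static partition:", static_partition_time(10_000, rates))
    print("shared queue:    ", shared_queue_time(10_000, rates))
```

In the static case the whole job waits on the slowest node, while the pull-based variant degrades roughly in proportion to the lost capacity.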
Eddy • How to order and reorder operators over time – based on performance, economic/admin feedback • Vs. River: – River optimizes each operator “horizontally” – Eddies optimize a pipeline “vertically”
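Reordering operators over time can be sketched with a router that re-decides, for every tuple, which filter to try first based on the pass rates observed so far. This is a simplified illustration under stated assumptions: the `Filter` class, the `eddy` function, and the greedy "lowest observed pass rate first" policy are hypothetical and are not presented as Telegraph's routing policy. It also handles only commutative selection filters, and its statistics are biased because later filters only see tuples that earlier ones passed.

```python
class Filter:
    """A selection operator that counts the tuples it has seen and passed."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate
        self.seen = 0
        self.passed = 0

    def apply(self, row):
        self.seen += 1
        ok = self.predicate(row)
        if ok:
            self.passed += 1
        return ok

    def pass_rate(self):
        # Optimistic prior of 1.0 until the operator has seen any data.
        return self.passed / self.seen if self.seen else 1.0


def eddy(rows, filters):
    """Route each row through the filters, most selective (so far) first."""
    out = []
    for row in rows:
        # The order is recomputed per row from the current statistics.
        order = sorted(filters, key=lambda f: f.pass_rate())
        # all() short-circuits, so a rejected row skips the remaining filters.
        if all(f.apply(row) for f in order):
            out.append(row)
    return out


if __name__ == "__main__":
    even = Filter("even", lambda x: x % 2 == 0)
    small = Filter("small", lambda x: x < 100)  # selectivity shifts mid-stream
    result = eddy(range(1000), [even, small])
    print(len(result), "rows kept;",
          even.seen, "evaluations of 'even',",
          small.seen, "evaluations of 'small'")
```

In this run the "small" filter passes everything at first and nothing later, so a plan fixed from the early statistics would keep evaluating "even" first, whereas the per-row decision moves "small" to the front once its observed pass rate drops.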
Telegraph: Putting it Together • Scalable, adaptive dataflow infrastructure. Apps include… – sensor nets – massively parallel and wide-area query engines – net appliances: chaining transformation/aggregation/etc. proxies – any unpredictable dataflow scenario • Technology: a marriage of… – CONTROL, River & Eddy • Many research questions here • E.g. how to combine River and Eddy adaptivity • E.g. how to tune Eddies for statistical performance goals – Combinations of browse/query/mine at UI – Storage management to handle new hardware realities
Integration with Endeavour • Give – Be data-intensive backbone to diverse clients – Be replication dataflow engine for OceanStore – Telegraph Storage Manager provides storage (xactional/otherwise) for OceanStore – Provide platform for data-intensive “tacit info mining” • Take – Leverage OceanStore to manage distributed metadata, security – Leverage protocols out of TinyOS for sensors
Additional Slides • For use in questions, etc.
Connectivity & Heterogeneity • Lots of folks working on data format translation, parsing – we will borrow, not build – currently using JDBC & Cohera Net Query • commercial tool, donated by Cohera Corp. • gateways XML/HTML (via HTTP) to ODBC/JDBC – we may write “Teletalk” gateways from sensors • Heterogeneity – never a simple problem – CONTROL project developed an interactive, online data transformation tool: Potter’s Wheel
Potter’s Wheel Anomaly Detection