Motivation (PODS 2002)

Data Streams

- Traditional DBMS: data stored in finite, persistent data sets
- New applications: data input as continuous, ordered data streams
  - Network monitoring and traffic engineering
  - Telecom call records
  - Network security
  - Financial applications
  - Sensor networks
  - Manufacturing processes
  - Web logs and clickstreams
  - Massive data sets

Data Stream Management System (DSMS)

[Figure: DSMS architecture. Labels: User/Application, Register Query, Results, Stream Query Processor, Scratch Space (Memory and/or Disk).]
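The components named on this slide can be illustrated with a minimal sketch. It is not the architecture of any particular system: the class `MiniDSMS`, its methods `register` and `push`, and the `big_packets` query are hypothetical names chosen for illustration, and the scratch space is assumed to be a simple bounded in-memory buffer.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class MiniDSMS:
    """Toy stream query processor (hypothetical, for illustration only)."""

    def __init__(self, scratch_size: int = 1000) -> None:
        # Registered continuous queries, keyed by name.
        self.queries: Dict[str, Callable[[dict], bool]] = {}
        # Results accumulated so far for each registered query.
        self.results: Dict[str, List[dict]] = {}
        # Scratch space: a bounded buffer holding only the most recent tuples.
        self.scratch: Deque[dict] = deque(maxlen=scratch_size)

    def register(self, name: str, predicate: Callable[[dict], bool]) -> None:
        """Register a query once; it is then evaluated on every later arrival."""
        self.queries[name] = predicate
        self.results[name] = []

    def push(self, tup: dict) -> None:
        """Process one arriving tuple, in arrival order."""
        self.scratch.append(tup)
        for name, predicate in self.queries.items():
            if predicate(tup):
                self.results[name].append(tup)


# Usage: register a filter query, then feed three packet tuples.
dsms = MiniDSMS(scratch_size=2)
dsms.register("big_packets", lambda t: t["bytes"] > 1000)
for size in (200, 1500, 4000):
    dsms.push({"bytes": size})

print(dsms.results["big_packets"])  # [{'bytes': 1500}, {'bytes': 4000}]
print(len(dsms.scratch))            # 2 (oldest tuple was evicted)
```

The query is registered before the data arrives and stays active, while the scratch buffer never grows beyond its configured size.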

Meta-Questions

- Killer apps
  - Application stream rates exceed DBMS capacity?
  - Can DSMS handle high rates anyway?
- Motivation
  - Need for general-purpose DSMS?
  - Not ad-hoc, application-specific systems?
- Non-trivial
  - DSMS = merely DBMS with enhanced support for triggers, temporal constructs, data rate management?

Sample Applications

- Network security (e.g., iPolicy, NetForensics/Cisco, Niksun)
  - Network packet streams, user session information
  - Queries: URL filtering, detecting intrusions, DoS attacks, and viruses
- Financial applications (e.g., Traderbot)
  - Streams of trading data, stock tickers, news feeds
  - Queries: arbitrage opportunities, analytics, patterns

DBMS versus DSMS

DBMS                                                           | DSMS
---------------------------------------------------------------+--------------------------------------------------------
Persistent relations                                           | Transient streams
One-time queries                                               | Continuous queries
Random access (pull)                                           | Sequential access (push)
"Unbounded" disk store                                         | Bounded main memory
Only current state matters                                     | History/arrival order is critical
Passive repository                                             | Active stores
Relatively low update rate                                     | Possibly multi-GB arrival rate
No real-time services                                          | Real-time requirements
Assume precise data                                            | Data stale/imprecise
Access plan determined by query processor, physical DB design  | Unpredictable/variable data arrival and characteristics
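Two rows of this comparison, one-time versus continuous queries and unbounded store versus bounded memory, can be made concrete with a small sketch. The function `one_time_average` and the class `WindowAverage` are hypothetical names, and a sliding window is only one assumed way of keeping memory bounded; the slides do not prescribe it.

```python
from collections import deque


def one_time_average(relation):
    """DBMS style: pull the whole stored relation and answer once."""
    return sum(relation) / len(relation)


class WindowAverage:
    """DSMS style: continuous average over the last `size` arrivals.

    Memory use is bounded by `size`, regardless of stream length.
    """

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def push(self, value):
        # If the window is full, the oldest value is about to be evicted.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        # A fresh answer is produced on every arrival.
        return self.total / len(self.window)


stored = [10, 20, 30, 40]
print(one_time_average(stored))  # 25.0

query = WindowAverage(size=2)
answers = [query.push(v) for v in stored]
print(answers)  # [10.0, 15.0, 25.0, 35.0]
```

The one-time query needs the full relation before it can answer, whereas the continuous query emits an updated answer per tuple and depends on arrival order.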