Big Data Quality Challenges for the Internet of Things (IoT)
Vassilis Christophides, INRIA Paris (MUSE team)
The Internet of Things (IoT)
Networks of physical objects (aka things) with embedded sensing and actuating capabilities that communicate with other objects and information systems
• item identification (tagging things)
• sensors (feeling things)
• nanotechnology (shrinking things)
• …
“Things” are highly heterogeneous:
• Small (RFID tag) or Big (car)
• Fixed (fridge) or Mobile (activity tracker)
• Environment (thermostat) or Person-oriented (body analyzer)
Big Data = Transactions + Interactions + Observations
IoT devices are reporting even more personal data than humans are!
[Figure: data volume axis (Petabytes, Terabytes, Gigabytes, Megabytes); increasing data variety, velocity, & veracity]
Source: hortonworks.com/blog/7-key-drivers-for-the-big-data-market
IoT Data Value Chain
• Capture, track, control & monitor
• Transmit data to external env.
• PROCESS: Ingest, store & integrate data
• ANALYSE: Analyze data & automate
Source: http://www.codit.eu/how-can-we-help/internet-of-things
IoT Data Quality (DQ)
Reliability:
– Accuracy: depends on device calibration & sensing method
– Validity: depends on the resource constraints (connectivity, bandwidth, power, memory, storage & processing capabilities) of devices and data infrastructure
Completeness:
• due to data variety, complete domain knowledge is infeasible
• due to data variability, domain knowledge quickly becomes obsolete
Integrity: usually relative to a collection of raw data series originating from different devices
Photo: Ben Stansall/Agence France-Presse/Getty Images
In Search of IoT DQ Solutions
Let the data speak for itself!
– Learn models (semantics) from the data
  • robust to the presence of noise (and anomalies)
– Detect deviations of data from learned models
– Evolve learned models according to data deviations
Computing with Big Data!
– Volume: Scalable algorithms (efficiency vs accuracy)
– Variety: Looking at condition and context of data deviations
– Velocity: Incremental and online algorithms
Source: “Data Quality: the ‘other’ Face of Big Data”, B. Saha, D. Srivastava, ICDE 2013
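The learn / detect / evolve loop can be illustrated with a very small online model. The sketch below is not taken from the slides: it assumes a single numeric stream, uses Welford's running mean and variance as the "learned model", and flags a reading as a deviation when it lies more than `threshold` standard deviations from the mean. The class name, `threshold`, `warmup`, and the policy of not absorbing flagged readings into the model are all illustrative assumptions.

```python
import math


class OnlineDeviationDetector:
    """Running mean/variance (Welford's algorithm) with a z-score test."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # assumed cut-off, in standard deviations
        self.warmup = warmup        # readings to absorb before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def _learn(self, x):
        # Incremental update: O(1) time and memory per reading.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def update(self, x):
        """Return True if x deviates from the learned model; otherwise learn from it."""
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                return True  # assumed policy: report the deviation, do not absorb it
        self._learn(x)
        return False


# Hypothetical thermostat readings with one spike.
readings = [20.0, 20.5, 19.5, 20.2, 19.8, 20.1, 35.0, 20.3]
detector = OnlineDeviationDetector()
flags = [detector.update(r) for r in readings]
```

Excluding flagged readings keeps a single glitch from distorting the model, but it also means a sustained shift in the data would keep being flagged; handling that case (for example with a forgetting factor) is a separate design choice not covered here.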
Towards DQ-aware IoT Analytics
Analyze a single data stream:
– How can we incrementally detect deviations from data regions of normal behavior?
– How can we distinguish between data glitches, meaningful events, or even malicious attacks?
– What types of data deviations can be identified (distance, density, contextual), and at what granularity level?
Analyze multiple data streams:
– How can we compute online correlations across time/space in the case of missing or delayed data?
– How can we progressively evolve extracted knowledge patterns (motifs, episodes)?
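As one possible reading of the online-correlation question, the sketch below maintains a Pearson correlation between two streams incrementally, one paired reading at a time. It is an illustration under stated assumptions, not a method from the slides: the streams are assumed to be already aligned in time, a missing reading is represented as `None`, and a pair with a missing value is simply skipped. Delayed data would need buffering or re-alignment, which is not shown. The class and method names are hypothetical.

```python
import math


class OnlineCorrelation:
    """Incremental Pearson correlation over paired readings from two streams."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.cov = 0.0   # running sum of co-deviations of x and y

    def update(self, x, y):
        # Assumed policy: a pair with a missing (None) reading is skipped.
        if x is None or y is None:
            return
        self.n += 1
        dx = x - self.mean_x          # deviation from the old mean of x
        dy = y - self.mean_y          # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.cov += dx * (y - self.mean_y)

    def value(self):
        """Current correlation, or None if it is not yet defined."""
        if self.n < 2 or self.m2_x == 0 or self.m2_y == 0:
            return None
        return self.cov / math.sqrt(self.m2_x * self.m2_y)


# Hypothetical paired readings from two sensors; None marks a missing value.
pairs = [(1.0, 2.0), (2.0, 4.0), (None, 5.0), (3.0, 6.0), (4.0, None), (5.0, 10.0)]
corr = OnlineCorrelation()
for x, y in pairs:
    corr.update(x, y)
```

Skipping incomplete pairs keeps the update constant-time, at the cost of ignoring whatever information the surviving half of the pair carries; imputing the missing value is an alternative with its own accuracy trade-offs.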
Key Analytics to Delivering Value in IoT
Source: http://fr.slideshare.net/wclquang/the-analytics-value-chain-key-to-delivering-business-value-in-iot
Thank you!
The Three Domains of Information
Source: Barry Devlin, “The Big Data Zoo --- Taming the Beasts”
Computing with Things: Challenges
• Things are different from servers in a data center: they are used in the wild, and they are often constrained by limited connectivity, bandwidth, power, memory, storage & processing capabilities
• Things are different from UI clients: they usually have no on-board UI and follow an M2M communication paradigm rather than a UI client-to-server interaction paradigm
• Things may communicate directly with peers: not all communication is thin-client traffic to a parent server in the cloud, and the hub-and-spoke model presents serious limitations for very large numbers of devices
Source: re-workblog.tumblr.com
Source: http://www.kdnuggets.com/2015/08/patterns-streaming-realtime-analytics.html