Open Data reflections from behind the Big Firewall
Or, may you be cursed to live in interesting times
Open Data … Why bother?
• On-demand interaction will increasingly be the norm for a global community of virtual innovators, who expect their user experience to be as simple as ‘using an appliance’.
• In 2013, expect more than 850 exabytes of Internet data to be generated, mostly user-contributed content (versus traditional enterprise sources).
• Open contributed content will become a core, strategic, economic resource, and the most accessible and scalable resource we possess.
• Global access to technology is already driving trends like ‘virtual citizenship’, ‘virtual employment’ and ‘social innovation’.
• Mobility, Openness & Connection will matter more than Presence & Rigid Structures.
Open Data and Economics, or … ‘Greater Fool Investing’!!
• Open data is a potential new 'raw material' for economic growth. It requires effort to produce and maintain.
• Unlike traditional raw materials like oil, gas and minerals, its value increases fastest when it is open and shareable.
• Open Data alone does not generate direct economic benefit sufficient to offset production and operational costs. The question is: can it generate sufficient ‘value’ to be sustainable?
• Incentives must be in place to sustain “economically significant” amounts of Open Data.
• There are some bright lights, but we need answers before we run out of steam!!
• Bubble: "trade in high volumes at prices that are considerably at variance with intrinsic values".
How Private is Private?
• Privacy is not absolute; it is a balance between Risk and Utility.
• Open Data usage is inherently contradictory:
– Social media usage -> Maximize Utility + (Largely) Ignore Risk
– Enterprise usage -> Maximize Utility + Minimize Risk
• Who carries liability in case of dispute? Uncertainty in usage policies is a substantial form of business risk.
• Recognize in policy and legislation that privacy is mutable, based on context.
✔ Available Open Data is useful to identify and characterize group behaviors.
✖ Negative usage: ‘nuisance’ providers can combine it to identify high-value targets:
{∃(high value residences)} ∩ {∃(long emergency response time)} ∩ {∃(many local area crimes)} → {area where people might buy home security products}
(all available on open data sites near you)
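The intersection on this slide can be illustrated with plain set operations. The following is only a minimal sketch: the area identifiers, the three input sets and the helper `intersect_area_sets` are hypothetical and stand in for whatever open datasets a provider might actually combine.

```python
# Hypothetical area identifiers; no real open dataset is used here.
high_value_residences = {"area-01", "area-02", "area-05", "area-07"}
long_emergency_response = {"area-02", "area-03", "area-05", "area-09"}
many_local_crimes = {"area-02", "area-04", "area-05", "area-07"}


def intersect_area_sets(*area_sets):
    """Return the areas that appear in every one of the given sets."""
    if not area_sets:
        return set()
    return set.intersection(*(set(s) for s in area_sets))


# Areas that satisfy all three conditions at once.
likely_targets = intersect_area_sets(
    high_value_residences,
    long_emergency_response,
    many_local_crimes,
)
print(sorted(likely_targets))
```

Each input set is harmless on its own; the privacy risk the slide points at comes from how cheaply the combination can be computed.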
A Fun Use Case
Challenges for Privacy in an Open Data World
And I haven’t even mentioned Trust, Provenance, Security, …
Research impact: what we have learned so far
There are plenty of interesting challenges!!
• Selected research results:
– Live deployment in Dublin
– Won prize in Semantic Web Challenge
– Paper at ISWC
– Paper at Hypertext
– Invited paper at Journal of Web Semantics
• Data
– 100’s of datasets, 1000’s of files
– Very open domain(s)
– Very expensive to normalize
– Scaling complexity from high dimensionality
• Approach
– Pay-as-you-go approach, only process what you need
– Do not stick to a common model, use any you can find
– Generate interesting views and feed them to “analytics”
• Lessons learned
– Multiple models, depending on context
– Need to do things incrementally
– Lightweight generally better than heavyweight
Diagram labels: Documents + Metadata, Structure, Entities, …, Links, Views, Insight (Pay-as-you-go, Gain-as-you-go)
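One way to read "pay-as-you-go, only process what you need" is lazy, cached normalization: a dataset is parsed the first time a view asks for it and never before. The sketch below assumes that reading; the catalogue `RAW_DATASETS`, the function `normalize` and the sample rows are all hypothetical and are not taken from the deployed system.

```python
from functools import lru_cache

# Hypothetical raw catalogue: dataset name -> unprocessed CSV-like text.
RAW_DATASETS = {
    "bus_stops": "id,name\n1,Stop A\n2,Stop B\n",
    "parking": "id,spaces\n10,120\n11,45\n",
}

# Records which datasets were actually processed, for illustration only.
normalize_calls = []


@lru_cache(maxsize=None)
def normalize(name):
    """Parse one dataset on first request; repeated requests reuse the cache."""
    normalize_calls.append(name)
    header, *rows = RAW_DATASETS[name].strip().split("\n")
    columns = header.split(",")
    return tuple(dict(zip(columns, row.split(","))) for row in rows)


# Only the dataset a view needs is processed; "parking" is never touched.
stops = normalize("bus_stops")
stops_again = normalize("bus_stops")
print(normalize_calls)
```

With hundreds of datasets that are expensive to normalize, the cost is then paid per dataset actually used instead of up front for the whole catalogue.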
Dublinked - Towards a robust test-bed for Open Data Research
Testbed challenges include:
• Privacy & Security
• Publication & Annotation
• Catalog & Navigation
• Search & Query
• Knowledge Representation & Reasoning
• Visualization & Analytics
Capabilities shown in the diagram:
• Scalable privacy and security of resources
• Automated assimilation and sharing of resources
• Robust models to organize and represent resources and their context
• Open REST Web Services API
• Represent knowledge efficiently for continuous machine reasoning and diagnosis
• Compose resources for development, mash-up & visualization
Enterprise Platform:
• IBM IOC: interaction with industry solutions
• IBM Connections: social media & collaboration
• IBM Enterprise Cloud: scalable compute, storage & network infrastructure
Key: Dublin City, Provider 1…N, Enterprise, Citizen, IBM Research, IBM Products & Services, Partners & People
What we do: Learning Systems to Help Diagnose the City
Problem
• How can we provide City decision makers with explanations and diagnoses for events by applying machine reasoning techniques to a fusion of massive, rich, complex and dynamic data?
• How can we move from explanation to prediction?
Challenges
• Identifying relevant data and information
• Capturing and representing anomalies
• Correlating knowledge on heterogeneous data sources
• Advanced fusion of heterogeneous data from multiple sources
Goals
• Identification of the nature and cause of changes
• Explaining logical connection of knowledge across space and time
• Move from explanation to prediction
Detection to Diagnosis?
Anomaly Detected: Delayed buses, congested roads
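As a rough illustration of connecting knowledge "across space and time", the sketch below filters candidate explanations for an anomaly by distance and lead time. It is a deliberately simple proximity filter, not the machine reasoning the slide refers to; the `Event` type, the thresholds and the sample events are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import hypot


@dataclass(frozen=True)
class Event:
    label: str
    when: datetime
    x_km: float  # position on a hypothetical local grid, in kilometres
    y_km: float


def candidate_explanations(anomaly, events, max_km=2.0,
                           max_before=timedelta(hours=3)):
    """Return events that began near the anomaly and shortly before it,
    ordered from nearest to farthest."""
    nearby = []
    for event in events:
        lead_time = anomaly.when - event.when
        distance = hypot(event.x_km - anomaly.x_km, event.y_km - anomaly.y_km)
        if timedelta(0) <= lead_time <= max_before and distance <= max_km:
            nearby.append((distance, event))
    return [event for _, event in sorted(nearby, key=lambda pair: pair[0])]


anomaly = Event("delayed buses", datetime(2013, 1, 1, 18, 0), 0.0, 0.0)
events = [
    Event("concert", datetime(2013, 1, 1, 17, 0), 0.5, 0.5),
    Event("road works", datetime(2013, 1, 1, 16, 0), 1.0, 0.0),
    Event("football match", datetime(2013, 1, 1, 17, 30), 6.0, 1.0),  # too far
    Event("parade", datetime(2013, 1, 1, 9, 0), 0.2, 0.1),  # too early
]
explanations = [e.label for e in candidate_explanations(anomaly, events)]
print(explanations)
```

A filter like this only narrows the candidates; deciding which of them actually caused the delay is the diagnosis step the slide asks about.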
Outline Research Roadmap
Dynamic Distributed Information Analytics
• Life analytics (social/health/public safety)
• High-risk/time-critical alerting
• Cross-agency Alerting
Use Cases
• Cross Web-Enterprise Analytics
• Many-agency Analytics
• Public Safety Integrator
2015
2014
• Linked Data Cloud Context Retrieval
• Cross-agency Analytics
Data Warehouse
2013
• Fine-grain Access Control
• Streaming Analytics
• Distributed Reasoning
• Context Mining
• Provenance
• Privacy
• High-volume distributed querying
• Wide-scale distributed querying
• Distributed Entity Linking
• Lightweight Distributed Information Access
• Contextual Access
• Basic Access Control
• Distributed Entity Consolidation
• Graph Access Technology
- Slides: 10