Big Data and Official Statistics: Philippine Context
Erniel B. Barrios
Outline
• Concepts and Definitions
• Coverage of Big Data
• Big Data and Official Statistics: Preliminary Framework
• Current Practices (Some Models)
• Possible Big Data in the Philippines
• Next Steps
Frequency of Documents Containing “Big Data” in ProQuest Research Library
Basis of Definitions
• Stakeholders may define Big Data differently
• Data storage and data analysis
• Intertwined technical and socio-technical issues
• Multiple, ambiguous and often contradictory definitions
• “Big” => significance, complexity, challenge
• Five V’s
  • Volume (size)
  • Velocity (rate of production)
  • Variety (format, representations)
  • IBM: the added V is Veracity (trust and uncertainty)
  • SAS: Variability (complexity)
• Intel: organizations generating a median of 300 TB of data
Basis of Definitions
• Size: volume of the dataset
• Complexity: structure, behavior and permutations of the dataset
• Technologies: tools and techniques used to process a sizable or complex dataset
Definitions
• Appropriate description, integration, and sustainability of very large datasets generated by high-throughput experiments
• Large collections of small, disparate, unstructured datasets that, taken together, can be analyzed to find unusual trends
• The emergence of the digital enterprise: the ability of an organization to take full advantage of its digital assets, which collectively amount to a large volume of data
• Oracle: inclusion of additional data sources to augment current operations
• Microsoft: the process of applying serious computing power (machine learning, AI) to seriously massive and often highly complex sets of information
Definitions
• Big Data describes the storage and analysis of large and/or complex datasets using a series of techniques.
• High-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision-making.
• Large volumes of high-velocity, complex and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of information.
• UNECE: Big Data are data that are difficult to collect, store or process within the conventional systems of statistical organizations because of their volume, velocity, structure or variety.
Online Survey of 154 Global Executives (April 2012)
Definition
• Big Data
  • Not only about size (though volume can be part of it)
  • Varying sources
  • Several variables (indicators)
  • Differing data collection methods (compilation)
  • Frequency (possibly irregular)
• Issues
  • Quality
  • Architecture
  • Security/Confidentiality
  • Integrity
  • Standardization
  • Data extraction
  • Data mining
New Data Sources
• Consumer Usage Databases
• Blogs
• Social Media
• Sensor Networks
• Image Data
• May vary in
  • Size
  • Structure
  • Format
Coverage of Big Data
• Basic research data
• Electronic health records
• Consumer usage databases
• Proposals submitted
• Administrative data
• Censuses and surveys
Types of Big Data (Classification)
• Social Networks: human-sourced information
  • Social networks, blogs, personal documents, pictures, videos, Internet searches, mobile data, user-generated maps, e-mail
• Traditional Business Systems: process-mediated data
  • Data from public agencies (including medical records) and data produced by businesses (commercial transactions, banking/stock records, e-commerce, credit cards)
• Internet of Things: machine-generated data
  • Fixed sensors: home automation, weather/pollution sensors, traffic, scientific, security/surveillance
  • Mobile sensors: mobile phones, cars, satellite images
  • Computer systems: logs, web logs
Current Practices
Big Data and Official Statistics
• Location data from mobile phones
  • Used for instantaneous daytime population and tourism statistics
  • Proxy indicators for demand
• Social media messages
  • Processed into early indicators of consumer confidence
• Price information on the web and from loyalty cards
  • Inflation level
• Google searches
  • Prevalence rate of influenza
• Tweets
  • Stock market prices
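As an illustration of the price-information use above: statistical offices that work with web-scraped or loyalty-card prices commonly aggregate them with an elementary index such as the Jevons index, the geometric mean of price relatives for items priced in both periods. The sketch below is a minimal, hypothetical Python example; the function name `jevons_index`, the item codes and the prices are made up, and it leaves out the weighting, quality adjustment and item-churn treatment a production CPI would need.

```python
from math import exp, log


def jevons_index(base_prices: dict[str, float], current_prices: dict[str, float]) -> float:
    """Jevons elementary price index with base period = 100.

    Hypothetical helper for illustration only: items observed in both periods
    (a matched sample) enter with equal weight; prices are assumed positive.
    """
    matched = base_prices.keys() & current_prices.keys()
    if not matched:
        raise ValueError("no items priced in both periods")
    # Geometric mean of price relatives = exp(mean of log price relatives).
    mean_log_relative = sum(
        log(current_prices[item] / base_prices[item]) for item in matched
    ) / len(matched)
    return 100.0 * exp(mean_log_relative)


if __name__ == "__main__":
    # Made-up scraped prices, same currency unit in both periods.
    base = {"item_a": 100.0, "item_b": 100.0, "item_c": 40.0}
    current = {"item_a": 121.0, "item_b": 100.0, "item_d": 15.0}
    print(round(jevons_index(base, current), 2))  # prints 110.0
```

Items seen in only one period (`item_c`, `item_d`) are simply dropped here; how to handle such churn, which is frequent in web data, is a methodological choice the statistical office would still have to make.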
Big Data and Official Statistics: Preliminary Framework
[Diagram elements: Businesses, Farms, Surveys, Census, Administrative Reports, Collaboration (PPP), Households, Individuals, Methodology, Other Big Data, Human Resources, Official Statistics, SDG]
Possible Big Data in the Philippines
• From PSA/NGA, LGU
  • Censuses
  • Surveys
  • Administrative reports
  • Regulation, licensing and compliance
  • Monitoring (e.g., MFO, budgeting, interventions (4Ps, RSBSA, etc.))
  • Registers (BIR, COMELEC, UMID, GSIS/SSS, PhilHealth, Pag-IBIG, etc.)
• Private/Commercial
  • Telco
  • Credit cards
  • Loyalty cards
  • POS
  • Images
  • Sensors
  • Social media, Google, etc.
Next Steps
• What is available?
  • Big data sources
  • Data that can be shared; frequency, timeliness
  • Data security and confidentiality issues
• Big Data and Official Statistics: Is it feasible? Is it worthwhile?
• What is needed for collaboration and data-sharing?
Thank you.