Big Data for Official Statistics Herman Smith UNSD
Big Data for Official Statistics*
Herman Smith, UNSD
10th Meeting of the Advisory Expert Group on National Accounts, 13-15 April 2016, Paris
* Prepared by Ronald Jansen, UNSD
Drivers
o Availability of automatically generated data in electronic format, such as mobile phone records, social media, electronic commercial transactions, sensor networks, smart meters, GPS tracking devices, or satellite images
o Higher frequency, more granularity, wider coverage, lower cost of data collection
o Modernisation of statistical production and services
Key messages
o Big Data for core national statistics – for integrated economic, social and environmental policies
o Big Data for agile statistics – for emergency issues
o Big Data to keep official statistics relevant – the private sector moves fast
o Big Data as part of the modernization of statistical systems – new production processes and partnerships
o Big Data to meet the data demand of the 2030 Agenda – monitoring policies – "leave no one behind"
Big Data for Official Statistics
Benefits – Example of social media data
o Widespread use of social media, also in developing countries
o Timely, high frequency and wide coverage
o Great potential in tracking sentiments, such as consumer confidence
o Potential use for tracking prices and outbreaks of diseases, and useful in combination with other data, such as population census and geo-spatial data
Examples of Big Data projects
Example 1: Telenor Big Data project on poverty prediction (SDG 1)
• Among the major mobile operators in the world
• Approaching 200 million mobile subscriptions (e.g. in Bangladesh, India, Pakistan, Myanmar and Thailand)
• 33 000 employees
• Present in markets with 1.6 billion people
• A team of 9 data scientists
• Collaboration partners at leading academic research institutions
• Bridge between academic research and all business units
• Explore and develop new ways to utilize customer data across markets
Billions of data points collected each day
• A number: Caller
• B number: Receiving party
• Date & time
• Type: Call, SMS, Data, etc.
• Data volume
• Cell_ID: Location
• IMSI: SIM card
• TAC: Handset
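The record fields listed above can be sketched as a small data structure, together with one possible way of turning raw records into per-subscriber usage features. This is a minimal illustration only: the class name `CallRecord`, the function `subscriber_features`, the unit of `data_volume_kb`, the chosen features and all sample values are assumptions, not Telenor's actual schema or pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallRecord:
    """One network event, with the fields named on the slide (hypothetical layout)."""
    a_number: str        # caller
    b_number: str        # receiving party (empty for data sessions)
    timestamp: datetime  # date & time
    event_type: str      # "call", "sms" or "data"
    data_volume_kb: int  # assumed unit; 0 for calls and SMS
    cell_id: str         # location of the serving cell
    imsi: str            # SIM card identifier
    tac: str             # handset type code


def subscriber_features(records):
    """Aggregate raw events into simple per-caller usage features."""
    events = defaultdict(int)
    contacts = defaultdict(set)
    cells = defaultdict(set)
    for rec in records:
        events[rec.a_number] += 1
        cells[rec.a_number].add(rec.cell_id)
        if rec.event_type in ("call", "sms"):
            contacts[rec.a_number].add(rec.b_number)
    return {
        number: {
            "events": events[number],
            "distinct_contacts": len(contacts[number]),  # social network size
            "distinct_cells": len(cells[number]),        # crude mobility proxy
        }
        for number in events
    }


# Invented sample records, for illustration only.
records = [
    CallRecord("A1", "B1", datetime(2016, 4, 13, 9, 0), "call", 0, "cell-01", "imsi-1", "tac-1"),
    CallRecord("A1", "B2", datetime(2016, 4, 13, 9, 5), "sms", 0, "cell-01", "imsi-1", "tac-1"),
    CallRecord("A1", "", datetime(2016, 4, 13, 18, 0), "data", 512, "cell-02", "imsi-1", "tac-1"),
    CallRecord("A2", "B1", datetime(2016, 4, 14, 7, 30), "call", 0, "cell-03", "imsi-2", "tac-2"),
]
features = subscriber_features(records)
```

Features of this kind (number of contacts, number of distinct cells visited) correspond loosely to the "social network" and "mobility" inputs mentioned in the poverty prediction slides.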
Introducing mobile phone data in poverty prediction
Survey data
• Telco surveys
• DHS
• PPI
Mobile phone data
• Basic phone usage
• Advanced phone usage
• Social network
• Mobility
• Top-up
• Revenue
• Handset
Satellite layers
• Population
• Aridity index
• Evapotranspiration
• Various animal densities
• Night time lights
• Elevation
• Vegetation
• Distance to roads/waterways
• Urban/Rural
• Land cover
• Pregnancy data
• Births
• Ethnicity
• Precipitation
• Annual temperature
• Global human settlement layer
PREDICTION
• # poor per km²
• Prediction maps
Introducing mobile phone data in poverty prediction: Methods
1. Spatial prediction
• Bayesian geostatistical modelling
• Prediction maps
2. Individual classification using machine learning methods
• RF
• GBM
• SVM
• Deep learning
Figure: Poverty prediction map
Example 2: National Statistical Office of Tunisia Big Data project on Good Governance (SDG 16)
October 2015
Big Data and monitoring SDG 16 in Tunisia? Social media as a Big Data source
Kamel ABDELLAOUI, Dissemination Directorate (Direction de la diffusion), INS-Tunisie
Eduardo López-Mancisidor, United Nations Development Programme, Tunisia
Analyzing Social media for SDG 16: Why?
Could social media data provide similar or new insights on public opinion to potentially complement or substitute household survey data?
Social media, WHY?
§ Free, public, easy access
§ No privacy issues
§ Express opinion
Figure: Internet users in Tunisia (in thousands), 1999-2015, on a scale from 0 to 6,000, with the callout "Opinions in here"
Analyzing Social media for SDG 16: How?
• Selecting sources
• Taxonomy of keywords
• Categorising
• Training
• Exploring
• Analysing
• Comparing
• Exporting
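The "taxonomy of keywords" and "categorising" steps can be illustrated with a minimal sketch: posts are assigned to governance topics whenever they contain a keyword from that topic, and the volume per topic is then counted. The taxonomy, the topic names, the sample posts and the function names are all hypothetical; the actual project's taxonomy and tooling are not described on the slides.

```python
import re
from collections import Counter

# Hypothetical taxonomy: topic -> keywords that signal the topic.
TAXONOMY = {
    "corruption": {"corruption", "bribe", "bribery"},
    "justice": {"court", "judge", "justice"},
    "security": {"police", "security", "crime"},
}


def categorise(post):
    """Return the set of topics whose keywords appear in the post."""
    words = set(re.findall(r"[a-z]+", post.lower()))
    return {topic for topic, keywords in TAXONOMY.items() if words & keywords}


def volume_by_topic(posts):
    """Count how many posts mention each topic (a post may count for several)."""
    counts = Counter()
    for post in posts:
        counts.update(categorise(post))
    return counts


# Invented example posts.
posts = [
    "The judge postponed the corruption trial again",
    "More police on the streets this week",
    "Great football match last night",
]
volumes = volume_by_topic(posts)
```

Exact keyword matching of this kind is only a starting point. The "training" step on the slide suggests that the categorisation was refined iteratively, which this sketch does not attempt.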
Analyzing Social media for SDG 16: Outputs
• Volume
• Data sources
• Word cluster
• Sentiment
• Word cloud
Example 3: Statistics Canada linking Google Maps with the Statistical Business Register (SDG 9)
What can be gained from linking the SBR with geo-spatial information?
Cross-sectional views of enterprise characteristics by (sub-national) regions:
• Are there regional patterns of economic activity?
• Are larger enterprises equally spread over the country?
• Is FDI equally spread over the country?
Statistics Canada – Geolocation of SBR data
• To study the potential of conducting economic analysis of small geographic areas by using Business Register (BR) microdata
• Using BR data geocoded at the census subdivision (CSD) level, in combination with travel distance data generated from the Google Maps API
• The identification of resource sectors is based on the aggregation of business data at the CSD level from the BR
• A database was created, containing:
o BR employment data, derived from payroll deduction files
o BR revenue data, derived from the General Index of Financial Information, and
o the six-digit North American Industry Classification System (NAICS) code from the Business Register.
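The aggregation of business data at the CSD level can be sketched as follows. This is an illustration under stated assumptions, not Statistics Canada's method: the record layout, the CSD identifiers, the employment figures and the six-digit codes are placeholders, and treating NAICS sectors 11 and 21 as the "resource sectors" is an assumption made for the example.

```python
from collections import defaultdict

# Assumption: two-digit NAICS sectors treated as resource sectors in this sketch.
RESOURCE_SECTORS = {"11", "21"}


def employment_by_csd_and_sector(records):
    """Sum employment per (CSD, two-digit NAICS sector)."""
    totals = defaultdict(int)
    for csd, naics6, employment in records:
        totals[(csd, naics6[:2])] += employment
    return dict(totals)


def resource_share(records):
    """Share of each CSD's employment that falls in the resource sectors."""
    total = defaultdict(int)
    resource = defaultdict(int)
    for csd, naics6, employment in records:
        total[csd] += employment
        if naics6[:2] in RESOURCE_SECTORS:
            resource[csd] += employment
    return {csd: resource[csd] / total[csd] for csd in total if total[csd] > 0}


# Invented business records: (CSD identifier, six-digit NAICS code, employment).
records = [
    ("CSD-A", "113310", 60),
    ("CSD-A", "211110", 20),
    ("CSD-A", "445110", 20),
    ("CSD-B", "445110", 150),
    ("CSD-B", "722511", 50),
]
by_sector = employment_by_csd_and_sector(records)
shares = resource_share(records)
```

A community could then be flagged as resource-dependent when its share exceeds some threshold; the choice of threshold, and the use of travel distances to relate communities to one another, are outside this sketch.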
Statistics Canada – employment by community & economic activity
GWG on Big Data for official statistics
United Nations Global Working Group on Big Data for Official Statistics
o Created in March 2014
o 32 Members
§ 22 Countries and 10 International Agencies
Global survey on Big Data Projects
Thank you
URLs to websites
Telenor Research: http://www.telenor.com/media/press-releases/2015/telenor-research-deploys-big-dataagainst-dengue/
Mexico - Business Register on Google Earth: http://www3.inegi.org.mx/sistemas/mapa/denue/default.aspx
Geo-location of Business Register: https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.42/2015/Session_III_Canada_-_Geolocation_of_BR_data__room_document_.pdf
Global Survey on Big Data: http://unstats.un.org/unsd/trade/events/2015/abudhabi/presentations/day1/04/UNSD%20%20Global%20Survey%20on%20Big%20Data.pdf
Big Data Quality Framework: http://unstats.un.org/unsd/trade/events/2015/abudhabi/presentations/day3/01/3_Quality_Framework_Righiv3.pdf
United Nations Statistics Division: http://unstats.un.org/unsd/dnss/QualityNQAF/nqaf.aspx
United Nations Statistics Division / Trade Statistics Branch: http://unstats.un.org/unsd/trade/default.asp
United Nations Statistical Commission: http://unstats.un.org/unsd/statcom/commission.htm
United Nations Global Working Group on Big Data for official statistics: http://unstats.un.org/unsd/bigdata/ http://unstats.un.org/unsd/trade/events/2014/Beijing/default.asp http://unstats.un.org/unsd/trade/events/2015/abudhabi/default.asp
United Nations General Assembly Resolutions: http://www.un.org/en/ga/70/resolutions.shtml
United Nations History Publications: http://www.unhistory.org/publications/
United Nations Sustainable Development: https://sustainabledevelopment.un.org/topics
United Nations Global Pulse: http://www.unglobalpulse.org/
Project 8: http://demandinstitute.org/projects/project-8/
United Nations Data Revolution: http://www.undatarevolution.org/
United Nations Statistics Division / SDG indicators: http://unstats.un.org/sdgs/
United Nations Statistics Division / Modernization of Statistical Systems: http://unstats.un.org/unsd/nationalaccount/workshops/2015/NewYork/lod.asp
United Nations Global Pulse: http://www.unglobalpulse.org/
World Pop: http://www.worldpop.org.uk/
Data Pop: http://datapopalliance.org/
Flowminder: http://www.flowminder.org/
UNU-EHS: http://ehs.unu.edu/
Future Earth: http://www.futureearth.org/
UProject: http://ureport.ug/