The BOP (Billion Object Platform) and WorldMap
The BOP (Billion Object Platform) and WorldMap / Dataverse Integration Harvard Center for Geographic Analysis Tuesday, July 12, 2016 Ben Lewis, Mercè Crosas, Raman Prasad
Billion Object Platform - funded by Sloan • General purpose, open source, streaming, big spatio-temporal data exploration and extraction • Performs basic sentiment analysis • Runs on commodity hardware and software • Built on Spatial Lucene and Solr. • Exposes all functions through an API
Other geospatial visualization work (funded by the Boston Area Research Initiative) 1. Spatial stamping in Billion Object Platform 2. Table visualization – Tables with well-defined area columns (Census codes) – Tables with lat/longs 3. Geospatial data visualization – Shapefiles
The “Billion Streaming Geo-tweets” dataset • A new dataset type in Dataverse which supports real-time streaming and visual, interactive exploration • The content is geo-tweets (tweets containing GPS coordinates from the originating device). • Currently 1-2% of tweets are geo-tweets, about 8 million per day. The CGA has been harvesting geo-tweets since 2012. • Main components: – 1) Geo-tweet harvesting and archiving system – 2) Software and hardware platform to support interactive exploration of a billion spatio-temporal objects – 3) API to provide query access to the archive from Dataverse – 4) Client-side tools for querying and visualizing the contents of the archive, extracting subsets, and pushing them to Dataverse
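As a rough illustration of the harvesting component, the sketch below keeps only tweets that carry a device-reported point. It assumes records arrive as one JSON object per line and resemble the Twitter API v1.1 tweet object, where `coordinates` is a GeoJSON-style point in `[longitude, latitude]` order. The output field names and the `harvest` function are hypothetical and are not taken from the CGA system.

```python
import json


def extract_geo_point(tweet):
    """Return (lon, lat) if the tweet has a valid point, otherwise None."""
    coords = tweet.get("coordinates")  # assumed GeoJSON-style dict, or None
    if not coords or coords.get("type") != "Point":
        return None
    lon, lat = coords["coordinates"]  # GeoJSON order: longitude first
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        return None
    return (lon, lat)


def harvest(lines):
    """Yield a slim record for every geo-tweet in an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records instead of stopping the stream
        point = extract_geo_point(tweet)
        if point is None:
            continue  # not a geo-tweet
        yield {
            "id": tweet.get("id_str"),
            "created_at": tweet.get("created_at"),
            "lon": point[0],
            "lat": point[1],
            "text": tweet.get("text", ""),
        }
```

Because `harvest` is a generator, it can be fed a file handle or a streaming connection without holding the whole archive in memory.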
The “Billion Streaming Geo-tweets” dataset What does a landing page look like when… – Data source is external to Dataverse – The data source is continuously being updated – The data does not consist of “files” in the traditional Dataverse sense
The BOP: streaming big data… A closer look at the Billion Streaming Geo-tweets
API to streaming geo-tweets Built on Solr
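The slides do not show the BOP API itself, so the sketch below targets a plain Solr select handler instead. It builds a request that filters by keyword and time range and asks Solr for a heatmap facet over a bounding box, using Solr's `facet.heatmap` and `facet.heatmap.geom` parameters. The core name `geotweets` and the field names `text`, `created_at`, and `coord` are assumptions made for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical Solr core; the real BOP endpoint is not given in the slides.
SOLR_SELECT_URL = "http://localhost:8983/solr/geotweets/select"


def build_heatmap_query(keyword, start, end, bbox,
                        text_field="text",
                        time_field="created_at",
                        geo_field="coord"):
    """Build a Solr URL for a keyword + time-range query with a heatmap facet.

    bbox is (min_lon, min_lat, max_lon, max_lat); start and end are
    ISO 8601 UTC timestamps such as "2016-06-01T00:00:00Z".
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    # Rectangle-range syntax accepted by facet.heatmap.geom.
    rect = f'["{min_lon} {min_lat}" TO "{max_lon} {max_lat}"]'
    params = [
        ("q", f"{text_field}:{keyword}"),
        ("fq", f"{time_field}:[{start} TO {end}]"),
        ("rows", "0"),               # only the facet is needed, not documents
        ("facet", "true"),
        ("facet.heatmap", geo_field),
        ("facet.heatmap.geom", rect),
        ("wt", "json"),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)
```

Setting `rows` to 0 keeps the response small: the client draws the density grid from the facet counts and never needs the matching tweets themselves.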
A dataset landing page which enables data exploration and extraction A client which enables interactive exploration in multiple dimensions
Demo of Big Data exploration using predecessors of BOP: Japan Data Archive and HHypermap • Japan Data Archive http://jdarchive.org/en/search#view_type=event&media_type=&sort=relevant& • HHypermap Distributed Archive http://hypersearch.cga.terranodo.io/maps/new
2) Table Geocoding • Work funded by NSF. • Goal is to enable Dataverse tables with well-known geographic encodings to be easily visualized as maps
Pick the “Geospatial Data Type”
Choose (a) WorldMap “Join Layer” & (b) File column to join
Table visualized
Apply cartographic classification
Map symbolized
Map saved back to Dataverse
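The join and classification steps above can be sketched in a few lines. This is a simplified stand-in, not the WorldMap implementation: the layer is a list of dictionaries keyed by a geographic code, the table is joined on a chosen column, and values are binned with equal-interval breaks (one of several common cartographic classification schemes). All function and field names are hypothetical.

```python
from bisect import bisect_right


def join_table_to_layer(rows, join_column, layer_features, layer_key):
    """Attach each table row to the layer feature that shares its code.

    Codes are compared as strings so that leading zeros in census
    identifiers are preserved. Returns (joined, unmatched).
    """
    by_code = {str(f[layer_key]).strip(): f for f in layer_features}
    joined, unmatched = [], []
    for row in rows:
        feature = by_code.get(str(row[join_column]).strip())
        if feature is None:
            unmatched.append(row)  # report these instead of dropping them silently
        else:
            joined.append({**feature, **row})
    return joined, unmatched


def equal_interval_breaks(values, n_classes):
    """Return the n_classes - 1 interior breaks that split the value range evenly."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes)]


def classify(value, breaks):
    """Return the zero-based class index of value for the given breaks."""
    return bisect_right(breaks, value)
```

Returning the unmatched rows separately gives the user a way to see which table codes had no counterpart in the join layer, which is often the first thing to check when a map comes out sparse.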
Thank You Ben Lewis blewis@cga.harvard.edu
Phase II? Use Polygons to Symbolize Big Data • Perform big data query. Find 10 million tweets mentioning Brexit.
(Geographic region and sentiment stamping) • Geographic stamping: As tweets stream in they will be stamped with census block, census tract, and Admin 2 codes. – To support aggregations by census or admin unit as well as by heatmap grid. • Sentiment stamping: As tweets stream in a basic attempt will be made to determine sentiment. – To support heatmaps representing average sentiment values as well as count values.
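A toy version of the stamping idea, under loud assumptions: regions are plain bounding boxes rather than real census or Admin 2 polygons, and sentiment comes from a tiny made-up word list rather than whatever method the BOP uses. The point is only to show how stamping each tweet once on arrival makes per-region counts and average sentiment cheap to compute later.

```python
from collections import defaultdict

# Stand-in regions: (code, min_lon, min_lat, max_lon, max_lat).
# A real system would test points against census or admin polygons.
REGIONS = [("A", 0, 0, 10, 10), ("B", 10, 0, 20, 10)]

# Tiny illustrative lexicon, not a real sentiment model.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def stamp_region(lon, lat, regions=REGIONS):
    """Return the code of the first region containing the point, or None."""
    for code, x0, y0, x1, y1 in regions:
        if x0 <= lon < x1 and y0 <= lat < y1:
            return code
    return None


def stamp_sentiment(text):
    """Score text in [-1, 1] from positive and negative word counts."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def stamp(tweet):
    """Return a copy of the tweet with region and sentiment fields added."""
    return {**tweet,
            "region": stamp_region(tweet["lon"], tweet["lat"]),
            "sentiment": stamp_sentiment(tweet["text"])}


def aggregate(stamped):
    """Return count and average sentiment per region code."""
    acc = defaultdict(lambda: [0, 0.0])
    for t in stamped:
        if t["region"] is None:
            continue  # point fell outside every known region
        acc[t["region"]][0] += 1
        acc[t["region"]][1] += t["sentiment"]
    return {code: {"count": n, "avg_sentiment": s / n}
            for code, (n, s) in acc.items()}
```

The same stamped records can feed either view mentioned on the slide: grouping by region code yields polygon aggregations, while the stored coordinates remain available for heatmap grids.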
Geo-tweet Dataverse • https://dataverse.harvard.edu/dataverse/geo-tweets