Back-End Structures and Front-End Visualizations, DAMA Minnesota
Matthew Israelson, 19 November 2014
About Us
IHME is an independent global health research center at the University of Washington.
• Vision: Provide high-quality information on population health, its determinants, and the performance of health systems.
• Mission: Improve the health of the world’s populations by providing the best information on population health.
• Method: Produce rigorous and comparable measurements.
For general information: Phone +1-206-897-2800, Fax +1-206-897-2899, ihme@healthdata.org
A Short History: started in 2007 and continuing to grow into 2014
• July 2007: Founding of IHME with support from the Bill & Melinda Gates Foundation and the state of Washington
• July 2009: Published the Financing Global Health (FGH) report
• June 2010: Graduated the first Master of Public Health (MPH) students
• March 2011: Launched the Global Health Data Exchange (GHDx) at the Global Health Metrics and Evaluation conference
• December 2012: The Lancet published the Global Burden of Diseases, Injuries, and Risk Factors Study 2010 (GBD 2010)
• December 2014: First annual update with GBD 2013
Agenda
Back-End Structures: • Data Collection • Infrastructure • Modeling and Analysis
Front-End Visualizations: • Audience • Outreach • Visualizations
Back-End Structures
• Data Collection: Process, Collection, Cataloging
• Infrastructure: People, Networking, Technology
• Modeling & Analysis: GBD, Data model, Deliverables
Overview: the data cycle
• Locate data: search for new data from government websites, NGO websites, databases, expert advice, literature, etc.
• Acquire data: negotiate with data providers for access (formal requests, collaboration, DUA/MOU/IRB, payment)
• Catalog data: add to the GHDx (assign NID, create citation, add keywords, attach files)
• Extract data: notify teams, extract data, import to research databases, provide sourcing
• Provide data to our health researchers
• Analyze results to identify gaps in years, causes, and countries
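The cycle above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not IHME's tooling: the stage order follows the reconstructed slide, and `Stage`, `next_stage`, and `find_gaps` are hypothetical names. The gap step is modeled as a set difference between every (country, year, cause) combination an analysis needs and the combinations that already have data.

```python
from collections.abc import Iterable
from enum import Enum
from itertools import product


class Stage(Enum):
    """Data-cycle stages in the order reconstructed from the slide (an assumption)."""
    LOCATE = "Locate data"
    ACQUIRE = "Acquire data"
    CATALOG = "Catalog data"
    EXTRACT = "Extract data"
    PROVIDE = "Provide data to researchers"
    ANALYZE = "Analyze results to identify gaps"


CYCLE = list(Stage)  # iterating an Enum yields members in definition order


def next_stage(stage: Stage) -> Stage:
    """Advance one step; after gap analysis the cycle wraps back to locating data."""
    return CYCLE[(CYCLE.index(stage) + 1) % len(CYCLE)]


def find_gaps(covered: Iterable[tuple[str, int, str]],
              countries: Iterable[str],
              years: Iterable[int],
              causes: Iterable[str]) -> list[tuple[str, int, str]]:
    """Return the (country, year, cause) combinations with no data yet, sorted for review."""
    wanted = set(product(countries, years, causes))
    return sorted(wanted - set(covered))


if __name__ == "__main__":
    # Hypothetical coverage: only Country A has malaria data so far.
    covered = [("Country A", 2010, "malaria"), ("Country A", 2011, "malaria")]
    print(find_gaps(covered, ["Country A", "Country B"], [2010, 2011], ["malaria"]))
    print(next_stage(Stage.ANALYZE).value)
```

Using a set difference keeps the gap check independent of how coverage is stored; a real catalog query would simply supply the `covered` tuples.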
What we collect
• Health surveys • Census records • Surveillance systems • Disease registries • Vital registration • Hospital records • Financial records • Literature • Estimates
[Chart: # of records in the GHDx at January 2011, January 2012, January 2013, January 2014, and October 2014; labeled values 5,500, 7,000, 14,000, 28,000, and [VALUE]]
How we collect it
Added 15,000 new sources of data since January 2014
• Not everything is on the Internet
• 900+ “high-touch” requests involving applications, data use agreements, IRB approval, and restricted data
A project management tool is essential
• Adopted JIRA in 2013
Sourcing data
Global Health Data Exchange (GHDx)
• Centralized citation database for IHME
• Ensures the same citation for the same data
• Allows us to source all data points
• http://ghdx.healthdata.org/
Publicly accessible: GHDx (NIDs, all metadata, citations)
Not publicly accessible: Federated (NIDs, citations, accessed date, publication status); Research databases (NIDs)
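One way to picture the "same citation for the same data" rule is a catalog that hands out one NID per source and lets research databases store only that NID. The sketch below is hypothetical: `CitationCatalog`, the case-and-whitespace normalization rule, and the sequential NID numbering are assumptions for illustration, not how the GHDx actually assigns NIDs.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Citation:
    nid: int        # identifier assigned once per source (illustrative numbering)
    title: str
    publisher: str
    year: int


class CitationCatalog:
    """Hypothetical in-memory stand-in for a central citation catalog such as the GHDx."""

    def __init__(self) -> None:
        self._by_nid: dict[int, Citation] = {}
        self._by_key: dict[tuple[str, str, int], int] = {}
        self._nids = count(1)  # yields 1, 2, 3, ...

    def register(self, title: str, publisher: str, year: int) -> Citation:
        """Return the existing citation for this source, or create one with a new NID."""
        # Normalize case and whitespace so trivially different spellings share one NID.
        key = (" ".join(title.lower().split()), " ".join(publisher.lower().split()), year)
        if key in self._by_key:
            return self._by_nid[self._by_key[key]]
        citation = Citation(next(self._nids), title.strip(), publisher.strip(), year)
        self._by_key[key] = citation.nid
        self._by_nid[citation.nid] = citation
        return citation

    def cite(self, nid: int) -> str:
        """Format the single canonical citation for an NID."""
        c = self._by_nid[nid]
        return f"{c.publisher}. {c.title}. {c.year}. (NID {c.nid})"


if __name__ == "__main__":
    catalog = CitationCatalog()
    first = catalog.register("Example National Health Survey", "Ministry of Health", 2010)
    again = catalog.register("example national  health survey ", "Ministry of Health", 2010)
    # A research-database row stores only the NID; the citation is resolved from the catalog.
    row = {"nid": again.nid, "location": "Country A", "value": 0.12}
    print(first.nid == again.nid)   # True
    print(catalog.cite(row["nid"]))  # Ministry of Health. Example National Health Survey. 2010. (NID 1)
```

Keeping only the NID in downstream rows means a corrected citation is fixed once in the catalog rather than in every dataset that uses it.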
Organizing Data: LIVE DEMO, http://ghdx.healthdata.org/
The people
• Board of Directors & Scientific Oversight Group
• 210 employees: professors 20+, researchers 90+, IT 20+, staff 80+
• 16 affiliate professors
• GBD expert collaborators
The GBD network: GBD enrolled 1,095 collaborators from 107 countries
Networking as an enabler
The collaborative network enhances the GBD:
• Assess the validity of country results
• Identify missing datasets or incorrect interpretations of data
• Interpret findings and facilitate country policy translation
• Assist with acquiring new sources of data
• Publish papers using GBD results
The size of the network demands new ways to manage contacts:
• CRM is an immediate priority
The technical infrastructure
• Capacity of 250 terabytes
• Access limited to IHME
• Limited Use Access: restricted to named researchers, for controlled or sensitive datasets
• Cluster for running Stata and R jobs (Sun Grid Engine)
• Largest capacity at the University of Washington
• Capacity to increase 10x for projections
IT requirements for GBD
• 12+ major databases
• 8 servers
• Cluster (Stata, R) running every day, all day
Primary databases for GBD: Cod, Shared Covariates, GHDx, Epi, Risk, Idie 2, GBD results, mortality, Codmod, GBDviz, GBDx 2.0
The Global Burden of Disease (GBD)
A systematic, scientific effort to quantify the comparative magnitude of health loss due to diseases, injuries, and risk factors.
• GBD 2010 published by The Lancet
• GBD 2013 to be published in 2014
• Annual updates to follow
GBD 2010 vs. GBD 2013:
• Diseases and injuries: 291 vs. 322
• Sequelae: 1,160 vs. 2,435
• Risk factors: 67 vs. 68
• Countries: 187 vs. 188
• Years: 1990-2010 vs. 1990-2013
Measuring burden of diseases and injuries
Data Inputs for GBD
Population-based: • Vital registration • Censuses • Surveys • Verbal autopsy • Disease registries • Surveillance systems
Encounter-level: • Hospital records • Ambulatory records • Primary care records • Claims data
Other: • Literature reviews • Sensor data • Mortuaries/burial sites • Police records
Defining analysis
Tasks of the analysts: • Research • Prepare data • Write code • Review estimates • Interpret results • Publish
Main components of the data model
• Covariates • Mortality • Causes of death • Nonfatal health outcomes • Risk factors • YLLs/YLDs/DALYs
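The summary measures combine as DALYs = YLLs + YLDs, where YLLs are deaths multiplied by the remaining life expectancy at the age of death, and YLDs are prevalent cases multiplied by a disability weight between 0 (full health) and 1 (equivalent to death). The Python sketch below applies those definitions to made-up inputs for one cause; it is deliberately simplified and leaves out steps a real GBD analysis includes, such as sequela-level detail and comorbidity adjustment. `AgeGroupInputs` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeGroupInputs:
    deaths: float             # deaths from the cause in this age group
    life_expectancy: float    # remaining life expectancy at the age of death (reference life table)
    prevalent_cases: float    # people living with the condition
    disability_weight: float  # 0 = full health, 1 = equivalent to death


def yll(g: AgeGroupInputs) -> float:
    """Years of life lost: each death contributes the years it cut short."""
    return g.deaths * g.life_expectancy


def yld(g: AgeGroupInputs) -> float:
    """Years lived with disability (prevalence-based): cases weighted by severity."""
    return g.prevalent_cases * g.disability_weight


def daly(groups: list[AgeGroupInputs]) -> float:
    """Disability-adjusted life years: YLLs plus YLDs, summed over age groups."""
    return sum(yll(g) + yld(g) for g in groups)


if __name__ == "__main__":
    # Made-up inputs for one cause in two age groups.
    groups = [
        AgeGroupInputs(deaths=10, life_expectancy=50.0, prevalent_cases=200, disability_weight=0.25),
        AgeGroupInputs(deaths=30, life_expectancy=12.5, prevalent_cases=100, disability_weight=0.5),
    ]
    print(sum(map(yll, groups)), sum(map(yld, groups)), daly(groups))  # 875.0 100.0 975.0
```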
Processes within the data model
Deliverables
• 188 countries • 322 diseases and injuries • 68 risk factors • Men and women • 20 age groups • 1990-2013
• 1.03 billion estimates, each with at least 1,000 draws calculated from known data points and their uncertainty
Outputs:
• All-cause mortality rates
• Deaths by cause (1980-2013)
• Years of life lost (YLLs)
• Years lived with disability (YLDs)
• Disability-adjusted life years (DALYs)
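Because each estimate is a set of draws rather than a single number, results are typically summarized as the mean of the draws with a 95% uncertainty interval from the 2.5th and 97.5th percentiles. The sketch below shows only that summarizing step, using Python's standard `statistics` module; `simulate_draws` is a hypothetical stand-in that generates normal noise and is not how GBD models produce their draws.

```python
import random
import statistics


def summarize_draws(draws: list[float]) -> tuple[float, float, float]:
    """Return (mean, lower, upper), where lower/upper are the 2.5th and 97.5th percentiles."""
    if len(draws) < 2:
        raise ValueError("need at least two draws")
    # n=40 splits the distribution into 40 equal parts: 39 cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.fmean(draws), cuts[0], cuts[-1]


def simulate_draws(mean: float, sd: float, n_draws: int = 1000, seed: int = 0) -> list[float]:
    """Hypothetical stand-in for model output: normal noise, clipped at zero for a rate."""
    rng = random.Random(seed)
    return [max(0.0, rng.gauss(mean, sd)) for _ in range(n_draws)]


if __name__ == "__main__":
    draws = simulate_draws(mean=12.0, sd=1.5)  # e.g. a made-up death rate per 100,000
    mean, lower, upper = summarize_draws(draws)
    print(f"{mean:.2f} (95% UI {lower:.2f} to {upper:.2f}) from {len(draws)} draws")
```

Keeping every draw until the final step lets uncertainty propagate through sums and ratios (for example, YLLs plus YLDs) before the interval is computed, rather than combining intervals after the fact.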
COOPER: LIVE DEMO, http://ghdx.healthdata.org/cooper
Front-End Visualizations
Purpose & Audience: • Audience • Underlying principles
Traditional Outreach: • Publications • Media • Other approaches
Interactive Visualizations: • Key uses • Development • Demonstrations
Audience
Communicating data for impact
Audiences and characteristics: • Casual user • Data actor • Data analyst • Researcher
Dimensions: granularity of data; type of tool or visual
http://bit.ly/1mogRom
Designing for the right audience
Audiences, from least to most technical: Casual User → Data Actor → Analyst → Researcher
Formats, in the same order:
• Infographics • Illustrative diagrams • Narrative visualizations • Press releases • Reports • Briefs • Search tools • Limited interactive visualizations • Query tools • Exploratory visualizations • API • Data catalog/repository • Methods
IHME outreach
Formats: • Research articles • Policy reports • Brochures • Country profiles • Infographics • Newsletters • Presentations • Videos • Visualizations
[Chart: IHME research articles per year, 2007-2014]
Policy reports, articles & profiles
Infographics
News Articles
Blogging and newsletters
@IHME_UW
Video
Open Source Tools
Key uses for visualizations
Researchers:
1. Review input data & launch models
2. Review results
3. Obtain feedback from collaborators/experts
Different audiences:
4. Communicate results
5. Use as presentation/teaching aid
6. Convince data owners to share data
The development process
1. Contact product owner
2. Identification of relevant audience(s)
3. Business and technical requirements
4. Creation of appropriate design
5. Development (using Agile/Scrum)
6. Testing & initial user feedback
7. Launch under embargo (journalists)
8. Public launch
9. Feedback collection
Visualizations: LIVE DEMO
• GBD Compare: http://vizhub.healthdata.org/gbdcompare/
• GBD Cause Patterns: https://www.healthdata.org/datavisualization/gbd-cause-patterns
Visualizations: LIVE DEMO
• US Health Map: http://vizhub.healthdata.org/us-health-map/
• Tobacco Burden Visualization: http://vizhub.healthdata.org/tobacco/
• Millennium Development Goals: http://vizhub.healthdata.org/mdg/
Summary
• Gather and organize the data
• Utilize that information
• Inform and empower your audience
Contact me: Matthew Israelson, misra@uw.edu