iSHARE – Let’s Share to Learn More
INDEPTH Data Sharing Initiatives
By Team iSHARE, INDEPTH Network

Presentation Agenda
• Data Sharing Initiatives
• Data Sharing with INDEPTH – History, Purpose, Initiatives
• Concept of the Data Repository
• Data Extraction Methodology
• The ETL Process
• The Application and the Process
• Dynamic Reports
• The Framework
• Current Limitations and Challenges
• Future Plans
• Q&A

Data Sharing Initiatives
• INDEPTH Data System (IDS)
  – Efforts so far led by Prof. Abraham Kobus Herbst
  – If funded, would lead to a Standard Data Management System (OpenDSS) plus a web-based repository
  – This would greatly enhance cross-site data analysis
• Data Documentation Initiative (DDI)
  – Documenting data within INDEPTH sites using standard machine-readable formats
• Data sharing on the web within INDEPTH sites (iSHARE)

Data Sharing History
• Growing call within the funding and scientific communities for data to be shared
• Some individual INDEPTH sites (Agincourt, Africa Centre) had already started taking steps towards sharing data documentation and/or actual data
• In 2007, three INDEPTH HDSS sites in Asia (Vadu, India; Kanchanaburi, Thailand; Wosera, Papua New Guinea) came together to share their data on a web-based repository, with funding from the INDEPTH Secretariat and technical support from I2IT

Why Data Sharing?
• To encourage INDEPTH sites to share their data with the broader scientific community
• To bring transparency to scientific inquiry and allow findings to be verified and refined more economically and effectively
• To encourage collaboration with other institutions and communities

iSHARE Initiative
• iSHARE – INDEPTH Sharing and Accessing Repository
• Funding from the Hewlett Foundation for expansion to include three African sites
• In response to a call from the Secretariat, Agincourt and Dikgale (South Africa) and Magu (Tanzania) joined the initiative, bringing the total to six HDSS sites on the platform
• All participating sites submitted draft data to be used in developing the repository
• New website (http://www.indepth-ishare.org) launched in beta in October 2009; the final version is to be launched in February 2010

Concept of the Data Repository
• Standardized and harmonized dataset
• Collect data from participating HDSS sites (push/pull extraction)
• Clean and transform the datasets into the standard format
• Upload the data to a centralized database
• Data repository created!
• Repeat the cycle to add more datasets

Standardized Dataset
• Five tables
  – Base table: one record for each individual under observation
  – PregnancyOutcome: one record for each pregnancy experienced by a woman under observation
  – Deaths: one record for each death that occurs under observation
  – In-migrations: one record for each in-migration into a location under observation
  – Out-migrations: one record for each out-migration from a location under observation
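As a minimal sketch, the five standardized tables could be modelled as Python dataclasses. Only PID, EID and HID are named in the slides; every other field (sex, dob, mother_pid, outcome_date, live_births, death_date, cause_icd10, event_date, direction) is a hypothetical placeholder, and the two migration tables are assumed here to share one record type distinguished by a direction flag.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical schema sketch: only pid, eid and hid are named in the slides.

@dataclass(frozen=True)
class Individual:
    """Base table: one record for each individual under observation."""
    pid: str                          # uniquely identifies the individual
    hid: str                          # location (household) of residence
    sex: str                          # hypothetical field, e.g. "M" or "F"
    dob: date                         # hypothetical field: date of birth
    mother_pid: Optional[str] = None  # links a live-born child to the mother

@dataclass(frozen=True)
class PregnancyOutcome:
    """One record for each pregnancy experienced by a woman under observation."""
    eid: str
    mother_pid: str
    hid: str
    outcome_date: date                # hypothetical field
    live_births: int                  # hypothetical field

@dataclass(frozen=True)
class Death:
    """One record for each death that occurs under observation."""
    eid: str
    pid: str
    hid: str
    death_date: date
    cause_icd10: Optional[str] = None  # hypothetical field for an ICD-10 cause code

@dataclass(frozen=True)
class Migration:
    """Shared record type for the In-migrations and Out-migrations tables."""
    eid: str
    pid: str
    hid: str                          # location entered or left
    event_date: date
    direction: str                    # hypothetical: "IN" or "OUT"

    def __post_init__(self) -> None:
        # Reject anything other than the two migration directions.
        if self.direction not in ("IN", "OUT"):
            raise ValueError(f"direction must be 'IN' or 'OUT', got {self.direction!r}")
```

Freezing the records mirrors the idea that a submitted event is a fixed fact; corrections would be new records rather than in-place edits.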

Potential Uses of the Dataset
• Basic demographic rate and statistic calculations, which can characterize the population of each site
  – Person-years calculations
• Assessing vital registration systems within the sites
  – Birth registration
  – Death registration
• Other analyses of
  – Education
  – Occupation
  – Reason for migration
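Person-years underpin most of the rates listed above. A minimal sketch, assuming each individual's observation is available as (entry date, exit date) episodes, where exit is the date of death, out-migration or end of follow-up: each episode is clipped to the analysis period, the days are summed and converted to years with a 365.25-day year, and a crude death rate is then deaths per 1,000 person-years. The function names and the 365.25 convention are choices made for this illustration only.

```python
from datetime import date
from typing import Iterable, Tuple

DAYS_PER_YEAR = 365.25  # convention used here to convert days into years

def person_years(episodes: Iterable[Tuple[date, date]], start: date, end: date) -> float:
    """Total time observed inside [start, end), in years.

    Each episode is (entry, exit); exit is exclusive, so an individual observed
    from 1 January to 1 January of the next year contributes one full year.
    """
    total_days = 0
    for entry, exit_ in episodes:
        # Clip the episode to the analysis window; non-overlapping episodes add nothing.
        overlap = (min(exit_, end) - max(entry, start)).days
        if overlap > 0:
            total_days += overlap
    return total_days / DAYS_PER_YEAR

def crude_death_rate(deaths: int, py: float) -> float:
    """Deaths per 1,000 person-years of observation."""
    if py <= 0:
        raise ValueError("person-years must be positive")
    return 1000 * deaths / py
```

For two residents, one observed for all of 2008 and one from 1 July 2008 onwards, `person_years` over calendar year 2008 gives (366 + 184) / 365.25, roughly 1.51 person-years.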

Dataset Structure
• Individual level
• PID uniquely identifies the individual
• Event tables link to the Individuals table
• EID uniquely identifies an event
• Events are linked to the household (location) where they occur, identified by HID
• Social groups are simplified to the individuals living at the same location (HID)
• Pregnancies are linked to the mother; live-born children are linked to the mother in the Individuals (base) table
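The PID, EID and HID linkages can be illustrated with plain Python dictionaries standing in for table rows. This is a sketch under assumptions: the keys pid, eid, hid and mother_pid are hypothetical column names, and `orphan_events` shows one way to check that every event still points at a known individual.

```python
from collections import defaultdict
from typing import Dict, List

Row = Dict[str, str]  # a table row keyed by (hypothetical) column name

def social_groups(individuals: List[Row]) -> Dict[str, List[str]]:
    """Simplified social groups: PIDs of individuals living at the same HID."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for person in individuals:
        groups[person["hid"]].append(person["pid"])
    return dict(groups)

def children_of(individuals: List[Row], mother_pid: str) -> List[str]:
    """PIDs of live-born children linked to the given mother in the base table."""
    return [p["pid"] for p in individuals if p.get("mother_pid") == mother_pid]

def events_for(events: List[Row], pid: str) -> List[str]:
    """EIDs of all events linked to one individual."""
    return [e["eid"] for e in events if e["pid"] == pid]

def orphan_events(events: List[Row], individuals: List[Row]) -> List[str]:
    """EIDs of events whose PID has no record in the base table."""
    known = {p["pid"] for p in individuals}
    return [e["eid"] for e in events if e["pid"] not in known]
```

In a relational repository the same links would normally be expressed as foreign keys and joins; the list comprehensions here only make the relationships explicit.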

The ETL Process
• Start
• Data extraction
• Store the data in dummy tables in Excel/MySQL format
• Remove errors in the data
• Enforce data standards (e.g. ICD codes)
• Validation and integrity test
  – Test failed: insert the data into the error table
  – Test passed: load the anonymized data into the iSHARE database using the FTP protocol
• More data in future?
  – Yes: repeat from data extraction
  – No: stop
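The standardization and validation branch of the ETL process can be sketched as follows. The field names, the ICD-10 shape check (a letter, two digits and an optional decimal part; it does not look codes up in the official ICD-10 list) and the set of identifying fields dropped during anonymization are all assumptions for illustration; the FTP upload step is left out.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

Record = Dict[str, object]

# Assumed shape check only, e.g. "A09" or "I21.4"; not a lookup in the ICD-10 code list.
ICD10_SHAPE = re.compile(r"[A-Z][0-9]{2}(\.[0-9]{1,2})?")

# Hypothetical direct identifiers removed before loading.
IDENTIFYING_FIELDS = {"name", "phone"}

def standardize(record: Record) -> Record:
    """Enforce data standards: here, normalize the cause-of-death code's case and spacing."""
    rec = dict(record)
    code = rec.get("cause_icd10")
    if isinstance(code, str):
        rec["cause_icd10"] = code.strip().upper()
    return rec

def validate(record: Record, known_pids: Set[str]) -> List[str]:
    """Validation and integrity test; an empty list means the record passed."""
    problems = []
    if record.get("pid") not in known_pids:
        problems.append("unknown PID")
    code = record.get("cause_icd10")
    if code is not None and not (isinstance(code, str) and ICD10_SHAPE.fullmatch(code)):
        problems.append("invalid ICD-10 code")
    return problems

def anonymize(record: Record) -> Record:
    """Drop the (hypothetical) identifying fields before loading."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}

def run_etl(staged: Iterable[Record], known_pids: Set[str]) -> Tuple[List[Record], List[Record]]:
    """Route each staged record to the load list (passed) or the error table (failed)."""
    to_load: List[Record] = []
    error_table: List[Record] = []
    for raw in staged:
        rec = standardize(raw)
        problems = validate(rec, known_pids)
        if problems:
            error_table.append({**rec, "errors": problems})
        else:
            to_load.append(anonymize(rec))
    return to_load, error_table
```

Keeping failed records with their list of problems, rather than discarding them, lets a data manager review and resubmit them on the next pass of the cycle.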

Data Extraction Methodology
• Sites send data according to the standardized dataset requirements (push method)
  – Sites send data in CSV, XLS, MDB, FRM, script and other formats over FTP or email
• Sites upload data to a specified location, from which the application retrieves it to populate the repository (pull method)

The Application
• Participating sites: Vadu HDSS, Kanchanaburi HDSS, Wosera HDSS, Agincourt HDSS, Dikgale HDSS and Magu HDSS
• ETL operations load each site's data into the iSHARE DB
• Clients access the iSHARE DB through the iSHARE web server

The Process
• Start
• User registration
• Login and password generated
• Send download request
• A committee member accepts or rejects the download request
• Is the download request accepted?
  – Rejected: stop
  – Accepted: download data
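The registration and approval flow can be modelled as a small state machine. This is a hypothetical sketch, not the iSHARE implementation: the function and class names, the in-memory user dictionary and the use of PBKDF2 to store a salted hash of each generated password are all assumptions.

```python
import hashlib
import secrets
from enum import Enum
from typing import Dict, Optional, Tuple

Users = Dict[str, Tuple[bytes, bytes]]  # username -> (salt, password hash)

class RequestStatus(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"

def register(users: Users, username: str) -> str:
    """Register a user and return a generated password; only a salted hash is stored."""
    if username in users:
        raise ValueError(f"{username!r} is already registered")
    password = secrets.token_urlsafe(12)
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    users[username] = (salt, digest)
    return password

def check_login(users: Users, username: str, password: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    if username not in users:
        return False
    salt, digest = users[username]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return secrets.compare_digest(candidate, digest)

class DownloadRequest:
    """A data download request awaiting a committee member's decision."""

    def __init__(self, username: str, dataset: str):
        self.username = username
        self.dataset = dataset
        self.status = RequestStatus.PENDING
        self.reviewer: Optional[str] = None

    def review(self, committee_member: str, accept: bool) -> None:
        """Record the committee member's accept/reject decision (allowed only once)."""
        if self.status is not RequestStatus.PENDING:
            raise ValueError("request has already been reviewed")
        self.reviewer = committee_member
        self.status = RequestStatus.ACCEPTED if accept else RequestStatus.REJECTED

    def can_download(self) -> bool:
        """Only accepted requests lead to a download; rejected ones stop here."""
        return self.status is RequestStatus.ACCEPTED
```

Making `review` callable only once keeps the committee's decision auditable: a rejected request must be resubmitted as a new request rather than silently flipped.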

The Framework
• Application layer: Registration, Login, Download Request, Approval, Feedback, Reporting
• Database layer: Site 1, Site 2, Site 3 … Site n

Dynamic Reports
• Reports are generated on the fly, providing real-time data for faster analysis
• They load data dynamically from the database
• iSHARE dynamic reports provide:
  – Reports customizable to user needs
  – Sophisticated, actionable information without exposing complex internal data structures
• Example: migration reports
  – By year
  – By site
  – Drill down to gender and generate bar, line and pie charts for easier visualization
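A report like the migration example can be produced on the fly with a single aggregate query. The sketch below uses Python's built-in sqlite3 module as a stand-in for the repository's database, and the migrations table with year, site and sex columns is a hypothetical layout; charting is left to whatever plotting tool the web front end uses.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Whitelisted report dimensions; the column names are hypothetical.
ALLOWED_DIMENSIONS = ("year", "site", "sex")

def migration_report(conn: sqlite3.Connection, dimensions: Sequence[str]) -> List[Tuple]:
    """Count migration events grouped by the requested dimensions, queried on demand.

    Adding "sex" to ("year", "site") gives the gender drill-down.
    Column names are checked against a whitelist because they cannot be
    passed as SQL parameters.
    """
    unknown = [d for d in dimensions if d not in ALLOWED_DIMENSIONS]
    if unknown or not dimensions:
        raise ValueError(f"invalid dimensions: {unknown or 'none given'}")
    cols = ", ".join(dimensions)
    sql = f"SELECT {cols}, COUNT(*) FROM migrations GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()
```

Because the grouping columns are chosen per request, the same function serves the by-year, by-site and gender drill-down views without exposing the underlying tables to the user.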

Current Limitations
• Finding errors in datasets is a manual process, although cleaning is automated
• The pull method of data extraction is not yet implemented; a framework for it still has to be developed

Challenges
• Re-coding existing data into agreed categories/standards comes at significant cost and requires funding
• Conflicting conditions imposed by different parent institutions and funding agencies
• Cost of maintaining the repository as versions and contributing sites increase
• Defining policies for research data in repositories
• Misuse of downloaded data

Thank You
http://www.indepth-ishare.org