Data, Data Everywhere, But Not a Byte to Eat
Michael F. Huerta, Ph.D.
Associate Director, National Library of Medicine
Director, Office of Health Information Programs Development
BRDI/NAS 2/26/13
Biomedical Research Enterprise - Today
- Lots and lots of data, in individual labs
- Few data broadly available to research community
  - Exceptions: genomic, human subject autism, particular research initiatives (e.g., ADNI, Human Connectome Project)
- For much of biomedical research enterprise
  - Major public products: concepts in scientific papers, not data
  - Biomedical research is concept-centric, not data-centric
Biomedical Research Enterprise - Tomorrow
- Liberated data: increase data sharing
- Advances in relevant data science and data tools
- Ways to make data
  - Discoverable
  - Useful to others
  - Citable
  - Linked to scientific literature
- Greater prominence of data in science & scholarship
NIH Big Data to Knowledge Initiative for Research Data
Today -> Tomorrow
NIH Big Data to Knowledge (BD2K)
- Data and Informatics Working Group (DIWG) of the Advisory Committee to the Director of NIH
  - D DeMets & L Tabak
  - Recommendations for NIH Research Data:
    - Sharing & Standards
    - Tools
    - Workforce
- Implementation Groups
  - Eric Green (Acting Assoc Dir of NIH for Data Science)
    - Sharing & Standards: M Huerta & J Larkin
    - Tools (Software Development): V Bonazzi & J Couch
    - Tools+ (Centers): L Brooks, M Huerta, P Lyster & B Seto
    - Workforce: M Dunn
Sharing & Standards
- Policies to increase data sharing and change the culture
  - Changes will liberate data
- Frameworks for community-based standards efforts
  - Standards make data useful
  - Community base promotes their use
- Catalog of data set information -> research ecosystem
  - Discoverable, citable, and linked to the literature
Each adds value
Synergy together
NIH Data Catalog: A Use Case
An NIH-funded investigator
Just before submitting a scientific paper to a journal, the investigator uploads minimal info about the data set to the NIH Data Catalog
Minimal info includes:
- Authors -> proper credit for data
- Data descriptors (controlled) -> efficient search
- Data locus, availability & way to access -> sharing
Upload generates:
- Data publication citation
- Data Unique IDentifier (DUID)
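The upload step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not specify field names, a DUID format, or a citation layout, so `DataSetInfo`, `upload`, the `DUID-` prefix with a short hash, and the citation string are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DataSetInfo:
    """Minimal info uploaded to the catalog (field names are illustrative)."""
    authors: tuple       # supports proper credit for data
    descriptors: tuple   # controlled terms, supports efficient search
    locus: str           # where the data live
    availability: str    # e.g. "open" or "on request"
    access: str          # how to obtain the data


def upload(info: DataSetInfo, year: int) -> dict:
    """Return the two items an upload generates: a citation and a DUID."""
    # Hypothetical DUID scheme: a short hash of the uploaded record.
    canonical = json.dumps(asdict(info), sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    duid = "DUID-" + digest[:12]
    # Hypothetical citation layout; the actual format is not specified.
    authors = "; ".join(info.authors)
    citation = f"{authors} ({year}). Data set. NIH Data Catalog. {duid}"
    return {"duid": duid, "citation": citation}


# Example record with made-up values.
info = DataSetInfo(
    authors=("A. Researcher", "B. Colleague"),
    descriptors=("autism", "genomics"),
    locus="https://example.org/data/set-1",
    availability="open",
    access="download",
)
result = upload(info, 2013)
```

Deriving the identifier from the record contents makes repeated uploads of the same record yield the same DUID; a real catalog might instead assign identifiers from a registry.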
Data Unique IDentifier (DUID) is sent to the investigator
Investigator submits manuscript to the scientific journal, with DUID in abstract & data publication cited in manuscript
Journal paper is published & indexed in PubMed
PubMed pulls DUID from abstract as a separate data element in the PubMed citation
Data publication is also sent to PubMed for indexing
PubMed also pulls DUID from data publication as an element of PubMed citation
DUID now in PubMed citations of both the scientific publication & the data publication, forming a 2-way link
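The two-way link can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the citation records are plain dictionaries, the DUID format (`DUID-` plus 12 hex characters) is invented, and `extract_duid`, `build_index`, and `linked` are hypothetical helpers, not part of any PubMed interface.

```python
import re
from collections import defaultdict

# Hypothetical DUID format; the slides do not define one.
DUID_PATTERN = re.compile(r"DUID-[0-9a-f]{12}")


def extract_duid(text):
    """Pull the first DUID out of an abstract or data publication text."""
    match = DUID_PATTERN.search(text)
    return match.group(0) if match else None


def build_index(citations):
    """Map each DUID to the IDs of all citations that carry it."""
    index = defaultdict(list)
    for citation in citations:
        duid = extract_duid(citation["text"])
        if duid:
            index[duid].append(citation["id"])
    return index


def linked(citation_id, citations, index):
    """Follow the DUID from one citation to the others that share it."""
    by_id = {c["id"]: c for c in citations}
    duid = extract_duid(by_id[citation_id]["text"])
    return [other for other in index.get(duid, []) if other != citation_id]


# Made-up citations: one paper and one data publication share a DUID.
citations = [
    {"id": "paper-1", "kind": "scientific publication",
     "text": "We report results. Data: DUID-0123456789ab."},
    {"id": "data-1", "kind": "data publication",
     "text": "Data set record DUID-0123456789ab"},
    {"id": "paper-2", "kind": "scientific publication",
     "text": "No data identifier in this abstract."},
]
index = build_index(citations)
```

With this index, `linked("paper-1", citations, index)` leads to the data publication and `linked("data-1", citations, index)` leads back to the paper, which is the sense in which the link runs both ways.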
PubMed uses same data descriptors as data publication for indexing data publication
Use of same controlled terms in catalog and PubMed provides discoverability of info about data sets
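A minimal Python sketch of why shared controlled terms help discovery: if both kinds of record are indexed with the same vocabulary, one query returns papers and data sets alike. The vocabulary, the record store, and the `index_record` and `search` functions below are hypothetical and stand in for whatever controlled terms and indexing the catalog and PubMed would actually share.

```python
# Hypothetical controlled vocabulary shared by catalog and literature index.
CONTROLLED_TERMS = {"autism", "genomics", "neuroimaging"}


def index_record(store, record_id, kind, terms):
    """Index a record, accepting only terms from the controlled vocabulary."""
    unknown = set(terms) - CONTROLLED_TERMS
    if unknown:
        raise ValueError(f"not controlled terms: {sorted(unknown)}")
    store[record_id] = {"kind": kind, "terms": frozenset(terms)}


def search(store, term):
    """Return IDs of all records, of either kind, indexed under the term."""
    return sorted(rid for rid, rec in store.items() if term in rec["terms"])


# Made-up records: a paper and a data publication share the term "autism".
store = {}
index_record(store, "paper-1", "scientific publication", ["autism", "genomics"])
index_record(store, "data-1", "data publication", ["autism"])
index_record(store, "paper-2", "scientific publication", ["neuroimaging"])

hits = search(store, "autism")
```

Rejecting free-text terms at indexing time is one way to keep the two collections searchable with the same query; the slides do not say how the terms would be enforced.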
DUIDs, citations of data publications & scientific publications can be used in NIH administrative systems
Bringing Data into the Research Ecosystem
- Data more available (policies) & useful (standards)
- Data sets are discoverable:
  - Same descriptors of data sets used in data catalog are used as index and search terms in PubMed
- Data sets are citable:
  - NIH Data Catalog produces citable data publications
  - Citability + proper credit -> incentives related to data
- Data sets are linked with the literature
  - Common search & retrieval approach for scientific publications and data publications through PubMed
  - Use of DUID for direct, two-way linkage
- Information in ecosystem: use by NIH and 3rd parties
  - Trend analysis, etc.