Data Management Planning
Linda Hasman, Kathleen Fear, Emily Flagg
January 30, 2014

NIH Data Management Plans
• NIH requires a Data Management Plan for grants $500,000 and over.
  – Key Points of NIH Data Management Plan
    • What data will be shared?
    • Who will have access to the data?
    • Where will the data to be shared be located?
    • When will the data be shared?
    • How will researchers locate and access the data?

NIH Data Management Plans
NIH Data Plan Example:
This application requests support to collect public-use data from a survey of more than 22,000 Americans over the age of 50 every 2 years. Data products from this study will be made available without cost to researchers and analysts. User registration is required in order to access or download files. As part of the registration process, users must agree to the conditions of use governing access to the public release data, including restrictions against attempting to identify study participants, destruction of the data after analyses are completed, reporting responsibilities, restrictions on redistribution of the data to third parties, and proper acknowledgement of the data resource. Registered users will receive user support, as well as information related to errors in the data, future releases, workshops, and publication lists. The information provided to users will not be used for commercial purposes, and will not be redistributed to third parties.

What is a DMP?
A formal plan outlining how you will handle your data throughout and after your project…
…which is now required by many funders…
…and which is a good idea anyhow, even if it’s not required.

Data Death Cycle
• Planned format migration
• Good documentation
• Deposited to an archive

DMPs…
…prevent data loss over time
…ensure data integrity throughout a project
…increase transparency
…enable reuse

Goals of this session
• Learn the basic components of a DMP
• Understand how good data management practices translate to a good DMP

One size does not fit all…
• But we’ll cover general guidelines

Data products: What data will be shared?
Describe the kind of data you’re collecting or using, whether it’s digital…
…or physical.
(or all of the above)

Data Products: What to specify
• What are your data products, both primary and derived?
• When will you collect / produce each data product?
• How much data will you generate?
• What file types? Open or proprietary?
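One way to answer the "how much data" and "what file types" questions for data that already exists is to inventory a project folder. The sketch below is only an illustration: the function name `summarize_by_type` is made up here, and it assumes all data products live under a single directory.

```python
from collections import defaultdict
from pathlib import Path


def summarize_by_type(data_dir):
    """Count files and total bytes per file extension under data_dir."""
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for path in Path(data_dir).rglob("*"):
        if path.is_file():
            # Treat .CSV and .csv as the same type; label extensionless files.
            extension = path.suffix.lower() or "(none)"
            summary[extension]["files"] += 1
            summary[extension]["bytes"] += path.stat().st_size
    return dict(summary)
```

Calling `summarize_by_type("my_project/data")` (a hypothetical path) returns a dictionary keyed by extension, which can be copied into the data products section of a plan and checked against a list of open versus proprietary formats.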

Reuse and Sharing: Who will have access to the data?

A DMP does NOT:
• Require that you share all data with anyone who wants it
Funder wording:
• NSF: “at no more than incremental cost and within a reasonable time”
• NIH: “indicate the criteria for deciding who can receive your data”

Reuse and sharing: what to specify
• What data products will you share freely? When? How?
  – Data necessary for replication of public results
  – Other data?
• What data products won’t you share freely? Why not?
• Consider restrictions, embargo, etc. for data that can’t be immediately shared freely

Reuse and sharing: licensing
• Raw data is not copyrightable in the US
  – CCZero
  – ODC-PDDL
• Other materials may be
  – CC licenses

Archiving and Preservation: Where will the data to be shared be located?
Plan ahead for what will happen to your data long-term, beyond its current use in your project.

Archiving and Preservation
• Where will you put your data?
• What will you save and what will you discard?
• How will you plan for ongoing usability?
  …format migration?
  …integrity checking and refreshing?
  …maintaining security?
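Integrity checking is commonly done by recording a checksum for each file and comparing it later. The following is a minimal sketch of that idea using SHA-256 from the Python standard library; the function names and the in-memory manifest format are assumptions for illustration, not something the slides prescribe.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its checksum."""
    root = Path(data_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(data_dir, manifest):
    """List manifest entries that are now missing or have a different checksum."""
    current = build_manifest(data_dir)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

A manifest built when the data is deposited can be saved alongside it, and `changed_files` run on a schedule; any name it returns points to a file that was altered, corrupted, or removed since the manifest was made.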

Placing data in a repository
• Long-term commitment to data preservation
• Higher visibility for your data
• Permanent URL / DOI enables data citation
• Reuse tracking and usage statistics

Placing data in a repository
• UR Research: https://urresearch.rochester.edu/home.action
  – Example: STOP-ROP Clinical Trial

• Library-hosted
• 2 GB soft limit
• Backed up, secure
• Free!

Placing data in a repository
• UR Research: https://urresearch.rochester.edu/home.action
• Repository directories: re3data.org; biosharing.org

• Integration with journal submission processes
• Link to data held elsewhere
• Not free: $80/submission

Description & Organization: How will you ensure that your data is usable?
Describing and organizing your data makes your work easier, and provides context for those you share with.

Description & Organization
• Information about data processing, collection details: the ‘story’ of the data (…but it’s all in the paper!)
• Are your variable names meaningful? Is it clear how different parts of the dataset relate to each other? Is it in a format others can use?

Description & Organization
• Naming standards:
  – Can you tell what a file is and what it contains without opening it? How do your files relate to one another?
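A naming standard is easiest to keep when it can be checked automatically. The sketch below assumes one hypothetical convention, `<project>_<content>_<YYYYMMDD>_v<NN>.<ext>`; the pattern and the function name `parse_name` are illustrative only, and any project would substitute its own rule.

```python
import re

# Hypothetical convention: <project>_<content>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"(?P<project>[a-z0-9]+)_"
    r"(?P<content>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_"
    r"v(?P<version>\d{2})\."
    r"(?P<ext>[a-z0-9]+)"
)


def parse_name(filename):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.fullmatch(filename)
    return match.groupdict() if match else None
```

Under this assumed convention, `parse_name("sleepstudy_survey-raw_20140130_v01.csv")` returns the project, content, date, version, and extension as a dictionary, while a name like `final data (2).xlsx` returns `None` and can be flagged for renaming.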

Description & Organization
• Metadata: Contextualizing information about an object, physical or digital
• Some fields have defined standards; some repositories ask for a specific set of metadata

Metadata
• Where does it go? Lab notebook, codebook, readme.txt, XML file
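For the readme.txt option, a short script can keep the file consistent across datasets. This is a sketch under stated assumptions: the chosen fields (title, creator, description, and a variable list with units) and the function name `render_readme` are illustrative, not a metadata standard; a repository or discipline may require a different set.

```python
def render_readme(title, creator, description, variables):
    """Build plain-text README content describing a dataset.

    `variables` is a list of (name, meaning, units) tuples.
    """
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Description: {description}",
        "",
        "Variables:",
    ]
    for name, meaning, units in variables:
        lines.append(f"  {name}: {meaning} [{units}]")
    return "\n".join(lines) + "\n"
```

The returned text can be written to `readme.txt` in the same folder as the data, so the description travels with the files when they are shared or deposited.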

A little help: DMPTool
dmptool.org

A little help: UR Data Management website
library.rochester.edu/data-management/goals

Data Management Plans: Considerations for the Protection of Research Subjects

Data Management Plans – Why?
1) Data Security
2) Data Monitoring

Data Management Plans – Why?
1) Data Security
• Prevent data loss, ensure data integrity for reporting results, limit access, enable appropriate sharing
• Belmont Report
  – “Beneficence”: to maximize possible benefits and minimize possible harms to human subjects
• Risks and benefits may be reflected in protections for privacy and confidentiality…
  – Data security policies and protections are essential to proper conduct of research and to protect the welfare of research subjects

Data Management Plans – Why?
2) Data Monitoring
• Accurate data collection, monitor for inconsistencies
• Compliance
• Evaluate progress of the study
• Assess subject safety
  – Serious adverse events, trends
• Stopping rules
• Needs vary based on type of study and level of risk

Definitions
• Personal identifiers are any data elements that alone or in combination could be used to identify an individual (e.g., SS #, name, address, med rec #, DOB, demographic information, disease type)
• Private health information includes information about behavior that occurs in a context in which an individual can reasonably expect that no observation or recording is taking place, and information provided for specific purposes by an individual which the individual can reasonably expect will not be made public

Definitions
• Individually identifiable health information is any information that relates to the past, present, or future medical condition of an individual and that identifies the person, or the information may be used such that the identity of the individual may be known (e.g., via direct identifiers or indirectly through linked codes)
• Protected health information (PHI) is identifiable health information that can be linked to an individual and may be used or disclosed to others only under certain conditions (i.e., 18 HIPAA identifiers)

Definitions
• A human research data set contains informational elements, facts, and statistics about a living individual obtained for research purposes and can be identifiable
• A de-identified data set includes information that has been stripped of all elements that might enable identification of an individual (i.e., direct and indirect)
• A coded data set includes information that has been stripped of all personal identifiers and assigned a random code unique to each subject that may or may not then be used to link data elements to an identity-only data set
• An identity-only data set contains personal identifiers necessary to conduct the research and the key to the identity code that may be used to link or merge personal identifiers to the coded data set
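The relationship between a coded data set and an identity-only data set can be sketched in a few lines. This is an illustration only: the function name `split_identifiers` and the field names in the usage note are hypothetical, and the code removes only the fields it is told are identifiers. Whether the coded result counts as de-identified depends on indirect identifiers that remain and on institutional and HIPAA rules.

```python
import secrets


def split_identifiers(records, identifier_fields):
    """Split records into a coded data set and an identity-only data set.

    Each record is a dict. Fields named in identifier_fields move to the
    identity-only set; all other fields stay in the coded set. Both sets
    share a random per-subject code that acts as the linking key.
    """
    identifier_fields = set(identifier_fields)
    coded, identity = [], []
    used_codes = set()
    for record in records:
        code = secrets.token_hex(8)
        while code in used_codes:  # regenerate on the unlikely collision
            code = secrets.token_hex(8)
        used_codes.add(code)
        identity.append(
            {"code": code, **{k: v for k, v in record.items() if k in identifier_fields}}
        )
        coded.append(
            {"code": code, **{k: v for k, v in record.items() if k not in identifier_fields}}
        )
    return coded, identity
```

For example, `split_identifiers(rows, ["name", "dob"])` returns one list containing only the code and the research variables and a second list containing the code, name, and date of birth. The second list is the key and would be stored separately with restricted access.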

General Considerations
• What is the minimum identifying data needed?
• Who will have access to the data?
• Can the data be de-identified as soon as possible after collection and/or a separate coded and identity-only data set used?

General Considerations
• When will raw data be destroyed? As soon as study completed or according to institutional record retention policies?
• How are physical data records protected versus electronic data records?
  – Locked file cabinets, limited access, swipe cards, PIN keypads, encryption

Electronic Data – Use and Storage
• When using on-line data collection methods, be aware of stored IP addresses, data access by third parties, and the site’s data security policy
• Types of Storage
  – Hard drive, server, database, shared drive, network drive, CD, DVD, *Cloud, USB
  *Acceptable cloud storage method
When considering data storage methods…

Electronic Data – Use and Storage
• Use of Secure and UR approved storage methods
  – Dropbox is NOT an approved medium for data storage at URMC
  – HIPAA Highlights: Risk of Storing PHI in the Cloud
  – Encryption required if identifiable health information is stored (i.e., email, DVD, USB/external drive, etc.)
  – Recommended practices to ensure your research data remains your data
  – Consult with Information Technology!

HIPAA & Privacy
• Applicable to researchers within the covered entity or working with a covered entity
  – i.e., URMC or Affiliates

Additional Resources
• Privacy office and HIPAA policies
  – HIPAA Policy Manual http://intranet.urmc-sh.rochester.edu/policy/HIPAA/Policy.Manual/
  – Research Activities Forms and Guidance http://intranet.urmc-sh.rochester.edu/policy/HIPAA/Research.asp
• Information Systems
  – URMC Information Security Policies https://intranet.urmc-sh.rochester.edu/policy/hipaa/Policy.Manual/#sec
  – UR Data Classification http://www.rochester.edu/it/policy/assets/pdf/INFORMATION_TECHNOLOGY_POLICY.pdf

Additional Resources
• Office for Human Subject Protection (OHSP) RSRB Data & Safety Monitoring Plan policy http://www.rochester.edu/ohsp/documents/ohsp/pdf/policies.And.Guidance/DSMP_Policy.pdf
• RSRB Procedures for Sending/Receiving Data/Specimens http://www.rochester.edu/ohsp/documents/ohsp/pdf/policies.And.Guidance/Procedures_for_Sending_Receiving_Data_Specimens.pdf

• Data questions or concerns? Call me! (Or email, or drop by.)
  Kathleen Fear, 5-6882, Carlson 313E, kfear@library.rochester.edu
• At URMC, contact:
  Linda Hasman, 5-3399, Linda_Hasman@urmc.rochester.edu
  Donna Berryman, 5-6877, Donna_Berryman@urmc.rochester.edu
• For IRB and human subjects concerns, contact:
  Emily Flagg, 6-5537, Emily_Flagg@urmc.rochester.edu