MANAGING AND SHARING RESEARCH DATA
Ara Tahmassian, Chief Research Compliance Officer
Mercè Crosas, Chief Data Science Officer, at IQSS
Harvard Responsible Conduct of Research, January 15, 2016

"Research data management concerns the organization of data, from its entry to the research cycle through the dissemination and archiving of valuable results. It aims to ensure reliable verification of results, and permits new and innovative research built on existing information."
Whyte, A., Tedds, J. (2011). 'Making the Case for Research Data Management'. DCC Briefing Papers. Edinburgh: Digital Curation Centre

Why should you care about managing and sharing your research data?
• Helps you reuse your own data
• Facilitates reliable verification of results by others
• Permits new research built on existing data
• Fulfills data management plans required by federal funding agencies and foundations
• Lets you make public assets available to the public
• Allows you to publish datasets along with scholarly articles, as now required by many leading journals

Research Data Management Cycle
Data Acquisition, Management
Data Planning
• Research scope
• Data size, format, lifespan
• Metadata options
• Compliance
Data Collection, Analysis, Org
• Computing Infrastructure
• Analysis Software
• Data assessment
• Provenance
Data Sharing and Archiving
Dissemination, Preservation
• Findings
• Data
• Code
• Other research outputs
• Open access
• Repositories
• Archival Tools
• Short term vs long term

Harvard Policies and Practices
• Human Subjects Data – the IRB
• Research with Animals – the IACUC
• Retention Policy: 7 years and exceptions
• Harvard Research Data Security Policy

Funders' Requirements
• Federal: NSF, NIH, … require public access plans
• Foundations: Sloan, Gates (open data policy) – and public access plans
• Most plans must be created and submitted as part of the funding application process (DMP)
• Plan ahead!

Federal and State Regulations
• HIPAA (18+ identifiers – alone or in combination data sets)
  o Informed Consent (IRB, secondary use of data – HIPAA waivers)
• FERPA (education information and special protections)
• MA residence state law
• Stem cell data and genomics data must be published in an approved repository, but must also be de-identified.

Third Party Data Restrictions
• Third-party data (you did not collect the data; you are accessing existing data that was gathered for other purposes)
• Data user agreements (any contract, even an invoice, can have terms that restrict your IP or publication rights)
• Licensing agreements – can also include restrictions

Resources at Harvard
Data Planning; Data Collection, Analysis, Org
• IRB
• Office of the Vice Provost for Research
• Office of Sponsored Programs
• Information Security
• DMP Tool
• Statistical/Data Science Consultants
Dissemination, Preservation
• Harvard Dataverse repository
• Open Data Assistance Program
• DASH: Open Access Publications
• Library resources
• Astrophysics Data System (ADS)
http://vpr.harvard.edu

Why Data (and Code) Sharing?

“Trust, but verify”
“Ideally, research protocols should be registered in advance and monitored in virtual notebooks.”
“Where possible, trial data also should be open for other researchers to inspect and test.”

Science might be imperfect, but is self-correcting
“Instances in which scientists detect and address flaws in work constitute evidence of success, not failure.”
“Ensuring that the integrity of science is protected is the responsibility of many stakeholders.”

Science isn’t broken, it’s just difficult
“…data can be narrowed and expanded (p-hacked) to make either hypothesis appear correct. That’s because answering even a simple scientific question requires lots of choices that can shape the results. This doesn’t mean that science is unreliable. It just means that it’s more challenging than we sometimes give it credit for.”

Data and code sharing facilitates verification and self-correction in science

But research data are not always easily accessible
• Rarely can access data from published work
• Most data are not too big to share (chart: size of data)
• Instead ask colleagues for data
• Not enough standards for sharing data
Science 11 February 2011: Vol. 331 no. 6018 pp. 692-693, DOI: 10.1126/science.331.6018.692

Data Sharing at Harvard
Dataverse: dataverse.harvard.edu
• Started as a community repository for Social Science
• Now open to all research fields and all researchers
• More than 1400 dataverses
• More than 60,000 datasets
• More than 1,500,000 downloads

Data sharing is also good for you
• You: get credit for your data
• Publishers and journals: verify published work
• Federal funding agencies: make public assets accessible
• Science: validate, reuse and extend previous work

Sharing Data Increases Citations
From 10,555 studies with gene expression microarray data:
- Studies that shared data received 9% more citations
- Data reuse by third-party investigators continued for 6 years
Piwowar and Vision (2013), Data reuse and the open data citation advantage. PeerJ 1:e175; DOI 10.7717/peerj.175

Dataverse solves the problem of invalid links
Over time, links to data become invalid.
[Chart: percentage of broken links to data. Analysis of 7,641 publications from 4 major journals in Astronomy and Astrophysics, between 1997 and 2008.]
Pepe, Goodman, Muench, Crosas, Erdmann (2014), “Sharing, Archiving and Citing Data in Astronomy”, PLOS ONE

A Dataverse is a container of Datasets
Each Dataverse can be for a researcher, a research project, a department, a journal, or a larger organization.

Dataverse provides a rich set of features
Credit and Visibility
• Standard, persistent data citation
• Branding for each dataverse
• Widgets to embed in your own website
Discovery
• Faceted search for all metadata
• Standard metadata:
  o citation
  o scientific domain
  o file-level
Access Control & Roles
• CC0 waiver for public datasets
• Tiered access:
  o terms of use
  o guestbook
  o restricted data
• Publishing workflow
• Multiple roles:
  o contribute
  o curate, review
  o administrate
Data Features
• Versioning
• Conversion of tabular data files to standard format
• Automatic extraction of file metadata (R, STATA, SPSS, XSD, FITS)

Who uses Dataverse and what for?
• Election Data Archive (Steve Ansolabehere, Government Department)
• Robert Sampson’s Data (Sociology Department)
• Ebola Data (Pardis Sabeti, Department of Organismic and Evolutionary Biology)
• Supernova Data (Alicia Soderberg, Astronomy Department, CfA)

Sharing Sensitive Data

Harvard Dataverse Terms of Use: Dataset needs to be de-identified
“User Uploads must be void of all identifiable information, such that re-identification of any subjects from the amalgamation of the information available from all of the materials (across datasets and dataverses) uploaded under any one author and/or user should not be possible.”

Anonymized data is increasingly reidentifiable
• Sweeney (2000) showed that 87% of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex.
• In 2013, Sweeney also showed that combining the Washington State Health Database with news (accidents, hospitalized people) could re-identify 43% of the records.
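
The re-identification risk described above can be estimated before a dataset is shared. The following is a minimal sketch, not the method used in the cited studies: it counts how many rows are unique on a set of quasi-identifiers. The records, the field names (zip, birthdate, sex, diagnosis) and the helper unique_fraction are all hypothetical and made up for illustration.

```python
from collections import Counter

# Hypothetical toy records; every value here is invented for illustration.
records = [
    {"zip": "02138", "birthdate": "1970-01-01", "sex": "F", "diagnosis": "A"},
    {"zip": "02138", "birthdate": "1970-01-01", "sex": "F", "diagnosis": "B"},
    {"zip": "02139", "birthdate": "1985-06-15", "sex": "M", "diagnosis": "C"},
    {"zip": "02140", "birthdate": "1990-12-31", "sex": "F", "diagnosis": "D"},
]

# Assumed quasi-identifiers: the three fields named on the slide.
QUASI_IDENTIFIERS = ("zip", "birthdate", "sex")


def unique_fraction(rows, keys=QUASI_IDENTIFIERS):
    """Return the fraction of rows whose quasi-identifier combination is unique."""
    if not rows:
        return 0.0
    # Count how often each combination of quasi-identifier values occurs.
    combos = Counter(tuple(row[k] for k in keys) for row in rows)
    # A row is at risk when nobody else shares its combination.
    unique = sum(1 for row in rows if combos[tuple(row[k] for k in keys)] == 1)
    return unique / len(rows)


fraction = unique_fraction(records)
print(f"{fraction:.0%} of records are unique on {QUASI_IDENTIFIERS}")
```

A high fraction suggests that removing names alone is not enough; coarsening fields (for example, year of birth instead of full birthdate) and re-running the check is one way to see the effect.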

DataTags versus Harvard Security Levels
DataTags / Harvard Security Levels
Level 1: No sensitive data; open data
Level 1: De-identified data
Level 2: Confidential information by University standards; no material harm
Level 3: Confidential information that could cause material harm (non-level 4 FERPA)
Level 4: High-risk confidential information (SSN)
Level 5* (Level 4.5, on the network): Information that would cause severe harm

Future DataTags workflow in Dataverse
[Diagram labels: Automatic Interview; Data File Ingestion; Review Board Approval; Direct Access; Sensitive Dataset; Two-factor Authentication; Signed DUA; Privacy Preserving Access]
http://datatags.org
http://privacytools.seas.harvard.edu

Questions

Collaboration in Research

Introduction
• Research is becoming interdisciplinary.
• Funding agencies are moving towards funding more translational research in every field.
• Large volumes of data (Big Data) are being generated, which requires specialists from multiple disciplines to collaborate in analyzing the data and using it beneficially.
• Collaborations are easier to develop and sustain than they were before.

What is Collaborative Research?
Collaborative research can be defined as research that is conducted by more than one researcher, or research team, either within their institution or with colleagues in other institutions, towards a common goal.

Establishing Collaborative Relationships
Successful collaboration requires that:
• All participating members work together towards a common goal that has been agreed upon by all parties.
• Each member is considered an important part of the team and understands their role and what is expected of them.
• The interactions are based on trust, respect, good communication, and the ability to compromise.

Establishing Collaborative Relationships
Like any other agreement, there are basic details that need to be considered and agreed upon before work starts, so that the project benefits all parties involved in the collaboration.
• What are the common goals?
• What is to be exchanged through the collaboration?
• How will the work and any products be shared?
• How will any funds available be shared and spent?
• How are responsibility (i.e. who does what) and credit (e.g. publication) shared?
• What are the timelines for each goal?

Negotiating Collaboration Agreement
Negotiating agreements is essentially the art of compromise. In addition to your research goals, there are other institutional and regulatory requirements that each participating team has to address. When negotiating agreements it is best to address as many of the critical elements as possible before starting the project; issues on which there is considerable disagreement are unlikely to be resolved easily if they are postponed to a later date.

Communication amongst the team members is a critical factor in successful collaboration.
• Communicate often and early throughout the project.
• Establish a formal communication structure for reporting on the progress, discussing problems or challenges, etc.
A key component of your communication strategy should be notifying team members as soon as possible if you encounter any problems or challenges in completing your part of the task.

Roles and Responsibilities
It is important to have a clear understanding and agreement on the roles and responsibilities of each member. This should include a clear understanding of the:
• Leadership: There should be a clear understanding by all collaborators as to who is ultimately responsible for leading the project and what powers are vested in this individual, or individuals if there are co-leaders.
• Team Members: Similarly, there should be clearly defined roles and expectations for each team member. The specific roles would depend on the scope of the project as well as the expertise of the individual.

Discuss Authorship Plans
Discuss issues related to publications arising from the collaborative research in advance and document the agreements. These may include:
• The right to publish (including thesis publications)
• Who writes what
• Who makes presentations of the results (e.g. in conferences)
• The order of the authors
• Acknowledgment of those who are not authors but have contributed to the research, etc.

Intellectual Property Rights
Intellectual Property (IP) rights are complex and specifics vary from one institution or country to another. It is extremely important for the collaborators to discuss the IP issues with the responsible individuals at their institutions before entering into, or starting, collaborative research to make sure that their rights are protected. The session on Intellectual Property Rights (IP) will cover the topic in more detail.

Data Management, Sharing, and Ownership
Discuss and reach an agreement in advance on the details of:
• How data is collected and managed to ensure its integrity.
• What is the ownership of the data (i.e. who owns what data?).
• When and how data is shared amongst team members.
• Data security plans.
• Electronic data back-up.
• Software used to record or manage the data.
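
One practical way to support the integrity and back-up points above is a checksum manifest that collaborators regenerate and compare whenever data is shared or restored. The following is a minimal sketch under stated assumptions: the helper names (sha256_of, build_manifest, changed_files) and the choice of SHA-256 are illustrative and are not prescribed by the slides or by any Harvard policy.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its SHA-256 digest."""
    root = Path(data_dir)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files that were added, removed, or modified between two manifests."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(
        name for name in names if old_manifest.get(name) != new_manifest.get(name)
    )
```

A team might store the manifest alongside each back-up and call changed_files(saved, build_manifest("data")) before analysis; an empty list means the files match what was originally recorded.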

Conflicts of Interest
The research team members should clearly disclose any potential conflicts of interest that may have, or may be perceived as having, influence on the integrity of the research and the data generated. The session on Conflict of Interest provides more details on the topic.

Facilities, Equipment and Supplies
• A collaborative agreement should clearly define the contributions of each member, or team, in terms of supplies, equipment and facilities.
• Each collaborator, or team, should be realistic about the availability of facilities and equipment and consider other active, or planned, projects using the same facilities or equipment.

Conflict Resolution
Conflicts can, and do, arise during any collaborative project due to:
• Different styles and personalities of the individuals involved.
• Different approaches by different specialties in a multi-disciplinary research project.
• Challenges encountered during the project and their causes.
• Inadequate communications.
• Pure misunderstanding.
Regardless of the reason: talk about it and get a resolution when it first arises.

Institutional and Regulatory Standards
• Understand regulatory, institutional, and funder mandates on both sides.
• This issue is especially important in international collaborations, where country-specific regulatory mandates may differ significantly.
• Often the participants are required to adhere to the stricter regulation.

Final Thoughts
• The benefits of collaboration are undeniable, and collaboration is in the best spirit of science.
• Establishing a collaboration can leave scientists vulnerable to the actions, or inactions, of their collaborators. In choosing collaborators, trust and credibility are essential values.
• Choosing collaborators must be based not only on scientific considerations, but also on the likelihood of a respectful, even amicable, relationship in which lines of communication can be kept open.

QUESTIONS