MANAGING AND SHARING RESEARCH DATA Ara Tahmassian, Chief Research Compliance Officer Mercè Crosas, Chief Data Science Officer, at IQSS Harvard Responsible Conduct of Research January 11, 2017
“Research data management concerns the organization of data, from its entry to the research cycle through the dissemination and archiving of valuable results. It aims to ensure reliable verification of results, and permits new and innovative research built on existing information.” Whyte, A., Tedds, J. (2011). ‘Making the Case for Research Data Management’. DCC Briefing Papers. Edinburgh: Digital Curation Centre.
Why should you care about managing and sharing your research data? • Helps you reuse your own data • Facilitates reliable verification of results by others • Permits new research built on existing data • Fulfills data management plans required by federal funding agencies and foundations • Lets you make public assets available to the public • Allows you to publish datasets along with scholarly articles, as now required by many leading journals
Research Data Management Cycle • Data Acquisition, Management: Data Planning (research scope; data size, format, lifespan; metadata options; compliance) and Data Collection, Analysis, Org (computing infrastructure; analysis software; data assessment; provenance) • Data Sharing and Archiving: Dissemination and Preservation (findings; data; code; other research outputs; open access; repositories; archival tools; short term vs long term)
Harvard Policies and Practices • Human Subjects Data – the IRB • Research with Animals – the IACUC • Retention Policy: 7 years and exceptions • Harvard Research Data Security Policy
Funder Requirements • Federal: NSF, NIH, … require public access plans • Foundations: Sloan, Gates (open data policy) – and public access plans • Most plans must be created and submitted as part of the funding application process (DMP) • Plan ahead!
Federal and State Regulations • HIPAA (18+ identifiers, alone or in combination across data sets) o Informed Consent (IRB, secondary use of data, HIPAA waivers) • FERPA (education information and special protections) • Massachusetts state law (MA residents) • Stem cell and genomics data must be published in an approved repository and must also be de-identified.
Third Party Data Restrictions • Third-party data (you did not collect the data; you are accessing existing data that was gathered for other purposes) • Data use agreements (any contract, even an invoice, can have terms that restrict your IP or publication rights) • Licensing agreements can also include restrictions
Resources at Harvard (http://vpr.harvard.edu) • Data Planning; Data Collection, Analysis, Org: IRB; Office of Vice Provost of Research; Office of Sponsored Programs; Information Security; DMP Tool; Statistical/Data Science Consultants • Dissemination; Preservation: Harvard Dataverse repository; Open Data Assistance Program; DASH (Open Access Publications); Library resources; Astrophysics Data Services (ADS)
Why Data (and Code) Sharing?
“Trust, but verify” “Ideally, research protocols should be registered in advance and monitored in virtual notebooks.” “Where possible, trial data also should be open for other researchers to inspect and test.”
Science might be imperfect, but is self-correcting “Instances in which scientists detect and address flaws in work constitute evidence of success, not failure.” “Ensuring that the integrity of science is protected is the responsibility of many stakeholders.”
Science isn’t broken, it’s just difficult “…data can be narrowed and expanded (p-hacked) to make either hypothesis appear correct. That’s because answering even a simple scientific question requires lots of choices that can shape the results. This doesn’t mean that science is unreliable. It just means that it’s more challenging than we sometimes give it credit for.”
Data and code sharing facilitates verification and self-correction in science
But research data are not always easily accessible • Rarely can access data from published work • Most data are not too big to share • Instead ask colleagues for data • Not enough standards for sharing data (Science, 11 February 2011, Vol. 331, no. 6018, pp. 692-693, DOI: 10.1126/science.331.6018.692)
Data Sharing at Harvard Dataverse: dataverse.harvard.edu • Started as a community repository for Social Science • Now open to all research fields and all researchers • More than 1,400 dataverses • More than 60,000 datasets • More than 1,500,000 downloads
Data sharing is also good for you • You: Get credit for your data • Publishers and Journals: Verify published work • Federal funding agencies: Make public assets accessible • Science: Validate, reuse and extend previous work
Sharing Data Increases Citations From 10,555 studies with gene expression microarray data: • Studies that shared data received 9% more citations • Data reuse by third-party investigators continued for 6 years Piwowar and Vision (2013), Data reuse and the open data citation advantage. PeerJ 1:e175; DOI 10.7717/peerj.175
Dataverse solves the problem of invalid links • Over time, links to data become invalid [Figure: percentage of broken links to data] Analysis of 7,641 publications from 4 major journals in Astronomy and Astrophysics, between 1997 and 2008. Pepe, Goodman, Muench, Crosas, Erdmann (2014), “Sharing, Archiving and Citing Data in Astronomy”, PLOS ONE
A Dataverse is a container of Datasets Each Dataverse can be for a researcher, a research project, a department, a journal, or a larger organization.
Dataverse provides a rich set of features • Credit and Visibility: standard, persistent data citation; branding for each dataverse; widgets to embed in your own website • Discovery: faceted search for all metadata; standard metadata (citation, scientific domain, file-level) • Access Control & Roles: CC0 waiver for public datasets; tiered access (terms of use, guestbook, restricted data); publishing workflow; multiple roles (contribute; curate, review; administrate) • Data Features: versioning; conversion of tabular data files to standard format; automatic extraction of file metadata (R, STATA, SPSS, XSD, FITS)
Who uses Dataverse and what for? • Election Data Archive (Steve Ansolabehere, Government Department) • Robert Sampson’s Data (Sociology Department) • Ebola Data (Pardis Sabeti, Department of Organismic and Evolutionary Biology) • Supernova Data (Alicia Soderberg, Astronomy Department, CfA)
Sharing Sensitive Data
Harvard Dataverse Terms of Use: Dataset needs to be de-identified “User Uploads must be void of all identifiable information, such that re-identification of any subjects from the amalgamation of the information available from all of the materials (across datasets and dataverses) uploaded under any one author and/or user should not be possible.”
Anonymized data is increasingly reidentifiable • Sweeney (2000) showed that 87% of all Americans could be uniquely identified using only three bits of information: ZIP code, birthdate, and sex. • In 2013, Sweeney also showed that combining the Washington State Health Database with news (accidents, hospitalized people) could re-identify 43% of the records.
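The re-identification risk described above can be illustrated with a small uniqueness check. This is a minimal sketch, not the method used in the cited studies: the records are invented toy data, and the function names `unique_fraction` and `smallest_group` are hypothetical helpers introduced only for this illustration. It counts how many records are the only ones with a given combination of ZIP code, birthdate, and sex.

```python
from collections import Counter

# Hypothetical toy records: (ZIP code, birthdate, sex). Not real data.
RECORDS = [
    ("02138", "1975-03-14", "F"),
    ("02138", "1975-03-14", "F"),
    ("02138", "1982-11-02", "M"),
    ("02139", "1990-07-21", "F"),
    ("02139", "1990-07-21", "M"),
]


def unique_fraction(records, columns):
    """Return the share of records whose values in `columns` occur only once."""
    keys = [tuple(rec[i] for i in columns) for rec in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys)


def smallest_group(records, columns):
    """Return the size of the smallest group of records sharing the same values."""
    keys = [tuple(rec[i] for i in columns) for rec in records]
    return min(Counter(keys).values())


if __name__ == "__main__":
    # ZIP code alone leaves every record hidden in a group of at least two.
    print(unique_fraction(RECORDS, (0,)), smallest_group(RECORDS, (0,)))
    # Adding birthdate and sex singles out some records.
    print(unique_fraction(RECORDS, (0, 1, 2)), smallest_group(RECORDS, (0, 1, 2)))
```

A smallest group of size 1 means at least one person can be singled out by those fields alone, which is why removing names does not by itself de-identify a dataset.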
DataTags versus Harvard Security Levels • Level 1: No sensitive data; open data • Level 1: De-identified data • Level 2: Confidential information by University standards; no material harm • Level 3: Confidential information that could cause material harm (non-level 4 FERPA) • Level 4: High-risk confidential information (SSN) • Level 5* (Level 4.5, on the network): Information that would cause severe harm
Future DataTags workflow in Dataverse [Diagram labels: Sensitive Dataset; Automatic Interview; Review Board Approval; Data File Ingestion; Direct Access; Two-factor Authentication, Signed DUA; Privacy Preserving Access] http://datatags.org http://privacytools.seas.harvard.edu
Questions
Collaboration in Research
Introduction • Research is becoming interdisciplinary. • Funding agencies are moving towards funding more translational research in every field. • Large volumes of data (Big Data) are being generated, which requires specialists from multiple disciplines to collaborate in analyzing the data and using it beneficially. • Collaborations are easier to develop and sustain than they were before.
What is Collaborative Research? Collaborative research can be defined as research that is conducted by more than one researcher, or research team, either within their institution or with colleagues in other institutions, towards a common goal.
Establishing Collaborative Relationships Successful collaboration requires that: • All participating members work together towards a common goal that has been agreed on by all parties. • Each member of the team is considered an important part of the team and understands their role and what is expected of their activity. • The interactions are based on trust, respect, good communication, and the ability to compromise.
Establishing Collaborative Relationships Like any other agreement, there are basic details that need to be considered and agreed upon prior to the start of work to make the project beneficial for all parties involved in the collaboration. • What are the common goals? • What is to be exchanged through the collaboration? • How will the work and any products be shared? • How will any funds available be shared and spent? • How are responsibility (i.e. who does what) and credit (e.g. publication) shared? • What are the timelines for each goal?
Negotiating Collaboration Agreement Negotiating agreements is essentially the art of compromise. In addition to your research goals, there are other institutional and regulatory requirements that each participating team has to address. When negotiating agreements, it is best to address as many of the critical elements as possible prior to starting the project; issues on which there is considerable disagreement are unlikely to be resolved easily if they are postponed to a later date.
Communication amongst the team members is a critical factor in successful collaboration. • Communicate early and often throughout the project. • Establish a formal communication structure for reporting on the progress, discussing problems or challenges, etc. A key component of your communication strategy should be notifying team members as soon as possible if you encounter any problems or challenges in completing your part of the task.
Roles and Responsibilities It is important to have a clear understanding and agreement on the roles and responsibilities of each member. This should include: • Leadership: There should be a clear understanding by all collaborators as to who is ultimately responsible for leading the project and what powers are vested in this individual, or individuals if there are co-leaders. • Team Members: Similarly, there should be clearly defined roles and expectations for each team member. The specific roles will depend on the scope of the project as well as the expertise of the individual.
Discuss Authorship Plans Discuss issues related to publications arising from the collaborative research in advance and document the agreements. These may include: • The right to publish (including thesis publications) • Who writes what • Who makes presentations of the results (e.g. in conferences) • The order of the authors • Acknowledgment of those who are not authors but have contributed to the research, etc.
Intellectual Property Rights Intellectual Property (IP) rights are complex, and specifics vary from one institution or country to another. It is extremely important for the collaborators to discuss the IP issues with the responsible individuals at their institutions before entering into, or starting, collaborative research to make sure that their rights are protected. The session on Intellectual Property Rights (IP) will cover the topic in more detail.
Data Management, Sharing, and Ownership Discuss and reach an agreement in advance on the details of: • How data is collected and managed to ensure its integrity. • What is the ownership of the data (i.e. who owns what data?). • When and how data is shared amongst team members. • Data security plans. • Electronic data back-up. • Software used to record or manage the data.
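One way a team might check the integrity of shared data and its electronic back-up is to record a checksum for every file and compare the back-up against that record. The following is a minimal sketch under stated assumptions: the function names (`sha256_of`, `build_manifest`, `changed_files`) and the directory layout are hypothetical, and nothing here is prescribed by the slides or by Harvard policy. It uses only the Python standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its checksum."""
    root = Path(data_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(manifest, backup_dir):
    """List files that are missing from backup_dir or whose checksum differs."""
    root = Path(backup_dir)
    problems = []
    for name, expected in manifest.items():
        copy = root / name
        if not copy.is_file() or sha256_of(copy) != expected:
            problems.append(name)
    return problems
```

A collaborator could call `build_manifest` on the agreed master copy, share the resulting mapping with the team, and run `changed_files` against each back-up location; an empty list means every file is present and unchanged.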
Conflicts of Interest The research team members should clearly disclose any potential conflicts of interest that may have, or may be perceived as having, influence on the integrity of the research and the data generated. The session on the Conflict of Interest provides more details on the topic.
Facilities, Equipment and Supplies • A collaborative agreement should clearly define the contributions of each member, or team, in terms of supplies, equipment and facilities. • Each collaborator, or team, should be realistic on the availability of facilities and equipment and consider other active, or planned, projects using the same facilities or equipment.
Conflict Resolution Conflicts can, and do, arise during any collaborative project due to: • Different styles and personalities of the individuals involved. • Different approaches by different specialties in a multi-disciplinary research project. • Challenges encountered during the project and their causes. • Inadequate communications. • Pure misunderstanding. Regardless of the reason: talk about it and get a resolution when it first arises.
Institutional and Regulatory Standards • Understand the regulatory, institutional, and funder mandates on both sides. • This issue is especially important in international collaborations, where country-specific regulatory mandates may differ significantly. • Often the participants are required to adhere to the stricter regulation.
Final Thoughts • The benefits of collaboration are undeniable, and collaboration is in the best spirit of science. • Establishing a collaboration can leave scientists vulnerable to the actions — or inactions — of their collaborators. In choosing collaborators, trust and credibility are essential values. • Choosing collaborators must be based not only on scientific considerations, but also on the likelihood of a respectful, even amicable, relationship in which lines of communication can be kept open.
QUESTIONS