RESEARCH DATA MANAGEMENT

INTRODUCTION
− Research Data Management
− Digital data
− Organisation, reusability, long-term storage (archiving)
− Planning and publication
08.06.2021 Presentation title/Author/Event

WHAT IS RESEARCH DATA?
− What kinds of Research Data do you know?
  − Observational Data
  − Experimental Data
  − Simulated Data
  − Derived Data
  − Reference Data
  − Big Data
(picture: pixabay.com)

DEFINITION
− Research Data is all information that is
  − collected,
  − observed, or
  − created
  in the course of scientific work with the aim of generating or confirming original research results
(pictures: pixabay.com)

WHAT IS RESEARCH DATA MANAGEMENT?
− What is Research Data Management?
  − Planned handling of Research Data
  − During their whole life cycle
(picture: pixabay.com)

WHAT IS RESEARCH DATA MANAGEMENT?
"Research data management concerns the organisation of data, from its entry to the research cycle through to the dissemination and archiving of valuable results. It aims to ensure reliable verification of results, and permits new and innovative research built on existing information."
(from Whyte, A., Tedds, J. (2011). ‘Making the Case for Research Data Management’. DCC Briefing Papers. Edinburgh: Digital Curation Centre. Available online)

WHAT IS RESEARCH DATA MANAGEMENT?
− Research Data Management is part of the research process. It aims to make the research process as efficient as possible and to meet the expectations and requirements of the university, research funders, and legislation.
− It concerns how you:
  − Create data and plan for its use,
  − Organise, structure, and name data,
  − Keep it: make it secure, provide access, store it, and back it up,
  − Find information resources, share with collaborators and more broadly, publish, and get cited.

RESEARCH DATA POLICIES

RESEARCH DATA POLICIES
− A research data policy describes the requirements for the handling of research data, e.g.
  − Journal and publisher policies
  − Institutional policies
  − Subject-specific policies
  − Funder requirements

FAIR PRINCIPLES
− A concept that is now frequently used and widely endorsed
− (Research) Data should be FAIR
  − Findable
  − Accessible
  − Interoperable
  − Reusable

DATA MANAGEMENT PLANS

DATA MANAGEMENT PLANNING
− Data Management planning → Data Management Plan (DMP)
− Required by funding agencies (to differing extents)
− The DMP is created before the start of a project and continuously updated
(picture: pixabay.com)

WHAT IS A DMP?
− Formal document
− Outlines how data are to be handled
  − during a research project
  − after the project is completed
− The goal of a data management plan is to consider the many aspects of
  − data management, metadata generation, data preservation, and analysis
  − before the project begins
− It ensures that data are well-managed in the present, and prepared for preservation in the future
(source: https://en.wikipedia.org/w/index.php?title=Data_management_plan&oldid=834492966)

WHY?
− What are the benefits of having a DMP?
(picture: pixabay.com)

DATA MANAGEMENT PLAN
− Comparison of funder requirements in Germany regarding Data Management Plans (as of 18.04.2018)
− EC Horizon 2020
  − Plan required? Data Management Plan
  − Submission with application? No, first plan within the first six months of the project
  − Content: Contents of the Horizon 2020 Template
  − Updates? Update if there are significant changes and at the end of the project
− DFG
  − Plan required? Information about handling of research data
  − Submission with application? Yes
  − Content: Contents of the Guidelines for Handling Research Data
  − Updates? No
− BMBF
  − Plan required? Sometimes a DMP is mandatory, depending on the program
  − Submission with application? If mandatory, yes
  − Content: Depends on the respective program
  − Updates? No
− BMBF Bildungsforschung (educational research)
  − Plan required? Data Management Plan
  − Submission with application? Yes
  − Content: Contents of the checklist
  − Updates? Yes

DATA MANAGEMENT PLAN
− DMP schema (DCC*)
  − Administrative Data: Summary, project IDs, name, institutional policies
  − Data Collection: What kind of data? How collected?
  − Documentation and Metadata: What metadata? How created? Which standards are used?
  − Ethics and Legal Compliance: Data sharing restricted? Anonymisation? Patents? How licensed for re-use?
  − Storage and Backup: Storage, responsibilities, risks, external access
  − Selection and Preservation: Which data should be retained or destroyed? Foreseeable research uses? Long-term preservation, costs, responsibility
  − Data Sharing: With whom? When? Conditions, restrictions, dissemination
  − Responsibilities and Resources: Who is responsible for the Data Management? What resources will be required?
* Digital Curation Centre, UK (www.dcc.ac.uk)

DMP TOOLS
− DMPonline (UK)
  − https://dmponline.dcc.ac.uk/
− DMPTool (US)
  − https://dmptool.org/
− RDMO (Germany)
  − https://rdmorganiser.github.io/

ORGANISATION AND STRUCTURE

WHY?
− Why is a structured handling of data important?
(picture: pixabay.com)

ORGANISATION AND STRUCTURE
− A structured procedure
  − Ensures reproducibility even after years
  − Helps you and other scientists collaborate by establishing naming conventions
  − Enables other people to work with your data
  − Makes data search easier and quicker
  − Prevents redundant work and data loss
  − Helps to identify the current state of work
  − Ensures machine-readability
(picture: pixabay.com)

ORGANISATION AND STRUCTURE
− Directory Structure (Example)
  − Project
    − Raw Data
    − Aggregated Data
    − Code
    − Output
      − General Poster
      − Paper (Topic)
      − Specific Poster
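A directory layout like this example can be created by a script, so that every project starts from the same skeleton. The following is a minimal Python sketch, not a prescribed tool: the function name `create_project_skeleton`, the lowercase underscore-separated folder names, and the placement of the posters and the paper below the output folder are assumptions made for illustration.

```python
from pathlib import Path

# Folder names follow the slide's example; nesting the posters and the paper
# under "output" is an assumption about the original diagram.
PROJECT_FOLDERS = [
    "raw_data",
    "aggregated_data",
    "code",
    "output/general_poster",
    "output/paper_topic",
    "output/specific_poster",
]


def create_project_skeleton(root):
    """Create the example folders below `root` and return the created paths."""
    root = Path(root)
    created = []
    for folder in PROJECT_FOLDERS:
        path = root / folder
        # parents=True also creates "output"; exist_ok=True makes reruns harmless.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_project_skeleton("my_project")` (a hypothetical project name) creates the six folders and leaves existing ones untouched.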

ORGANISATION AND STRUCTURE
− Naming files
  − As long as needed, as short as possible
  − To ensure uniformity, you can build names from the following components
    − Content
    − Creator
    − Creation date
    − Processing date
    − Name of the work group
    − Publication date
    − Project number
    − Version number

ORGANISATION AND STRUCTURE
− Naming files
  − Do not use
    − Special characters, blanks and punctuation marks
    − like { } [ ] < > ( ) * % # ´ ; ” , : ? ! & @ $ ~
  − You could replace blanks with _
  − You could capitalise words
  − For chronological sorting
    − Start with the date
    − E.g. YYYYMMDD_Name or YYYYMMDDName etc.
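These naming rules can be applied automatically. The following is a minimal Python sketch, assuming the YYYYMMDD_Name pattern with capitalised words in place of blanks; the function name `build_file_name` and the sample values are illustrative and not taken from the slides.

```python
import re
from datetime import date


def build_file_name(name, created, extension):
    """Return a name such as 20210608_FieldNotes.csv from free text and a date."""
    # Split on blanks and underscores so each word can be capitalised.
    words = re.split(r"[\s_]+", name.strip())
    cleaned = ""
    for word in words:
        # Keep only ASCII letters and digits; this also drops umlauts and accents.
        word = re.sub(r"[^A-Za-z0-9]", "", word)
        cleaned += word[:1].upper() + word[1:]
    # A leading date in YYYYMMDD form makes file names sort chronologically.
    return f"{created:%Y%m%d}_{cleaned}.{extension.lstrip('.')}"
```

For example, `build_file_name("field notes (draft)!", date(2021, 6, 8), "csv")` returns `20210608_FieldNotesDraft.csv`.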

ORGANISATION AND STRUCTURE
− Naming files
  − Avoid automatically generated names (e.g. by devices)
  − Consider scalability
    − A two-digit file number limits you to 99 files
  − Even in smaller projects it is worthwhile to write down the naming conventions
    − Explain abbreviations, for example in a DMP or a Readme file
    − This ensures that the conventions can be reconstructed even after years

ORGANISATION AND STRUCTURE
− Version control
  − Most common: whole numbers for big changes
  − Small changes are numbered and appended with an underscore _
  − E.g. v1, v2, v1_01, v2_03 etc.
  − File naming with version control (example pattern): [document name][version number]
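The version scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `bump_version` is hypothetical, small changes are assumed to use at least two digits, and a big change is assumed to reset the small-change counter.

```python
import re

# Matches labels such as "v1", "v2" or "v1_01".
VERSION_PATTERN = re.compile(r"^v(\d+)(?:_(\d+))?$")


def bump_version(version, major=False):
    """Return the next version label after a small or a big change."""
    match = VERSION_PATTERN.match(version)
    if match is None:
        raise ValueError(f"not a version label: {version!r}")
    big = int(match.group(1))
    small = int(match.group(2) or 0)
    if major:
        # Big change: next whole number, small-change counter dropped.
        return f"v{big + 1}"
    # Small change: keep the whole number, count up after the underscore.
    return f"v{big}_{small + 1:02d}"
```

With this sketch, `bump_version("v1")` gives `v1_01`, `bump_version("v1_01")` gives `v1_02`, and `bump_version("v2_03", major=True)` gives `v3`.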

DOCUMENTATION AND METADATA

DOCUMENTATION AND METADATA
− Good documentation contains
  − A description of the research project
    − Aims
    − Hypotheses/assumptions
  − Detailed information on data creation (methods, units, periods, locations, technology used)
  − Arrangements for data revision
  − Structure of the data and their relations to each other
  − Descriptions for variables, labels, codes
  − Differences between versions
  − Information on access and licences

DOCUMENTATION AND METADATA
− Metadata
  − Structured data that contain information about other data
  − Can be saved with the data they describe as well as independently
  − Two kinds
    − Technical metadata
    − Metadata that describe content
  − Their primary purpose is to ensure that data can be found
  − Often in XML format to ensure machine-readability
(picture: pixabay.com)

DOCUMENTATION AND METADATA
− Metadata formats (example)
  − Dublin Core

<head profile="http://dublincore.org/documents/dcq-html/">
  <title>Dublin Core</title>
  <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
  <link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
  <meta name="DC.format" scheme="DCTERMS.IMT" content="text/html" />
  <meta name="DC.type" scheme="DCTERMS.DCMIType" content="Text" />
  <meta name="DC.publisher" content="Jimmy Wales" />
  <meta name="DC.subject" content="Dublin Core Metadaten-Elemente, Anwendungen" />
  <meta name="DC.creator" content="Björn G. Kulms" />
  <meta name="DCTERMS.license" scheme="DCTERMS.URI" content="http://www.gnu.org/copyleft/fdl.html" />
  <meta name="DCTERMS.rightsHolder" content="Wikimedia Foundation Inc." />
  <meta name="DCTERMS.modified" scheme="DCTERMS.W3CDTF" content="2006-03-08" />
</head>

(source: https://de.wikipedia.org/w/index.php?title=Dublin_Core&oldid=170536742)
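Dublin Core `<meta>` tags like those in the example can be generated from a simple mapping rather than typed by hand. The following is a minimal Python sketch that covers only plain `DC.*` elements without `scheme` attributes; the function name `dublin_core_meta` and the sample values are hypothetical.

```python
from html import escape


def dublin_core_meta(fields):
    """Render a mapping of Dublin Core element names to HTML <meta> tags."""
    lines = []
    for element, value in fields.items():
        # escape() protects quotes, ampersands and angle brackets in the values.
        lines.append(
            f'<meta name="DC.{escape(element)}" content="{escape(value)}" />'
        )
    return "\n".join(lines)


# Hypothetical record, used only to show the output format.
example = dublin_core_meta({"creator": "Jane Doe", "type": "Dataset"})
```

Here `example` holds two lines, `<meta name="DC.creator" content="Jane Doe" />` and `<meta name="DC.type" content="Dataset" />`.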

DOCUMENTATION AND METADATA
− Thesaurus, Authority files, and controlled vocabulary
  − Authority files
    − Allow an unambiguous assignment of individuals, institutions, funding organisations, locations, etc.
    − Examples:
      − ISNI - International Standard Name Identifier (http://www.isni.org)
      − VIAF - Virtual International Authority File (https://viaf.org/)
      − Open Funder Registry (http://www.crossref.org/fundingdata/registry.html)
(picture: pixabay.com)

DOCUMENTATION AND METADATA
− Thesaurus, Authority files, and controlled vocabulary
  − Thesauri and controlled vocabularies
    − Enhance metadata
    − Make data easier to find
  − Examples of subject-specific classifications
    − Environmental classification, https://sns.uba.de/umthes/de/collections/UK.html
    − Physics and Astronomy Classification Scheme (PACS), https://www.aip.org/pacs
    − Mathematics Subject Classification (MSC), https://www.zbmath.org/classification
(picture: pixabay.com)

DOCUMENTATION AND METADATA
− Thesaurus, Authority files, and controlled vocabulary
  − Examples of subject-specific thesauri
    − Agriculture: AGROVOC Multilingual agricultural thesaurus, http://aims.fao.org/vestregistry/vocabularies/agrovoc-multilingualagricultural-thesaurus
    − Humanities: A Thesaurus of Old English, http://oldenglishthesaurus.arts.gla.ac.uk/
    − Medicine and Life Science: Thesaurus Medical Subject Headings (MeSH), https://www.nlm.nih.gov/mesh/

STORAGE AND BACKUP

STORAGE AND BACKUP
− Research Data can be stored on different kinds of media
  − Each with its own strengths and weaknesses
  − E.g. with respect to data loss or unauthorised access
− A Backup is a copy of data on another medium
  − Backups should be made methodically and in a structured way
  − To ensure easy data reconstruction

BACKUP (RULE OF THREE)
− 3: Keep three copies of your data!
− 2: Store them at two separate locations!
− 1: Use more than one kind of storage hardware!
− 0: Do not use USB sticks! USB sticks are not reliable storage media!
(picture: pixabay.com)
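Making the additional copies can be scripted, and each copy can be verified against a checksum so that a damaged backup is noticed immediately. The following is a minimal Python sketch: the function names `sha256_of` and `backup_file` are hypothetical, and the code cannot check that the destination folders really sit at separate locations or on different hardware, which remains the user's responsibility.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source, destinations):
    """Copy `source` into every destination folder and verify each copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps timestamps
        if sha256_of(target) != expected:
            raise OSError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Passing two destination folders gives the original plus two verified copies, i.e. three copies in total.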

ARCHIVING DATA

ARCHIVE VS BACKUP
− What is the difference between an archive and a backup?
(picture: pixabay.com)

ARCHIVE VS BACKUP
− Archive
  − Stores files that do not change anymore (completed data)
  − Speed not as important
  − Access may take a few days
  − Searchability rather critical
  − Focus on data integrity
  − Ability to scale data integrity and data retention over long periods of time is important
  − Long-term data preservation
− Backup
  − For operational recovery
  − To quickly recover files
  − Focus on speed (backup and recovery)

FILE FORMATS
− Which file formats do you use in your work?
− Do you see any problems regarding their long-term usability?
− Are there other formats you could convert your data to?
− What features would be preferable for file formats in archives?
(picture: pixabay.com)

RECOMMENDED FILE FORMATS
− Spreadsheets
  − Recommendation: CSV, TSV, SPSS portable
  − Avoid: XLS
− Text
  − Recommendation: TXT, HTML, RTF, PDF/A
  − Avoid: DOC, PPT
− Multimedia
  − Recommendation: Container: MPEG-4, Ogg; Codec: Theora, Dirac, FLAC
  − Avoid: QuickTime, H.264
− Images
  − Recommendation: TIFF, JPEG 2000, PNG
  − Avoid: GIF, JPG
− Why are some formats more suitable than others?
(picture: pixabay.com)
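A project folder can be screened against these recommendations before archiving. The following is a rough Python sketch: the mapping from formats to file extensions (for example `.jp2` for JPEG 2000, `.por` for SPSS portable, `.mov` for QuickTime) and the function name `classify` are assumptions. An extension cannot reveal whether a PDF is PDF/A or which codec a container uses, so such files are reported as unknown.

```python
from pathlib import Path

# Extension sets derived from the slide's table; the mapping is an assumption.
RECOMMENDED = {
    ".csv", ".tsv", ".por", ".txt", ".html", ".rtf",
    ".tif", ".tiff", ".jp2", ".png", ".ogg", ".flac",
}
AVOID = {".xls", ".doc", ".ppt", ".mov", ".gif", ".jpg"}


def classify(path):
    """Return 'recommended', 'avoid' or 'unknown' for a file name."""
    suffix = Path(path).suffix.lower()
    if suffix in RECOMMENDED:
        return "recommended"
    if suffix in AVOID:
        return "avoid"
    return "unknown"
```

For instance, `classify("survey.XLS")` returns `avoid`, while `classify("report.pdf")` returns `unknown` because the extension alone does not show whether the file is PDF/A.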

RECOMMENDED FILE FORMATS
− File formats for long-term data access (characteristics)
  − Non-proprietary
  − Open, documented standard
  − Commonly used by the research community
  − Standard representation (ASCII, Unicode)
  − Unencrypted
  − Uncompressed

ARCHIVING DATA
− What should you consider when choosing a long-term archiving service?
  − Does it fulfil all necessary technical requirements?
  − Is it certified (Seal of Trust)?
  − What are the costs?
  − If necessary, could you make data publicly available?
  − How would you estimate the persistence of the service?
(picture: pixabay.com)

SECURITY

SECURITY
− Why could data security be important?
− How do you handle your data?
− Is there anything you could or should improve?
− How could you do that?
(picture: pixabay.com)

ASPECTS OF SECURITY
− Network Security
  − Confidential data should not be on the internet
  − Highly sensitive data should be on computers with no connection to the internet at all
− Physical Security
  − Access to buildings and rooms where computers or media are kept should be restricted
  − Only trusted persons should be allowed access to computers, e.g. for troubleshooting

ASPECTS OF SECURITY
− Computers and Files
  − Keep virus protection up to date
  − Do not send confidential data via e-mail or FTP (at least use encryption)
  − Use strong passwords on files and computers

CHECKING PASSWORDS
− Which password is the strongest? Rank them!
  (1) r4Z%trL7/§
  (2) The_King‘s_3_Daughters
  (3) this is fun or else a nun or a pun
− Password Meter
  − http://cups.cmu.edu/meter/

WHAT IS A STRONG PASSWORD?
(Image by Randall Munroe: https://xkcd.com/936/)

PASSWORDS
− What is a strong password?
  − Length ≥ 16 characters (sufficient at the moment, more will be needed in the future)
  − At least one special character, e.g. @#$%
  − At least one number, e.g. 123456
  − At least one lowercase letter, e.g. abcdefgh
  − At least one uppercase letter, e.g. ABCDEFGH
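The five criteria can be checked mechanically. The following is a minimal Python sketch of such a check; the function name `password_problems` and the message texts are hypothetical, and treating every character that is neither a letter nor a digit (including blanks) as a special character is an assumption. Passing the check does not by itself prove that a password is hard to guess.

```python
def password_problems(password, min_length=16):
    """Return the slide's criteria that `password` fails (empty list if none)."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    # Assumption: anything that is not a letter or digit counts as special.
    if not any(not char.isalnum() for char in password):
        problems.append("no special character")
    if not any(char.isdigit() for char in password):
        problems.append("no number")
    if not any(char.islower() for char in password):
        problems.append("no lowercase letter")
    if not any(char.isupper() for char in password):
        problems.append("no uppercase letter")
    return problems
```

For `correcthorsebatterystaple` the function reports a missing special character, number, and uppercase letter; for `abc` it additionally reports the length.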

PASSWORDS
− How to create Strong Passwords
  − Diceware
    − Roll five six-sided dice (5d6) per word and choose at least six words from a list according to the resulting numbers
  − Take a sentence that you can easily remember. Take the first character of each word.
    − E.g. "The first house that I ever lived in was in 337 Some Street. The rent was 450€ per month." Password: "TfhtIeliwi337SS.Trw4€pm."
    − 24 characters
    − Uppercase/lowercase
    − Numbers
    − Special characters
(picture: pixabay.com)
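The Diceware procedure can be simulated in software when no dice are at hand. The following is a minimal Python sketch using the standard library's `secrets` module as the random source; the function names are hypothetical, and `placeholder_wordlist` only produces stand-in words so that the example runs. For real use, load a published Diceware word list that maps the 7776 five-digit keys to words.

```python
import secrets
from itertools import product


def roll_key(dice=5):
    """Simulate rolling `dice` six-sided dice, e.g. '42615'."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(dice))


def diceware_passphrase(wordlist, words=6, separator=" "):
    """Pick `words` entries from a mapping of five-digit keys to words."""
    if words < 6:
        raise ValueError("use at least six words")
    return separator.join(wordlist[roll_key()] for _ in range(words))


def placeholder_wordlist():
    """Stand-in list (an assumption): one dummy word per possible roll."""
    keys = ("".join(digits) for digits in product("123456", repeat=5))
    return {key: f"word{key}" for key in keys}


passphrase = diceware_passphrase(placeholder_wordlist())
```

Each word is drawn from 6^5 = 7776 possibilities, which is why the word count rather than the character mix carries the strength of such a passphrase.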

PASSWORDS
− How to create Strong Passwords
  − Strengthen the xkcd password or Diceware passwords
    − Add alternating numbers and special characters between the words
      − E.g. ^ and 2
    − Write every second (or fourth, or fifth, …) letter in uppercase
    − "correcthorsebatterystaple" => "cOrrect^hOrse2bAttery^sTaple"
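The transformation in the slide's example can be written down as a short function. The following is a minimal Python sketch that reproduces that example by capitalising the second letter of each word and alternating the separators ^ and 2; the function name `strengthen` and its parameters are hypothetical.

```python
def strengthen(words, separators=("^", "2")):
    """Capitalise each word's second letter and join with alternating separators."""
    # Slicing keeps words shorter than two letters intact instead of failing.
    parts = [word[:1] + word[1:2].upper() + word[2:] for word in words]
    result = parts[0] if parts else ""
    for index, part in enumerate(parts[1:]):
        # Cycle through the separators: ^, 2, ^, 2, ...
        result += separators[index % len(separators)] + part
    return result
```

Calling `strengthen(["correct", "horse", "battery", "staple"])` returns `cOrrect^hOrse2bAttery^sTaple`, the strengthened password from the slide.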

PASSWORDS
− How are passwords hacked?
  − Social Engineering / Social Hacking / Asking (!)
    − Interpersonal manipulation (e.g. via phone calls, e-mail)
    − Guessing: passwords often have a relation to the person using them
  − Brute-force attacks
    − Testing all possible combinations
  − Common-word attacks / dictionary attacks
    − Use word lists or rainbow tables

Dr. Daniel T. Rudolf rudolf@ulb.uni-bonn.de