Documenting and organising your data
For an easier life
lib.uts.edu.au utslibrary

Over the next 60-ish minutes:
• Why this stuff matters
• Metadata
• Tagging and file hierarchies
• File naming and renaming
• Version control

Documenting your data

So what might this be?

Why document?
• Enables you to understand and interpret the data
• Tells the story of where the data came from
• Ensures informed and correct use, and reduces the chance of incorrect use or misinterpretation

What to document?
• Wider contextual information
• Data collection methodology and processes
• Information on dataset structure
• Variable-level documentation
• Data confidentiality, access and use conditions

Bad vs Good
http://figshare.com/articles/Excel_database_of_the_Ph.D_thesis/1360019
http://figshare.com/articles/Main_Dataset_for_Evolution_of_Popular_Music_USA_1960_2010_/1309953

Let’s get organised

Why?
• You think you’ll remember things, but over time…
• There is a multitude of formats and versions of data and documentation
• Investing time at the beginning can save time in the long run
• Good file management practices and naming protocols enable sharing with collaborators

Can you relate?
Experimentdata.txt
Report.Draft.doc
Laurensdata.dat
Report.Final.doc
Data:currentversion.dat
Report.Finalv2Last.One.doc
Todaysimage.tif
Report.Final.doc

Some filing principles
• There’s no single right way to do it
• Establish and document a system that works for you
• Strike a balance between doing too much and too little: be realistic
• The 5 Cs: be Clear, Concise, Consistent, Correct, and Conformant

Hierarchical or tag-based
• Hierarchical: items are organised in folders and sub-folders
• Tag-based: each item is assigned one or more tags
• The two are often used in combination

Hierarchical filing
The good
• Familiar and widely used
• Good at representing the structure of information; constructing the hierarchy can itself be a helpful exercise
• Similar items are stored together
• Sub-folders can function as task lists
The not so good
• Surprisingly hard work to set up and maintain: ‘a heavyweight cognitive activity’
• Can be hard to get the right balance between breadth and depth
• Items can only go in one place
• Time-consuming to reorganise if the hierarchy becomes out of date

Sample folder hierarchy from the UK data archive

Tag-based filing
The good
• Items can go in more than one category, and multiple types of category can be used
• Many people find tagging quicker and easier than hierarchical filing
• Can be easier to combine than hierarchical systems when collaborating
• You can search for tags in Finder and Windows Explorer
The not so good
• Not how operating systems store files
• If material isn’t tagged properly at first it can be hard to find later
• Inconsistent tagging is common
• Similarly named categories can get mixed
• Less good at representing the structure of information
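To make the tag-based idea concrete, here is a minimal Python sketch of a tag index. The function names and example file names are hypothetical, not part of the workshop materials. Tags are lower-cased and trimmed before indexing, which is one simple way to reduce the inconsistent-tagging problem noted above.

```python
from collections import defaultdict


def build_tag_index(tagged_items):
    """Map each normalised tag to the set of items carrying it.

    `tagged_items` is a dict of item name -> list of tags (hypothetical layout).
    """
    index = defaultdict(set)
    for item, tags in tagged_items.items():
        for tag in tags:
            # Normalise so "Consumers " and "consumers" land in one category.
            index[tag.strip().lower()].add(item)
    return index


def find(index, *tags):
    """Return the items that carry every one of the given tags."""
    sets = [index.get(tag.strip().lower(), set()) for tag in tags]
    return set.intersection(*sets) if sets else set()
```

Unlike a folder, the same item can be reached through several tags, for example `find(index, "consumers", "interview")`.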

Let’s do metadata
Open a Word doc and choose File > Information

File naming
• Important for future access and retrieval
• Provides contextual information
• Creates a logical structure for skimming through many files and versions

How could these file names be improved?

Best practice for file naming
• Keep file names short but meaningful
• Define the types of data and file formats for the research
• Avoid generic file names, e.g. draft, final version
• Use underscores to separate words (avoid spaces)
• Avoid special characters such as & * % $ £ ] { ! @ /, as these are often used for specific tasks in a digital environment
• Consider scalability
• Not all systems and software are case-sensitive, so assume that TANGO, Tango and tango are the same
• Don’t rely on file names as your sole source of documentation
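Several of the rules above can be checked automatically. The following Python sketch is an illustration only: the function names, the 40-character limit and the list of generic labels are assumptions, not part of the workshop guidance.

```python
# Characters the guidance says to avoid ("\u00a3" is the pound sign).
SPECIAL_CHARS = set("&*%$\u00a3]{!@/")
# Illustrative, incomplete list of generic labels (an assumption).
GENERIC_WORDS = ("draft", "final")


def check_file_name(name, max_length=40):
    """Return a list of problems found in `name`; an empty list means none."""
    problems = []
    if len(name) > max_length:  # max_length is an arbitrary example limit
        problems.append(f"longer than {max_length} characters")
    if " " in name:
        problems.append("contains spaces (use underscores)")
    found = sorted(set(name) & SPECIAL_CHARS)
    if found:
        problems.append("contains special characters: " + "".join(found))
    lowered = name.lower()
    for word in GENERIC_WORDS:
        # Crude substring match; it will also flag words such as "finalist".
        if word in lowered:
            problems.append(f"contains generic label '{word}'")
    return problems


def case_collisions(names):
    """Group names that differ only by letter case (TANGO, Tango, tango)."""
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]
```

A check like this could be run over a project folder before sharing it with collaborators; it does not replace documenting the naming convention itself.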

Possible elements
• Project/grant name and/or number
• Date of creation: useful for version control, e.g. YYYYMMDD
• Name of creator/investigator: last name first, followed by (initials of) first name
• Description of content/subject descriptor
• Data collection method (instrument, site, etc.)
• Version number
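As a sketch of how these elements might be combined, the Python function below joins them with underscores and formats the date as YYYYMMDD. The function name, the element order and the example values are hypothetical; choose whichever elements and order suit your project.

```python
from datetime import date


def build_file_name(project, created, creator, description, version, extension):
    """Join naming elements with underscores, e.g. Project_YYYYMMDD_Creator_..."""
    elements = [
        project,
        created.strftime("%Y%m%d"),  # YYYYMMDD sorts chronologically as text
        creator,
        description,
        f"v{version}",
    ]
    # Replace any whitespace inside an element with hyphens so the
    # underscore stays reserved for separating elements.
    cleaned = ["-".join(str(element).split()) for element in elements]
    return "_".join(cleaned) + "." + extension.lstrip(".")
```

For example, `build_file_name("GrantX12", date(2010, 2, 12), "ParsonsA", "focus group 1", "1.0", ".txt")` gives `GrantX12_20100212_ParsonsA_focus-group-1_v1.0.txt`.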

Example of good file naming
• FG1_CONS_12Feb10 is the file that contains the transcript of the first focus group in a study of consumers, which took place on 12 February 2010
• Int024_AP_5June08 is an interview with participant 024, interviewed by Anne Parsons on 5 June 2008

Naming and renaming
• Check whether the instrument, software, or other equipment that outputs your data files can be set up with a file naming system
• This is less work than changing file names retrospectively
• Batch renaming tools are available
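Dedicated batch renaming tools exist, but a short script can do the same job. The Python sketch below is one possible approach under stated assumptions: the function names and the rule (add a project prefix, swap spaces for underscores) are illustrative only. It builds a rename plan first so you can review it before any file is changed.

```python
from pathlib import Path


def plan_renames(folder, prefix):
    """Return a dict of old path -> new path without touching any files."""
    plan = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            new_name = f"{prefix}_{path.name.replace(' ', '_')}"
            plan[path] = path.with_name(new_name)
    return plan


def apply_renames(plan):
    """Carry out a reviewed plan, refusing to overwrite existing files."""
    for old, new in plan.items():
        if new.exists():
            raise FileExistsError(f"{new} already exists")
        old.rename(new)
```

Printing the plan and checking it by eye before calling `apply_renames` is a cheap safeguard, and it is worth testing on a copy of the folder first.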

Version control
• Create a version control table or file history
• Document your convention and be consistent
• Record every change
• Put old versions in a separate folder
• Consider discarding or deleting obsolete versions (while retaining the original 'raw' copy) if appropriate
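A version control table can be as simple as a CSV file kept alongside the data. The following Python sketch appends one row per change; the column names and the function name are assumptions, so adapt them to your own convention.

```python
import csv
from datetime import date
from pathlib import Path

# Hypothetical columns for the version control table.
FIELDS = ["version", "date", "author", "change"]


def record_change(log_path, version, author, change, when=None):
    """Append one row to the file history, writing a header for a new file."""
    log_path = Path(log_path)
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "version": version,
            "date": (when or date.today()).isoformat(),
            "author": author,
            "change": change,
        })
```

Because rows are only ever appended, the file doubles as a record of every change in the order it was made.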

Version control cont.
• In file and folder names, use ordinal numbers (1, 2, 3, etc.) for major changes and the decimal for minor changes, e.g. v1, v1.1, v2.6
• Beware of imprecise labels such as revision, final2 or definitive_copy: they may not be as definitive as you thought
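The major/minor numbering scheme can be sketched in a few lines of Python. The helper names are hypothetical, and the code assumes labels of the form `v<major>` or `v<major>.<minor>` only.

```python
def version_key(label):
    """Turn 'v1.10' into (1, 10) so versions sort numerically, not as text."""
    major, _, minor = label.lstrip("v").partition(".")
    return int(major), int(minor) if minor else 0


def bump_version(label, major=False):
    """Return the next label: a new whole number for major changes,
    the next decimal for minor ones."""
    major_number, minor_number = version_key(label)
    if major:
        return f"v{major_number + 1}"
    return f"v{major_number}.{minor_number + 1}"
```

Sorting with `version_key` matters once minor numbers reach two digits: plain text sorting would place v1.10 before v1.2.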

Version Control Doc

Version control final
• Some software has built-in version control facilities, e.g.:
  - control rights to file editing: read/write permissions (Windows Explorer)
  - versioning or tracking features in collaborative documents (wikis, intranets, Google Docs)
• Consider using version control software
• Guidance from MIT Libraries on software options: http://libraries.mit.edu/data-management/files/2014/05/version-control-handout.pdf

But how will I remember all this stuff?
You can use this form to plot out the structure of your own data. It establishes good practice early by helping you form working habits. Print it out and stick it on the wall above your desk!

Questions?
David Litting
david.litting@uts.edu.au
Many thanks to MIT Libraries for making the excellent materials this workshop is based on available for reuse: http://libraries.mit.edu/data-management/files/2014/05/file-organization-july2014.pdf
This work is licensed under a Creative Commons Attribution 4.0 International License.