Documenting and organising your data: for an easier life
lib.uts.edu.au utslibrary
Over the next 60-ish mins:
• Why this stuff matters
• Metadata
• Tagging and file hierarchies
• File naming and renaming
• Version control
Documenting your data
So what might this be?
Why document?
• Enables you to understand/interpret data
• Tells the story of where the data came from
• Ensures informed and correct use, reduces chance of incorrect use/misinterpretation
What to document?
• Wider contextual information
• Data collection methodology and processes
• Information on dataset structure
• Variable-level documentation
• Data confidentiality, access and use conditions
Bad vs Good
http://figshare.com/articles/Excel_database_of_the_Ph.D_thesis/1360019
http://figshare.com/articles/Main_Dataset_for_Evolution_of_Popular_Music_USA_1960_2010_/1309953
Let’s get organised
Why?
• You think you’ll remember things, but over time…
• Multitude of formats and versions of data and documentation
• Investment of time at the beginning can save time in the long run
• Good file management practices/naming protocols enable sharing with collaborators
Can you relate?
Experimentdata.txt
Laurensdata.dat
Data:currentversion.dat
Todaysimage.tif
Report.Draft.doc
Report.Final.doc
Report.Finalv2Last.One.doc
Report.Final.doc
Some filing principles
• There’s no single right way to do it
• Establish and document a system that works for you
• Strike the balance between doing too much and too little: be realistic
• The 5 Cs: be Clear, Concise, Consistent, Correct, and Conformant
Hierarchical or Tag-based
• Hierarchical – Items are organised in folders and sub-folders
• Tag-based – Each item is assigned one or more tags
• Often used in combination
Hierarchical filing
The good
• Familiar and widely used
• Good at representing the structure of information – constructing the hierarchy can itself be a helpful exercise
• Similar items are stored together
• Sub-folders can function as task lists
The not so good
• Surprisingly hard work to set up and maintain – ‘a heavyweight cognitive activity’
• Can be hard to get the right balance between breadth and depth
• Items can only go in one place
• Time consuming to re-organise if the hierarchy becomes out of date
Sample folder hierarchy from the UK data archive
Tag-based filing
The good
• Items can go in more than one category – and multiple types of category can be used
• Many people find tagging quicker and easier than hierarchical filing
• Can be easier to combine than hierarchical systems when collaborating
• You can search for tags in Finder and Windows Explorer
The not so good
• Not how operating systems store files
• If material isn’t tagged properly at first it can be hard to find later
• Inconsistent tagging is common
• Similarly named categories can get mixed
• Less good at representing the structure of information
Let’s do metadata
Open a Word doc and choose File > Info
File naming
• Important for future access and retrieval
• Provides contextual information
• Creates logical structure for skimming through many files and versions
How could these file names be improved?
Best practice for file naming
• Keep file names short but meaningful
• Define the types of data and file formats for the research
• Avoid using generic file names, e.g. draft, final version, etc.
• Use underscores to differentiate between words (avoid spaces)
• Avoid special characters such as: & * % $ £ ] { ! @ / as these are often used for specific tasks in a digital environment
• Consider scalability
• Not all systems/software are case-sensitive or recognise capitals, so assume that TANGO, Tango and tango are the same
• Don’t rely on file names as your sole source of documentation
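A few of the rules above can be checked automatically. The sketch below is illustrative only: the function name `check_file_name`, the list of generic words and the 40-character limit are assumptions made for this example, not part of the workshop materials.

```python
import re

# Assumed rule set based on the bullets above; adjust to your own convention.
GENERIC_WORDS = {"draft", "final", "new", "latest"}  # hypothetical list
MAX_LENGTH = 40  # arbitrary limit, chosen only for illustration
# Letters, digits, underscores and hyphens, plus one optional extension.
ALLOWED = re.compile(r"^[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?$")


def check_file_name(name):
    """Return a list of problems found in a file name (empty list means OK)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them for this check.
    if not ALLOWED.match(name.replace(" ", "_")):
        problems.append("contains special characters")
    if len(name) > MAX_LENGTH:
        problems.append("too long")
    # Split the part before the extension into words and look for generic ones.
    stem = name.rsplit(".", 1)[0].lower()
    words = set(re.split(r"[^a-z0-9]+", stem))
    if words & GENERIC_WORDS:
        problems.append("uses a generic word")
    return problems


print(check_file_name("Report Final v2!.doc"))
print(check_file_name("FG1_CONS_12Feb10.rtf"))
```

Note that this sketch also flags extra dots inside the name (as in Report.Final.doc) as special characters, which is a stricter reading than the slide itself states.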
Possible elements
• Project/grant name and/or number
• Date of creation: useful for version control, e.g., YYYYMMDD
• Name of creator/investigator: last name first followed by (initials of) first name
• Description of content/subject descriptor
• Data collection method (instrument, site, etc.)
• Version number
Example of good file naming
• FG1_CONS_12Feb10 is the file that contains the transcript of the first focus group with a study of consumers, that took place on 12 February 2010
• Int024_AP_5June08 is an interview with participant 024, interviewed by Anne Parsons on 5 June 2008
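If you settle on a fixed set of elements, a small helper can assemble names for you so they stay consistent. This is a minimal sketch under assumptions: the function `build_file_name`, the order of the elements and the sample values are hypothetical, and the date uses the YYYYMMDD form suggested under "Possible elements" rather than the style in the examples above.

```python
from datetime import date


def build_file_name(project, description, created, creator, version, extension):
    """Join the naming elements with underscores, in one fixed order."""
    parts = [
        project,
        description,
        created.strftime("%Y%m%d"),  # YYYYMMDD sorts chronologically
        creator,
        f"v{version}",
    ]
    # Replace spaces inside each element and skip elements left empty.
    cleaned = [part.strip().replace(" ", "_") for part in parts if part]
    return "_".join(cleaned) + "." + extension.lstrip(".")


# Hypothetical values, for illustration only.
print(build_file_name("SoilStudy", "site A transcript",
                      date(2010, 2, 12), "ParsonsA", "1.1", "txt"))
```

Whatever order you choose, write it down once and reuse it for every file in the project.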
Naming and renaming
• Check to see if your instrument, software, or other equipment that outputs your data files can be set with a file naming system
• Less work than retrospectively changing filenames
• Batch renaming tools are available
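Dedicated batch renaming tools exist, but the idea can also be sketched in a few lines of Python. The functions `plan_renames` and `apply_renames` and the prefix rule below are assumptions for illustration. The dry-run default prints what would happen without touching any files, so you can check the plan first.

```python
from pathlib import Path


def plan_renames(folder, prefix):
    """Map each file in `folder` to a new name: prefix added, spaces replaced."""
    plan = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            new_name = prefix + "_" + path.name.replace(" ", "_")
            plan[path] = path.with_name(new_name)
    return plan


def apply_renames(plan, dry_run=True):
    """Print the planned renames; only rename files when dry_run is False."""
    for old, new in plan.items():
        if new.exists():
            # Never overwrite an existing file.
            print(f"SKIP {old.name}: {new.name} already exists")
        elif dry_run:
            print(f"WOULD RENAME {old.name} -> {new.name}")
        else:
            old.rename(new)
```

A typical use would be `plan = plan_renames("my_folder", "ProjX")`, then `apply_renames(plan)` to preview and `apply_renames(plan, dry_run=False)` to commit. Work on a copy of the folder the first time.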
Version control
• Create a version control table or file history
• Document your convention and be consistent
• Record every change
• Put old versions in a separate folder
• Consider discarding or deleting obsolete versions (while retaining the original 'raw' copy) if appropriate
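A version control table can be as simple as a CSV file with one row per change. The sketch below assumes a hypothetical set of columns (file, version, date, author, change) and a hypothetical helper `record_change`; use whatever columns suit your project.

```python
import csv
from datetime import date
from pathlib import Path

# Assumed columns for the version table; not prescribed by the workshop.
FIELDS = ["file", "version", "date", "author", "change"]


def record_change(log_path, file_name, version, author, change, when=None):
    """Append one row to the version log, writing the header on first use."""
    log_path = Path(log_path)
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "file": file_name,
            "version": version,
            "date": (when or date.today()).isoformat(),
            "author": author,
            "change": change,
        })
```

For example, `record_change("version_log.csv", "survey.csv", "v1.1", "AP", "fixed typo in column name")` adds a dated row that opens in any spreadsheet program.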
Version control cont.
• In the file/folder names, use ordinal numbers (1, 2, 3, etc.) for major changes and the decimal for minor changes, e.g. v1, v1.1, v2.6
• Beware of imprecise labels: revision, final 2, definitive_copy - they may not be as definitive as you thought
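The major/minor numbering rule can be written down as a tiny function, which is one way to check that your convention is unambiguous. The name `next_version` is hypothetical, and the sketch assumes labels in exactly the form shown above (v1, v1.1, v2.6).

```python
def next_version(label, major_change):
    """Return the next label: whole number up for major, decimal up for minor."""
    number = label.lstrip("vV")
    major_text, _, minor_text = number.partition(".")
    major = int(major_text)
    minor = int(minor_text) if minor_text else 0
    if major_change:
        # A major change starts a new whole number and drops the decimal.
        return f"v{major + 1}"
    return f"v{major}.{minor + 1}"


print(next_version("v1", major_change=False))
print(next_version("v1.1", major_change=True))
```

One design point to decide up front: in this sketch the minor part is a counter, so the label after v1.9 is v1.10, not v2.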
Version Control Doc
Version Control Final
• Some software has built-in version control facilities, e.g.:
  – control rights to file editing: read/write permissions (Windows Explorer)
  – versioning or tracking features in collaborative documents (wikis, intranets, Google Docs)
• Consider using version control software:
  – Guidance from MIT Libraries on software options: http://libraries.mit.edu/datamanagement/files/2014/05/version-control-handout.pdf
But how will I remember all this stuff?
You can use this form to plot out the structure of your own data. It establishes good practice early by helping form working habits. Print it out and stick it on the wall above your desk!
Questions?
David Litting david.litting@uts.edu.au
Many thanks to MIT Libraries for making the excellent materials this workshop is based on available for reuse: http://libraries.mit.edu/data-management/files/2014/05/file-organization-july2014.pdf
This work is licensed under a Creative Commons Attribution 4.0 International License.