Digital Curation @ McGill
Jenn Riley
Associate Dean, Digital Initiatives
McGill University Library

What are we responsible for curating?

Digitized content
§ Primarily digitized versions of analogue rare/unique/archival/valuable materials
§ Close collaborations with Rare Books & Special Collections, and McGill University Archives
§ Currently working on collection prioritization and scaling up production

Born digital university records of long term value
§ Institutional mandate
§ Determined by records retention schedule
§ Involves transfer of selected records from originating departments after the end of their immediate life
§ Archival appraisal practices determine what to keep

Born digital archival materials
§ Fonds of personal papers and organizational records
§ The same types of documents we’ve already collected in paper form
§ Examples of paper archival fonds from the past:
  § Montréal Natural History Society
  § James McGill MS 435
  § George Mercer Dawson (President of Royal Society of Canada)
  § Harvey Cushing Fonds (William Osler biographer)

Born digital creative content
§ E.g., digital art and digital humanities projects
§ Often more like software than a set of standalone files
§ High risk of loss compared to analogue ancestors

Some licensed/purchased digital content
§ Typically we only have remote access and are not directly responsible for curation
§ In some cases we must deliver the content ourselves rather than rely on the vendor
§ And in these cases, we take on curation responsibility

ETDs and other student work
§ McGill students are required to deposit master's and doctoral theses and sign a non-exclusive license to disseminate
§ Policy allows students to request a 1-year embargo
§ Students retain copyright
§ McGill does not contract with ProQuest for ETD delivery and preservation
§ McGill participates in Theses Canada
§ Some courses show interest in pushing student work to eScholarship@McGill

Pre-prints/post-prints
§ Supporting “green OA”
§ In fulfillment of funder mandates
§ Or voluntarily
§ Still not heavily used, at McGill or elsewhere
§ No serious discussion yet at McGill about a campus mandate
§ Expecting Canadian Tri-Council OA mandate beginning May 1, 2015

Research data
§ BIG new focus
§ Studies show significant loss of data sets over time
  § Odds of data supporting a paper being extant fall by 17% per year (Vines et al., 2014; doi:10.1016/j.cub.2013.11.014)
§ Some studies show a citation advantage for papers with open data
  § 30% for papers published in 2004 and 2005 (Piwowar and Vision, 2013; doi:10.7717/peerj.175)
§ Expecting Canadian Tri-Council data management planning requirements in 2015/2016
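
To make the 17% figure concrete, here is a minimal sketch of what a constant yearly decline in the odds implies. The slide states a decline in odds, not in probability, so the sketch converts between the two. The starting probability of 0.9 is purely an illustrative assumption (it is not a figure from the slides or from the cited study), and the function names are hypothetical.

```python
def probability_to_odds(probability: float) -> float:
    """Convert a probability in [0, 1) to odds (p / (1 - p))."""
    if not 0.0 <= probability < 1.0:
        raise ValueError("probability must be in [0, 1)")
    return probability / (1.0 - probability)


def odds_to_probability(odds: float) -> float:
    """Convert odds back to a probability (odds / (1 + odds))."""
    return odds / (1.0 + odds)


def odds_after(initial_odds: float, years: float,
               annual_decline: float = 0.17) -> float:
    """Odds after `years`, assuming they shrink by a fixed fraction each year."""
    return initial_odds * (1.0 - annual_decline) ** years


def probability_extant(initial_probability: float, years: float,
                       annual_decline: float = 0.17) -> float:
    """Probability that a data set is still extant after `years`."""
    start = probability_to_odds(initial_probability)
    return odds_to_probability(odds_after(start, years, annual_decline))


if __name__ == "__main__":
    # ASSUMPTION: a 0.9 starting probability, chosen only for illustration.
    for years in (0, 5, 10, 20):
        print(years, round(probability_extant(0.9, years), 3))
```

Because the decline compounds on the odds, the probability falls slowly at first and faster once the odds approach even.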

How do we curate it?

Find/Collect It … …

This is difficult!
§ As difficult as any other step
§ Luckily, it’s not an all-or-nothing proposition
§ Some areas we’re pretty good at (ETDs, digitized collections)
§ Others we try but with limited success (pre-prints/post-prints)
§ Others are brand new to us (research data, born digital archival materials, born digital creative content)

Processing and organizing
§ Determine what’s worth keeping
§ Create/map metadata
§ Responsibility to handle personally identifiable information carefully

Find/Collect It Put It Somewhere Safe …

Several different approaches in place now
§ Digitization master files to NCS for storage
§ Backups of files/servers (digital collections, eScholarship@McGill, born digital university records)
§ Multiple copies including one off site
§ eScholarship is a “repository” but not a “preservation repository”
§ Reliance on external vendors (licensed content)
  § E.g., through LOCKSS
  § We run a LOCKSS node at McGill
§ And the stuff we’re not handling so well (born digital special collections/archival materials)

What about access?

How do we make this better?
§ Need better repositories
  § That handle common use cases
    § Hierarchical file structures
    § Paged objects
    § Display common file types in-browser
  § That are connected to preservation systems and manage content in them

Find/Collect It Put It Somewhere Safe Keep It Safe Over Time

Yeah, this is hard
§ Harder than the paper world!
§ What is “the long term”?
§ How long will universities exist in their current form?
§ How long will computers continue to function the way they do now?
§ How will metadata structures evolve over this period of time?
§ What does a “pay once” model for digital preservation look like?
§ What criteria do we use to determine the useful lifespan of a digital file?
§ It’s about policy as much as technology
§ How do our organizations set things up to ensure someone takes an active management role over time?

Strategies
§ Standardize input file formats to the degree possible
§ Actively check file integrity
§ Refresh hardware frequently
§ Know what will need to be emulated, and what you can safely migrate
§ Partner!
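
One common way to "actively check file integrity" is a fixity check: record a checksum for every file at ingest, then periodically recompute and compare. The sketch below is an illustration only, not a description of McGill's actual tooling. It assumes SHA-256 as the checksum and an in-memory manifest, and the function names (`sha256_of`, `build_manifest`, `check_fixity`) are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def check_fixity(root, manifest):
    """Compare files under root to a stored manifest.

    Returns a dict of relative path -> "missing" or "changed";
    an empty dict means every recorded file still matches.
    """
    root = Path(root)
    problems = {}
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems[relative_path] = "missing"
        elif sha256_of(target) != expected:
            problems[relative_path] = "changed"
    return problems
```

In practice the manifest would be stored separately from the files it describes, and a check like this would run on every copy, including the off-site one, so that a damaged copy can be repaired from a good one.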

Who’s doing this well?
§ Chronopolis @ UC San Diego
§ CLOCKSS
§ Portico
§ Héritage (Canadiana from CRKN)
§ Scholars’ Portal
§ HathiTrust
§ APTrust
§ DPN

That’s a lot of groups!
§ And they all need funding to run
§ And our organizations pay the membership fees from our institutional budgets
§ How do we get them all to work together?
§ Committee on Coherence at Scale for Higher Education

Biggest decision points

Technical
§ How many repositories and how they connect
§ Open source vs locally hosted vended vs cloud
§ Metadata issues
§ Keeping up with technological advancements

Policy
§ How many copies
§ What preservation actions are necessary
§ Who to partner with
  § In Canada or beyond?
§ Business planning

Thank you!
§ jenn.riley@mcgill.ca
§ These presentation slides: http://www.jennriley.com/presentations/mcgillsis/15winter/curation.ppt