Linkrot, Content Drift, Reference Rot, and Legal Documents

Linkrot, Content Drift, Reference Rot, and Legal Documents – How to Manage this Growing Problem
Laura Gordon-Murnane, MLS, Bloomberg BNA, 20 June 2017

Wayback Machine, http://web.archive.org/web/20140717155720/https://vk.com/wall-57424472_7256, Accessed June 9, 2017

What is the average lifespan of a webpage?
Library of Congress, The Signal blog, November 8, 2011, by Mike Ashenfelder (guest post by Nicholas Taylor)
• In “Preserving the Internet,” Scientific American, March 1997, Brewster Kahle reported that “estimates put the average lifetime for a URL at 44 days.” http://web.archive.org/web/19970504212157/http:/www.sciam.com/0397issue/0397kahle.html
• Steve Lawrence, et al. 2001. “Persistence of Web References in Scientific Research.” Computer 34, 2 (February 2001), 26-31. DOI=http://dx.doi.org/10.1109/2.901164 “Alexa Internet (http://www.alexa.com/), which creates Internet navigation software and studies trends in Web content and behavior, estimates that Web pages disappear after an average time of only 75 days.”

What is the average lifespan of a webpage?
• Rick Weiss, “On the Web, Research Work Proves Ephemeral: Electronic Archivists Are Playing Catch-Up in Trying to Keep Documents From Landing in History's Dustbin,” Washington Post, Monday, November 24, 2003, Page A08, https://web.archive.org/web/20111112053459/http://stevereads.com/cache/ephemeral_web_pages.html, quoting Brewster Kahle: "The average lifespan of a Web page today is 100 days. This is no way to run a culture."

The Internet is always in beta
• The Internet is Not Permanent
• Content on the Internet is Not Fixed
• The Internet is Ephemeral and Unstable
• Content on the Internet decays rapidly and is lost

Digitization: Disruptive and Transformative
• Benefits of Digitization
• Accessibility to previously unavailable content
• No need to travel to a library, archive, or museum to access the source
• Link will take you directly to the content
• Distribution of scholarly material is unprecedented
• Access to new types of content: Books, Datasets and Databases, Images, Articles, Recordings, Video and Movies

Digitization: Democratization of Content
• Anyone can be a publisher
• Can make your voice heard
• Many new types of tools, platforms, and content now available: Web Pages, Blogs, Email, Tweets; News; Datasets; Movies, Videos and Films; Music; Images, Charts, Graphics; Video Games
• Easy to upload, edit, change, and remove content posted to the web

Information professionals, librarians, scholars, lawyers, judges, historians, scientists want and need data, content, information, facts, knowledge to be stable, accessible, and reliable. Disappearing content threatens the continuity of our collections and the ability to find, study, and learn from the past.

Questions
• What happens when Internet content is cited in legal and government documents?
• What happens when Internet content is cited or used in academic, scientific, and cultural studies?
• What happens when Internet content is used for news, government policies, and the historical record?

Why is this a Problem?

Linkrot + Content Drift = Reference Rot: Definitions
• Linkrot Definition: “The resource identified by a URI vanishes from the web. As a result, a URI reference to the resource ceases to provide access to referenced content.”
• Content Drift Definition: “The resource identified by a URI changes over time. The resource’s content evolves and can change to such an extent that it ceases to be representative of the content that was originally referenced.”
Source: Jones SM, Van de Sompel H, Shankar H, Klein M, Tobin R, Grover C (2016) Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content. PLoS ONE 11(12): e0167475. doi:10.1371/journal.pone.0167475
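The two definitions above can be read as a simple decision rule. The following is a minimal sketch, not the method used by Jones et al.: it assumes the citing author stored a fingerprint of the page text at citation time, and it treats any change in the normalized text as drift, which is stricter than the similarity measures a real study would use. The names `fingerprint`, `classify_reference`, and `RefStatus` are hypothetical.

```python
import hashlib
from enum import Enum
from typing import Optional


class RefStatus(Enum):
    INTACT = "intact"
    LINK_ROT = "link rot"
    CONTENT_DRIFT = "content drift"


def fingerprint(text: str) -> str:
    """Hash the page text after collapsing whitespace and lowercasing it."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify_reference(cited_fingerprint: str, current_text: Optional[str]) -> RefStatus:
    """Classify a reference given what the URI returns today.

    current_text is None when the resource has vanished (link rot).
    Any textual change counts as content drift in this simplified sketch.
    """
    if current_text is None:
        return RefStatus.LINK_ROT
    if fingerprint(current_text) != cited_fingerprint:
        return RefStatus.CONTENT_DRIFT
    return RefStatus.INTACT
```

Reference rot, in the slide's terms, is then any result other than `RefStatus.INTACT`.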

Reasons for Linkrot
• Content is owned by a third party and will only be available as long as the third party agrees to maintain the link
• Domain name not renewed or server shut down
• Government agencies removing content from agency websites
• Governments and organizations withdraw funding for projects and initiatives
• Content moved to another server with no redirects
• Lack of interest in maintaining a link into the future
• Fear of litigation
• Copyright issues
• Incorrect links
• Shift from “http” to “https” or the removal of “www” from the URL
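The last cause in the list above (a scheme change or a dropped “www”) is one of the few that can sometimes be recovered mechanically. As a minimal sketch, assuming the page itself still exists, the hypothetical helper below lists the scheme and host variants worth trying before declaring a link dead. It only builds strings and makes no network requests; `example.com` is a placeholder domain.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_urls(url: str) -> list[str]:
    """Return http/https and www/non-www variants of url, excluding url itself."""
    parts = urlsplit(url)
    host = parts.netloc
    bare = host[4:] if host.lower().startswith("www.") else host
    # Keep the original host first, then whichever of bare / www. differs from it.
    hosts = [host] + [h for h in (bare, "www." + bare) if h != host]
    schemes = [parts.scheme] + [s for s in ("https", "http") if s != parts.scheme]

    variants: list[str] = []
    for scheme in schemes:
        for h in hosts:
            candidate = urlunsplit((scheme, h, parts.path, parts.query, parts.fragment))
            if candidate != url and candidate not in variants:
                variants.append(candidate)
    return variants


print(candidate_urls("http://www.example.com/a"))
```

A variant that resolves should still be compared against the originally cited content, since a working address does not rule out content drift.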

Reasons for Content Drift
• Authors include new kinds of web content to support research and to add background and context to the scholarship: software, ontologies, scientific workflows, datasets, online debates, presentations, blogs, videos, news
• Born-digital content can easily be updated, substantially altered, or removed completely
• Updated content can reflect a new position that is completely different from the original publication

Problems with Content Drift
• Born-digital content can be updated without notifying readers that changes have been made to this content
• Domain address can lapse and a new owner can take over and completely change the content
• Most troublesome: updated or changed content can be substantially different from the original position of the author

Examples of Content Drift
• Government data is removed because a new administration replaces the previous administration’s links
• Government decides to discontinue, remove, or delete scientific research because it no longer deems those studies necessary or because of a change in philosophy

Consequences of Reference Rot: Linkrot + Content Drift
• Moving scholarly communication from a paper-based system to a web-based system has consequences for the stability, authority, transparency, and reliability of the scholarship
• Changes the meaning, intent, and position of the original author
• No longer able to determine what the original author meant and how you are using that in your own scholarship
• No longer able to easily access the original content because it has been moved, edited, or deleted

Reference Rot and Legal Precedent
• Legal Citations
• Footnotes are the “cornerstone” of judicial opinions, law review articles, and academic scholarship
• Citations/footnotes “provide both authorial verification of the original source material at the moment they are used and the needed information for readers to later find the cited source.” (Raizel Liebler and June Liebert)
Source: Liebler, Raizel and Liebert, June (2013) "Something Rotten in the State of Legal Citation: The Life Span of a United States Supreme Court Citation Containing an Internet Link (1996-2010)," Yale Journal of Law and Technology: Vol. 15: Iss. 2, Article 2. Available at: http://digitalcommons.law.yale.edu/yjolt/vol15/iss2/2

“Is link rot destroying stare decisis as we know it?”
Minnesota’s former chief justice, Eric Magnuson… “It is scary, scary stuff.”
Doctrine of stare decisis: the doctrine of precedent
“Very difficult to create meaningful precedents based upon something impermanent.” (Liebler and Liebert)
Can courts “let the decision stand” if the cited authority is no longer available or accessible? (Arturo Torres)

Content Drift: An Example
Justice Samuel Alito linked to this website (http://ssnat.com) in his opinion in Brown v. Entertainment Merchants Association.
http://ssnat.com
http://web.archive.org/web/20110529180722/http://ssnat.com:80/
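The archived address above follows the Wayback Machine pattern seen throughout these slides: a 14-digit capture timestamp (YYYYMMDDhhmmss) followed by the original URL. As a minimal sketch, the hypothetical helper below splits such an address into its capture time and original URL, which makes it easy to check how close a snapshot is to the date a source was cited. The regular expression is an assumption based on the URLs shown in this deck, not an official grammar.

```python
import re
from datetime import datetime
from typing import NamedTuple


class Snapshot(NamedTuple):
    captured: datetime
    original_url: str


# Pattern inferred from the snapshot URLs quoted in the slides (an assumption).
_WAYBACK = re.compile(r"^https?://web\.archive\.org/web/(\d{14})(?:[a-z]{2}_)?/(.+)$")


def parse_wayback_url(url: str) -> Snapshot:
    """Split a Wayback Machine snapshot URL into capture time and original URL."""
    match = _WAYBACK.match(url)
    if match is None:
        raise ValueError(f"not a Wayback Machine snapshot URL: {url!r}")
    stamp, original = match.groups()
    return Snapshot(datetime.strptime(stamp, "%Y%m%d%H%M%S"), original)


snap = parse_wayback_url("http://web.archive.org/web/20110529180722/http://ssnat.com:80/")
print(snap.captured.date(), snap.original_url)
```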

Reference Rot: An Example
Justice Samuel Alito linked to this website (http://ssnat.com) in his opinion in Brown v. Entertainment Merchants Association.

Web Citations: Law Reviews and Court Opinions
• Web citations in law reviews and court opinions have increased dramatically since 1995 for all courts: Supreme Court, Federal Appellate, District Courts, State Courts
• Internet citations used for: Factual Information, Context, Clarification

Linkrot and Reference Rot Studies: Key Findings
Raizel Liebler and June Liebert, Something Rotten in the State of Legal Citation: The Life Span of a United States Supreme Court Citation Containing an Internet Link (1996-2010), 15 YALE J.L. & TECH. 273 (2013) http://yjolt.org/sites/default/files/Something_Rotten_in_Legal_Citation.pdf
“Citations to the U.S. Supreme Court are especially important [because] of the Court’s position at the top of federal court hierarchy, determining the law of the land, and even influencing the law in international jurisdictions.”

Linkrot and Reference Rot Studies: Key Findings
“The Supreme Court appears to have a vast problem with link rot, the condition of internet links no longer working. We found that number of websites that are no longer working cited to by Supreme Court opinions is alarmingly high, almost one-third (29%).”

Linkrot and Reference Rot Studies: Key Findings
Jonathan Zittrain, Kendra Albert, and Lawrence Lessig, “Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations,” 127 Harv. L. Rev. F. 176 (2014) https://harvardlawreview.org/2014/03/perma-scoping-and-addressing-the-problem-of-link-and-reference-rot-in-legal-citations/
“Building on previous studies of link rot, we have reviewed links published within three legal journals – the Harvard Law Review (HLR), the Harvard Journal of Law and Technology (JOLT) and the Harvard Human Rights Journal (HRJ) – as well as the links contained across all published United States Supreme Court opinions.”

Linkrot and Reference Rot Studies: Key Findings
Jonathan Zittrain, Kendra Albert, and Lawrence Lessig, “Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations,” 127 Harv. L. Rev. F. 176 (2014) https://harvardlawreview.org/2014/03/perma-scoping-and-addressing-the-problem-of-link-and-reference-rot-in-legal-citations/
“We documented a serious problem of reference rot: more than 70% of the URLs within the above mentioned journals, and 50% of the URLs within U.S. Supreme Court opinions suffer reference rot – meaning, again, that they do not produce the information originally cited.”

Linkrot and Reference Rot Studies: Key Findings
Hiberlink Project: Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, Zhou K, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLoS ONE 9(12): e115253. https://doi.org/10.1371/journal.pone.0115253
The Hiberlink project assembled a large corpus of more than 3.5 million scholarly articles from three different sources: arXiv, Elsevier, and PubMed Central. A second corpus was built with 6,400 e-theses downloaded from the repositories of five universities.

Linkrot and Reference Rot Studies: Key Findings
Hiberlink Project: Klein and Van de Sompel, “Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot.”
“The vast majority of STM articles that contain references to web at large resources do suffer from reference rot. The infection rate between 2005 and 2012 oscillates between 70% and 80%.”

Link Rot and Reference Rot: Key Findings
Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content. 25 Jan 2017: PLOS ONE 12(1): e0171057. https://doi.org/10.1371/journal.pone.0171057
“Content Drift is a significant problem for Science, Technology and Math articles and scholarship”
“Estimate that for 75% of references the content has drifted away from what it was when referenced.”

Factors that affect the stability of online information
• Citations in recent publications are more likely to have active links than web citations in older journal articles
• Number of failing online citations increases over time
• Links that go to subdomains within a website are more likely to fail than links to the home/landing page
• Avoid e-citations to very long URLs
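The last two factors above lend themselves to a quick pre-publication check. The following is only an illustrative sketch: the function name `stability_flags` and the 100-character threshold are assumptions made for the example, not values taken from the studies cited in this presentation, and a "deep link" is approximated here as any URL with a path segment or query string.

```python
from urllib.parse import urlsplit


def stability_flags(url: str, max_length: int = 100) -> list[str]:
    """Flag URL traits the slide associates with a higher chance of failure."""
    flags: list[str] = []
    parts = urlsplit(url)
    depth = len([segment for segment in parts.path.split("/") if segment])
    if depth > 0 or parts.query:
        flags.append("deep link")  # points below the home/landing page
    if len(url) > max_length:
        flags.append("long URL")   # threshold is an arbitrary assumption
    return flags


print(stability_flags("http://www.ca9.uscourts.gov/library/webcites/"))
```

A flagged citation is a natural candidate for archiving before publication.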

Reference Rot: Solutions
“Any solution to link and reference rot will have to address the impermanence of the Web, the havoc caused by organizational change (including webpage reorganization), handovers of domain names (and domain name sale), and successful citation practices.” Zittrain, Albert, Lessig (2014).

Link Rot and Reference Rot: Court Solutions
Judicial Conference of the United States, Internet Materials in Opinions: Citations and Hyperlinking (July 2009)
Recommendations
• “Clerks should download any cited Internet resources and include them with the opinions, and
• The downloaded Internet resources should be included as attachments on a non-fee basis in each court’s Case Management/Electronic Case Files System, such as PACER.”

Link Rot and Reference Rot: Court Solutions
Judicial Conference of the United States, Internet Materials in Opinions: Citations and Hyperlinking (July 2009)
Link to the Policy: Internet Materials in Opinions: Citations and Hyperlinking, UNITED STATES COURTS: THE THIRD BRANCH (July 2009), http://www.uscourts.gov/News/TheThirdBranch/09-07-01/Internet_Materials_in_Opinions_Citations_and_Hyperlinking.aspx

Link Rot and Reference Rot: Court Solutions

Link Rot and Reference Rot: Court Solutions
https://web.archive.org/web/20140416011248/http://www.uscourts.gov:80/News/TheThirdBranch/09-07-01/Internet_Materials_in_Opinions_Citations_and_Hyperlinking.aspx

Link Rot and Reference Rot: Court Solutions
United States Court of Appeals for the Ninth Circuit (http://www.ca9.uscourts.gov/library/webcites/)
The court created its own digital archive of websites cited within Ninth Circuit opinions dating to 2008. The “library saves a copy of the cited material as a PDF file and adds a watermark to denote the document’s archived status.”
For PDFs of websites cited in Ninth Circuit opinions from January 1, 2016 going forward, see the case docket on PACER (https://jenie.ao.dcn/ca9-ecf/)

Link Rot and Reference Rot: Court Solutions
United States Court of Appeals for the Ninth Circuit (http://www.ca9.uscourts.gov/library/webcites/)
See the case docket on PACER (https://jenie.ao.dcn/ca9-ecf/)

Link Rot and Reference Rot: Court Solutions
US Supreme Court, Clerk of the Court https://www.supremecourt.gov/opinions/Cited_URL_List.aspx
• Retains a print copy of the cited Internet materials with the Clerk of the Court’s case file
• Includes the date in the copies of each cited Internet resource within the opinions
UC Berkeley School of Law Library
• Hosting US Supreme Court Web Citations https://scotus.law.berkeley.edu/

Link Rot and Reference Rot: Court Solutions
Supreme Court of Canada launched an online archive of Internet Sources Cited in SCC Judgments (1998 – 2016). January 26, 2017
“The Office of the Registrar of the SCC, recognizing that web pages or websites that the Court cites in its judgments may subsequently vary in content or be discontinued, has located and archived the content of most online sources that had been cited by the Court between 1998 and 2016 in order to preserve access to them. These sources were captured with a content as close as possible to the original content. Links to the archived sources can be found here: Internet Sources Cited in SCC Judgments (1998 – 2016).”
“Since 2017, online sources cited in the “Authors Cited” section in SCC judgments have been captured and archived. When a judgment cites such a source, an “archived version” link is provided.”

Link Rot and Reference Rot: Court Strategies
• Digitally archive all materials cited within the opinion, regardless of format
• Make it easily retrievable
• Make it free of charge
• Create own archive of cited content by uploading the material to its website
• Partner with organizations who can provide technical assistance: Internet Archive, Perma.cc

Link Rot and Reference Rot: Archival Strategies
Internet Archive
• Internet Archive (Free) https://archive.org/
• Use the Save Page Now feature to save content
• Chrome Extension
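Save Page Now can also be reached by address rather than through the web form. As a minimal sketch, assuming the `https://web.archive.org/save/<url>` address form (an assumption not stated in the slides), the hypothetical helper below only builds that address. Opening it in a browser is what would request a capture; the code itself makes no network call.

```python
SAVE_ENDPOINT = "https://web.archive.org/save/"  # assumed address form


def save_page_now_url(target: str) -> str:
    """Build the Save Page Now address for an absolute http(s) URL."""
    if not target.startswith(("http://", "https://")):
        raise ValueError("expected an absolute http(s) URL")
    return SAVE_ENDPOINT + target


print(save_page_now_url("http://www.epa.gov"))
```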

Link Rot and Reference Rot: Archival Strategies
Fee-Based Tools
• Archive-It (Fee-Based) (https://archive-it.org)
• Subscription Web Archiving Service
• Deployed in 2006
• 400+ Partner Organizations in 16 countries and 48 states
• College and University Libraries
• State Archives, Libraries, and Historical Societies
• Federal Institutions and NGOs
• Museums and Art Libraries
• Public Libraries, Cities and Counties

Link Rot and Reference Rot: Archival Strategies
Perma.cc
Harvard Library Innovation Lab
Chrome Extension

Link Rot and Reference Rot: Archival Strategies
Perma.cc archives the referenced content and generates a link to an archived record of the page.
Perma.cc promises to “create citation links that will never break”
Designed to prevent reference rot: link rot and content drift
Built-in redundancies, no single point of failure

Link Rot and Reference Rot: Archival Strategies
Perma.cc (https://perma.cc/)
Free to public and academic users
• Initially limited to law libraries
• Goal: extend it to the nation’s academic libraries (free)
• Develop a commercial model
• Grant from the Institute of Museum and Library Services to allow the expansion of the Perma.cc web archiving service
Role of Libraries: Libraries serve as Perma.cc “Registrars”
• Consortium of Libraries https://perma.cc/about/#perma-partners
Easy Integration in Web Browsers
• Perma.cc browser extensions in both Chrome and Firefox https://perma.cc/settings/tools

Link Rot and Reference Rot: Archival Strategies
Law Library of Congress
On October 1, 2015, the Law Library of Congress officially implemented the use of Perma.cc, so going forward Law Library of Congress reports will contain links to archived versions of referenced web pages.
Regulation of Drones http://www.loc.gov/law/help/regulation-of-drones.pdf

Link Rot and Reference Rot: Archival Strategies
Hiberlink: Memento Protocol and the Time Travel Web Portal
Los Alamos National Laboratory Research Library and the University of Edinburgh (EDINA and the Language Technology Group of the School of Informatics)
Navigating backward in time to find the pages closest in time relevant to your information inquiry
Searches the following archives: archive.today, Archive-It, Arquivo.pt: the Portuguese Web Archive, Bibliotheca Alexandrina Web Archive, DBpedia archive, DBpedia Triple Pattern Fragments archive, Canadian Government Web Archive, Croatian Web Archive, Estonian Web Archive, Icelandic web archive, Internet Archive, Library of Congress Web Archive, NARA Web Archive, National Library of Ireland Web Archive, perma.cc, PRONI Web Archive, Slovenian Web Archive, Stanford Web Archive, UK Government Web Archive, UK Parliament's Web Archive, UK Web Archive, Web Archive Singapore, WebCite, Bayerische Staatsbibliothek

Link Rot and Reference Rot: Archival Strategies
Memento Time Travel Search Engine http://timetravel.mementoweb.org/
EPA, January 19, 2017: http://timetravel.mementoweb.org/list/20170119223826/http://www.epa.gov
Chrome Extension (make it easy and seamless)
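The EPA address above shows the shape of a Time Travel lookup: the portal's `/list/` path, a 14-digit datetime, then the original URL. As a minimal sketch, assuming that shape holds in general, the hypothetical helper below builds such an address for any page and moment. It constructs a string only and does not query the service.

```python
from datetime import datetime

TIMETRAVEL_LIST = "http://timetravel.mementoweb.org/list/"  # prefix taken from the slide


def timetravel_list_url(target: str, when: datetime) -> str:
    """Build a Time Travel list address for target as it stood around `when`."""
    return f"{TIMETRAVEL_LIST}{when.strftime('%Y%m%d%H%M%S')}/{target}"


# Reproduces the EPA example from the slide.
print(timetravel_list_url("http://www.epa.gov", datetime(2017, 1, 19, 22, 38, 26)))
```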

Link Rot and Reference Rot: Archival Strategies
The Hiberlink Project recommends the following solutions:
• Proactively create snapshots of web content by using a web archiving tool: Perma.cc, Internet Archive, or Archive-It
• Use archived links along with the original link when citing links in all scholarship; this will ensure availability regardless of what happens to the original link to the content
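The second recommendation above amounts to a small formatting rule: every web citation carries the original link, the archived link, and the capture date. The sketch below is one possible layout, with the archived link in square brackets after the original; the function name, the layout, and the placeholder title are assumptions for illustration, not a prescribed citation style.

```python
from datetime import date


def cite_with_archive(title: str, original_url: str, archived_url: str, archived_on: date) -> str:
    """Format a citation that pairs the original link with its archived snapshot."""
    return f"{title}, {original_url} [{archived_url}] (archived {archived_on.isoformat()})"


print(cite_with_archive(
    "Cited web page (placeholder title)",
    "http://ssnat.com",
    "http://web.archive.org/web/20110529180722/http://ssnat.com:80/",
    date(2011, 5, 29),
))
```

Keeping the original URL in the citation preserves the record of what was cited, while the snapshot and its date preserve what the author actually saw.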

Link Rot and Reference Rot: Roles for Librarians
Important and vital role for librarians: to be champions and advocates for the preservation of web content for future generations
• Real opportunity for librarians and libraries
• Educate colleagues, faculty, students, and patrons on the problems of link rot, content drift, and reference rot
• Educate management, colleagues, faculty, students, and patrons on the current solutions to these problems
• Outreach is critical to solving these problems
• Provide training on how to use the tools
• Work with management to ensure that these tools are in place

Contact
Laura Gordon-Murnane, Bloomberg BNA
Phone: 703.341.3309
Email: lgordonm@bna.com