THE FRONT MATTERS: Capturing Journal Front Matter Content with JATS

Front Matter vs. Journal Matter (disambiguation)
For the purposes of this presentation: “front matter” = “journal matter”.
In the current publishing environment, where more and more journals are published online, there are many examples of journals without a traditional “front”.

Obvious

This… not as much

Team Introduction
• Rachael Carter: a journal manager at PMC at the National Center for Biotechnology Information at the US National Library of Medicine. Rachael graduated in 2010 from the University of Maryland with a Master of Library Science.
• Kathryn Funk: a technical editor for NIHMS and PubMed Health at the National Center for Biotechnology Information at the US National Library of Medicine. Kathryn graduated from The Catholic University of America with a Master of Library and Information Science.
• Rebecca Mooney: formerly a journal manager at PMC at the National Center for Biotechnology Information at the US National Library of Medicine; recently moved to a new position as a Project Analyst in the IT Department of the American Association for the Advancement of Science (AAAS). Rebecca graduated in 2008 from the University of Maryland with a Master of Library Science.

The Big Picture
“Decisions must be made about what will actually be saved for future use… Will the content consist only of articles in a journal, or will it also include front matter (such as the names of the members of the journal’s editorial board)?” (Marcum, 2001)

NLM Initiative
PMC as an archive has a responsibility to answer:
• What should we preserve?
• How should we preserve it?
• Why preserve it?

PMC Submission Method A

PMC structure
• Currently, PMC strives to archive data at the article level, but sees the potential benefit in finding a way to preserve information about the journal in which the articles were published. For example: who was Editor in Chief at the time of publication? What was the journal’s philosophy at that time?
• TOCs: PMC creates its own table of contents, organized by article type. It is still very article-based, not at the issue level.

Front Matter “capturing” in PMC as it currently exists – through banner journal-links only

Scope of Front Matter within project
What PMC Front Matter IS:
• Editorial board
• Journal philosophy
• Submission guidelines
• Subscription information
• Covers
• Journal contact information
• Publisher information
What PMC Front Matter is NOT:
• Tables of contents
• Advertisements
• Forewords
• Prefaces

Frontmatter DTD development timeline
• 2001: NLM DTD developed
• issueadmin.dtd was made available
• 2011: Atypon Issue XML presented at JATS-Con
• 2012: pmcjournalmatter.dtd developed

Value of capturing front matter as XML
Limitations of PDF
- Assumes there is an issue to scan
- Difficult to update content
- Limited to certain platforms and technologies
XML to the rescue
- The content is queryable and reusable
- Updating just requires editing a file
- Allows for data manipulation over various platforms/formats
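
To make the "queryable and reusable" point concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The element names <person-list>, <person>, <name> and <role> are taken from this presentation; the <surname> and <given-names> children are assumed from JATS conventions, and the sample people and the helper list_people are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical editorial-board fragment; the people are invented for illustration.
SAMPLE = """
<person-list>
  <title>Editorial Board</title>
  <person>
    <name><surname>Doe</surname><given-names>Jane</given-names></name>
    <role>Editor in Chief</role>
  </person>
  <person>
    <name><surname>Roe</surname><given-names>Richard</given-names></name>
    <role>Associate Editor</role>
  </person>
</person-list>
"""


def list_people(xml_text):
    """Return (full name, role) pairs for every <person> in the fragment."""
    root = ET.fromstring(xml_text)
    people = []
    for person in root.iter("person"):
        given = person.findtext("name/given-names", default="")
        surname = person.findtext("name/surname", default="")
        role = person.findtext("role", default="")
        people.append((f"{given} {surname}".strip(), role))
    return people


if __name__ == "__main__":
    for full_name, role in list_people(SAMPLE):
        print(f"{role}: {full_name}")
```

The same question asked of a scanned PDF page would require manual reading or OCR, which is the contrast this slide draws.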

Why we chose to create an extension to JATS
o Mostly because we already use JATS
o It’s flexible
o Already had a meaningful framework to capture journal article content
o Works well within the structure of PMC
  • consistency

Limitations of JATS
Why JATS isn’t enough to capture front matter:
• No meaningful way to capture front matter elements such as editorial boards
• No way to tag journal metadata at a level higher than article-meta

Goals
• To capture front matter in the environment in which it was published
• To work as much as possible with the existing JATS framework
• To create a DTD that would allow for flexibility in both use and rendering

1. Looking at samples: defined content types; created new elements
2. Testing: tagged samples of front matter using our DTD and made adjustments; completed first iteration of the pmcjournalmatter.dtd
3. User testing (PMC journal managers): adjustments made to final DTD based on user feedback

Highlighted physical example of a journal’s front matter

Initial Classification
(On the original slide, anything in RED is required.)
<journal-meta> contains, in order:
• <journal-id>*
• <journal-title-group>
• <issn>*
• <isbn>*
• <publisher>?
<issue-meta> contains, in order:
• <pub-date>*
• <volume>?
• <issue-title>*
• <issue-sponsor>*
• (<first-page>, <last-page>?, <page-range>?) OR <elocation-id>?
<document-meta> contains, in order:
• <pub-date>*
• <document-title>
• <self-uri>*
<body> contains, in order:
• <person-list>, which requires one or more <person>
• <person> contains, in order:
  • <name> OR <string-name> OR <collab>
  • <degrees>*
  • <address>*
  • <aff>*
  • <role>*
  • <ext-link>*
  • <xref>*

Created new elements
• <person-list>
• <issue-meta>
• <document-meta>

Tagged samples of front matter using our DTD and made adjustments

User testing: PMC journal managers

DTD technical details
• .mod and .ent module files
• pmcjournalmatter.dtd
• pmcjournalmattercustom.ent (customizations)

Root element: journalmatter
<journalmatter journalmatter-type="issue" content-type="edboard">
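
As a rough illustration of how a consumer might read these two root attributes, the following Python sketch parses a document and returns them. It assumes the root element is named journalmatter and carries @journalmatter-type and @content-type as named on these slides; the function describe_root and the sample string are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, empty document carrying only the two root attributes.
DOC = '<journalmatter journalmatter-type="issue" content-type="edboard"/>'


def describe_root(xml_text):
    """Return (journalmatter-type, content-type) from the root element."""
    root = ET.fromstring(xml_text)
    if root.tag != "journalmatter":
        raise ValueError(f"unexpected root element: {root.tag}")
    return root.get("journalmatter-type"), root.get("content-type")


if __name__ == "__main__":
    print(describe_root(DOC))
```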

Challenges
• How can we generate a foundation for organizing and labeling the front matter content?
• Can we tag all of this content in one document?

Root element attribute: @journalmatter-type

Issue vs. Standing: The Benefits
• Prevents a hybrid of issue and non-issue content in the same document
• Changes in content can be more easily updated
• Allows a single journal to have both issue and standing documents

Example: Standing & Issue
• issue: Cover
• standing: Information for Authors

Root element: @content-type
• Separate documents
• Flexibility in tagging and rendering
• Update as need be
• Example: journal philosophy vs. editorial board

Individual documents for each @content-type:
• cover
• edboard
• general-info
• info-for-authors
• publisher
• other

@content-type values
• Cover ("cover"): can include cover image, caption, and cover image copyright information.
• Editorial Board ("edboard"): can include executive editors, associate editors, etc., as well as general editorial board members.
• General Journal Information ("general-info"): can include but is not limited to journal mission statement, scope, journal contact information, subscription information, copyright, and other journal-specific content.
• Publisher Information ("publisher"): can include publisher philosophy, other journals published, contact information, etc.
• Information for Authors ("info-for-authors"): can include article submission and formatting instructions.
• Other ("other"): if the document is not one of the listed types or the type of document cannot be determined, the "other" attribute value may be used.
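
One way a tagging tool might apply this list is sketched below in Python. The six values and their labels come from the slide; the fallback to "other" mirrors the slide's description of that value, while the function name normalize_content_type and its case and whitespace handling are assumptions for illustration only.

```python
# The six @content-type values listed on the slide, with their display labels.
CONTENT_TYPES = {
    "cover": "Cover",
    "edboard": "Editorial Board",
    "general-info": "General Journal Information",
    "publisher": "Publisher Information",
    "info-for-authors": "Information for Authors",
    "other": "Other",
}


def normalize_content_type(value):
    """Return a listed @content-type value, falling back to 'other'.

    Trimming and lower-casing are assumptions of this sketch, not DTD rules.
    """
    cleaned = (value or "").strip().lower()
    return cleaned if cleaned in CONTENT_TYPES else "other"


if __name__ == "__main__":
    for raw in ("edboard", " Cover ", "masthead", None):
        print(repr(raw), "->", normalize_content_type(raw))
```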

The 4 main elements of a document
<journalmatter> contains:
• <journal-meta>
• <issue-meta>
• <document-meta>
• <body>
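
A simple completeness check over these four elements could look like the Python sketch below. The slides do not say which of the four are mandatory in every document, so the function only reports which ones are absent; the name missing_sections and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# The four main children of <journalmatter> named on the slide.
EXPECTED_CHILDREN = ["journal-meta", "issue-meta", "document-meta", "body"]

# Hypothetical skeleton document with empty sections.
SKELETON = """
<journalmatter journalmatter-type="issue" content-type="cover">
  <journal-meta/>
  <issue-meta/>
  <document-meta/>
  <body/>
</journalmatter>
"""


def missing_sections(xml_text):
    """List the main elements that do not appear as direct children of the root."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return [tag for tag in EXPECTED_CHILDREN if tag not in present]


if __name__ == "__main__":
    print(missing_sections(SKELETON))
```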

<journal-meta>
<!ENTITY % journal-meta-model "(journal-id*, journal-title-group*, issn*, isbn*, publisher*)">
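
To show what such a content model means in practice, the Python sketch below turns a flat, comma-separated model into a regular expression over child element names and tests a sequence of children against it. This is an illustration only, not a DTD validator: it handles just the simple sequence form used by journal-meta-model (no nested groups or "|" choices), and the function names are hypothetical.

```python
import re

# Content model copied from the slide.
JOURNAL_META_MODEL = "(journal-id*, journal-title-group*, issn*, isbn*, publisher*)"


def model_to_regex(model):
    """Compile a flat sequence model into a regex over space-terminated names."""
    inner = model.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    parts = []
    for token in inner.split(","):
        token = token.strip()
        if not token:
            continue
        name, occurrence = token, ""
        if token[-1] in "*?+":
            name, occurrence = token[:-1].strip(), token[-1]
        # Each name is followed by one space so "issn" cannot match part of another name.
        parts.append(f"(?:{re.escape(name)} ){occurrence}")
    return re.compile("".join(parts))


def children_match(model, child_names):
    """Return True if the ordered child names satisfy the content model."""
    sequence = "".join(f"{name} " for name in child_names)
    return model_to_regex(model).fullmatch(sequence) is not None


if __name__ == "__main__":
    print(children_match(JOURNAL_META_MODEL, ["journal-id", "issn", "issn", "publisher"]))
    print(children_match(JOURNAL_META_MODEL, ["issn", "journal-id"]))
```

The second call fails because the model fixes the order: every <journal-id> must come before any <issn>.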

<issue-meta>
<!ENTITY % issue-meta-model "(pub-date*, volume?, issue-id*, issue-title*, issue-sponsor*)">

<document-meta>
<!ENTITY % document-meta-model "((document-title, document-subtitle?)?, contrib-group?, pub-date*, (((fpage, lpage?, page-range?) | elocation-id)?), self-uri*, permissions?)">

<body>
Borrowed directly from JATS (with a few additions)

Addition: <person-list>
<!ELEMENT person-list (title?, person+)>
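
The declaration says a <person-list> has an optional <title> followed by one or more <person> elements. A minimal Python sketch that builds such a list and enforces the "one or more" rule is shown below. The use of <string-name> and <role> inside <person> follows the element names given elsewhere in this presentation; the helper build_person_list and the sample people are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_person_list(people, title=None):
    """Build a <person-list> from (display name, role or None) pairs.

    Raises ValueError for an empty list, mirroring the person+ requirement.
    """
    if not people:
        raise ValueError("person-list requires at least one person")
    person_list = ET.Element("person-list")
    if title:
        # The optional title comes first, as in (title?, person+).
        ET.SubElement(person_list, "title").text = title
    for display_name, role in people:
        person = ET.SubElement(person_list, "person")
        ET.SubElement(person, "string-name").text = display_name
        if role:
            ET.SubElement(person, "role").text = role
    return person_list


if __name__ == "__main__":
    board = build_person_list(
        [("Jane Doe", "Editor in Chief"), ("Richard Roe", None)],
        title="Editorial Board",
    )
    print(ET.tostring(board, encoding="unicode"))
```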

Person-list vs. Person-group

@person-list-type
• advisory-board: A board appointed to advise the editorial board
• editor: Content editors
• editorial-board: A group of editors on a publication
• guest-editor: Content editors that have been invited to edit all or part of a work
• reviewer: Content reviewer
• transed: Editors of a translated version of a work

@sec-type
• Not required; a suggested list only
• Not a controlled attribute
• Suggested values apply only when content-type="general-info"
• Intent was to give meaning for searching and grouping purposes
• Used similarly to the JATS @sec-type attribute

List of @sec-types
@sec-type is not a required or controlled attribute. However, when "general-info" is the @content-type of the document, the following is a suggested list of types:
• association*
• copyright
• journal-contact
• journal-philosophy
• subscription-info
*This refers to associations that may be affiliated with a journal but do not necessarily publish the journal.
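
Since the stated intent of @sec-type is searching and grouping, the Python sketch below groups section titles by their @sec-type value. It assumes sections are tagged with the JATS <sec> and <title> elements inside <body>; the sample content, the "untyped" bucket for sections without the attribute, and the function name group_sections are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical general-info body using three of the suggested @sec-type values.
SAMPLE = """
<body>
  <sec sec-type="journal-philosophy"><title>Aims and Scope</title></sec>
  <sec sec-type="subscription-info"><title>Subscriptions</title></sec>
  <sec sec-type="journal-contact"><title>Contact</title></sec>
  <sec><title>Miscellaneous</title></sec>
</body>
"""


def group_sections(xml_text):
    """Map each @sec-type value to the titles of the sections that carry it."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for sec in root.iter("sec"):
        # Sections without the (optional) attribute go into an "untyped" bucket.
        key = sec.get("sec-type", "untyped")
        groups[key].append(sec.findtext("title", default=""))
    return dict(groups)


if __name__ == "__main__":
    for sec_type, titles in group_sections(SAMPLE).items():
        print(sec_type, titles)
```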

DTD Documentation
http://dtd.nlm.nih.gov/ncbi/pmc/journalmatter/

So how’s it all going to look?

Limitations
• Still relatively untested
• No rendering
• No actual use
• Lack of an existing model
• Based on the perceived needs of PMC as an archive; there may be unanticipated uses beyond these
• Different naming conventions and structures of published journal front matter

Looking Forward
• Trying to start a conversation
• Looking for ways to best capture front matter to suit needs both inside PMC and in the broader JATS community
• Determining whether the content types will be applicable for future applications
• Initiating the usage of the DTD and seeing what happens

Acknowledgements
• Breena Krick
• Jeff Beck
• Audrey Hamelers
• Christopher Maloney
• PMC Journal Managers

References
• Andrew N. The Oxford Journals Online Archives: The Purpose and Practicalities of a Major Digitization Program. Serials Review. (2006, June). 32(12), 78-80.
• Holdsworth David. Preservation Strategies for Digital Libraries. Glasgow, UK: HATII, University of Glasgow; DCC Digital Curation Manual. (2007, November). Retrieved from: http://www.dcc.ac.uk/resource/curation-manual/chapters/preservationstrategies-digital-libraries
• Marcum D. Scholars as Partners in Digital Preservation. CLIR Issues. (2001, March/April). 20. Retrieved from: http://www.clir.org/pubs/issues20.html
• Markantonatos N. Article vs Issue XML: Capturing the Table of Contents under the NLM DTD. Bethesda, MD: National Center for Biotechnology Information; Journal Article Tag Suite Conference (JATS-Con) Proceedings 2011. (2011). Retrieved from: http://www.ncbi.nlm.nih.gov/books/NBK57236/
• Wheeler B. Journal Identity in the Digital Age. Journal of Scholarly Publishing. (2010). 42(1), 45-88.
• NLM Journal Archiving and Interchange Tag Suite. Retrieved from: http://dtd.nlm.nih.gov/
• PMC Journal Matter DTD Documentation. Retrieved from: http://dtd.nlm.nih.gov/ncbi/pmc/journalmatter/
• BMC Cancer. Retrieved from: http://www.biomedcentral.com/bmccancer/
• Frontiers in Cancer Genetics. Retrieved from: http://www.frontiersin.org/cancer_genetics

Contact us
pmc@ncbi.nlm.nih.gov

Questions?

• 1 XML document: content-type = “standing” OR “issue”
• 2 documents: 1 content-type = “standing”, 1 content-type = “issue”
• Multiple documents, dependent on the information being captured:
  - Cover: “cover”
  - Editorial Board: “edboard”
  - General Journal Information: “general-info”
  - Publisher Information: “publisher”
  - Information for Authors: “info-for-authors”