Metadata Matters: Exploring the hidden world of metadata

Valuable Content Transformed
• Document Digitization
• XML and HTML Conversion
• eBook Production
• Hosted Solutions
• Big Data Automation
• Conversion Management
• Editorial Services
• Harmonizer
www.dclab.com Confidential & Proprietary

Experience the DCL Difference
DCL blends years of conversion experience with cutting-edge technology and the infrastructure to make the process easy and efficient.
• World-Class Services
• Leading-Edge Technology
• Unparalleled Infrastructure
• US-Based Management
• Complex-Content Expertise
• 24/7 Online Project Tracking
• Automated Quality Control
• Global Capabilities

We Serve a Very Broad Client Base...

...Spanning All Industries
• Aerospace
• Associations
• Defense
• Distribution
• Education
• Financial
• Government
• Libraries
• Life Sciences
• Manufacturing
• Medical
• Museums
• Periodicals
• Professional
• Publishing
• Reference
• Research
• Societies
• Software
• STM
• Technology
• Telecommunications
• Universities
• Utilities

About your presenter
• Rob Hanna, ECMs
• President of Precision Content Authoring Solutions Inc. and a director of AIIM First Canadian Chapter
• Expert in structured authoring and content management practices and technology
• Instructor at the University of Toronto School of Continuing Studies – Metadata and Controlled Vocabularies

Who is Precision Content Authoring Solutions Inc.?
• We help organizations make their information easier to use
• Our solutions consist of
  • Content strategy
  • Detailed information architecture
  • Content lifecycle design and development
  • Turn-key content transformation
  • Tools selection and development
  • Multi-channel publishing
• www.precisioncontent.com
© 2015 Precision Content Authoring Solutions Inc.

WE ARE YOUR CONTENT TRANSFORMATION SPECIALISTS

Importance of your Content Strategy
Before any technology is considered, organizations must first consider their content strategy to fully understand what they need and how they are going to get there. Several factors must be examined.

What is Metadata? And how does it relate to content?

What is Content? Data Information Knowledge Content

Metadata Defined
• Coined in the 1960s by Jack Myers
• Data about Data
• Stuff about Stuff
• Essential properties stored within the content or external to the content that identify and define context, history, and management of the content

Metadata is information about a resource

Application of Metadata
• Metadata is
  • applied to all structured and unstructured content in a corpus
  • visible to the user or it can be hidden from view
  • both machine-driven and manually entered
  • internal or external to the content
  • mandatory, optional, or conditional

Many forms of Metadata
• Corporate metadata is structured data about content
• Metadata is relational or hierarchical
• Metadata may take the form of
  • Rich-text or binary
  • Plain-text
  • Controlled values/pick-lists/lookup values
  • Syntax encoded values
    • date/time (e.g., yyyy-mm-dd hh:mm:ss)
    • financial ($0.00, -$0.00)
    • numeric - integer/floating values (#,###)
    • boolean (true/false)
    • special (phone numbers, postal codes, or social insurance numbers)
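
Syntax encoded values lend themselves to automatic checking. The following is a minimal Python sketch of what such checks might look like; the function names, the regular expressions, and the choice of a Canadian postal code as the "special" value are assumptions made for illustration, not rules taken from the slides.

```python
import re
from datetime import datetime

# Hypothetical patterns for illustration; a real system defines its own encodings.
FINANCIAL = re.compile(r"-?\$\d+(,\d{3})*\.\d{2}")
POSTAL_CODE_CA = re.compile(r"[A-Z]\d[A-Z] ?\d[A-Z]\d")


def is_datetime(value: str) -> bool:
    """True if value parses as a yyyy-mm-dd hh:mm:ss date/time."""
    try:
        datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        return True
    except ValueError:
        return False


def is_financial(value: str) -> bool:
    """True if value looks like $0.00 or -$0.00 (two decimal places)."""
    return FINANCIAL.fullmatch(value) is not None


def is_boolean(value: str) -> bool:
    """True if value is one of the two allowed literals."""
    return value in ("true", "false")


def is_postal_code(value: str) -> bool:
    """True if value matches the assumed Canadian postal code shape."""
    return POSTAL_CODE_CA.fullmatch(value) is not None
```

A metadata field declared as "financial" could then reject a value such as `$12.5` at entry time instead of letting it reach the repository.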

Many roles of Metadata
• The primary role of metadata is to facilitate the identification, retrieval, and processing of content in any media.
• Secondarily, metadata may also
  • appear as content to the content consumer, and
  • serve as corporate structured data for analysis and business intelligence.

Metadata is the soup can
Content is the soup

Metadata isn’t the message
• Twitter post (118 chars)
An early look at Annotations: http://groups.google.com/group/twitter-api-announce/browse_thread/fa5da2608865453
• Twitter status message metadata (1,938 chars)
{"id"=>12296272736,
 "text"=>"An early look at Annotations: http://groups.google.com/group/twitter-api-announce/browse_thread/fa5da2608865453",
 "created_at"=>"Fri Apr 16 17:55:46 +0000 2010",
 "in_reply_to_user_id"=>nil,
 "in_reply_to_screen_name"=>nil,
 "in_reply_to_status_id"=>nil,
 "favorited"=>false,
 "truncated"=>false,
 "user"=>
  {"id"=>6253282,
   "screen_name"=>"twitterapi",
   "name"=>"Twitter API",
   "description"=>"The Real Twitter API. I tweet about API changes, service issues and happily answer questions about Twitter and our API. Don't get an answer? It's on my website.",
   "url"=>"http://apiwiki.twitter.com",
   "location"=>"San Francisco, CA",
   "profile_background_color"=>"c1dfee",
   "profile_background_image_url"=>"http://a3.twimg.com/profile_background_images/59931895/twitterapi-background-new.png",
   "profile_background_tile"=>false,
   "profile_image_url"=>"http://a3.twimg.com/profile_images/689684365/api_normal.png",
   "profile_link_color"=>"0000ff",
   "profile_sidebar_border_color"=>"87bc44",
   "profile_sidebar_fill_color"=>"e0ff92",
   "profile_text_color"=>"000000",
   "created_at"=>"Wed May 23 06:01:13 +0000 2007",
   "contributors_enabled"=>true,
   "favourites_count"=>1,
   "statuses_count"=>1628,
   "friends_count"=>13,
   "time_zone"=>"Pacific Time (US & Canada)",
   "utc_offset"=>-28800,
   "lang"=>"en",
   "protected"=>false,
   "followers_count"=>100581,
   "geo_enabled"=>true,
   "notifications"=>false,
   "following"=>true,
   "verified"=>true},
 "contributors"=>[3191321],
 "geo"=>nil,
 "coordinates"=>nil,
 "place"=>
  {"id"=>"2b6ff8c22edd9576",
   "url"=>"http://api.twitter.com/1/geo/id/2b6ff8c22ed9576.json",
   "name"=>"SoMa",
   "full_name"=>"SoMa, San Francisco",
   "place_type"=>"neighborhood",
   "country_code"=>"US",
   "country"=>"The United States of America",
   "bounding_box"=>
    {"coordinates"=>[[[-122.42284884, 37.76893497], [-122.3964, 37.78752897], [-122.42284884, 37.78752897]]],
     "type"=>"Polygon"}},
 "source"=>"web"}
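
To make the split between message and metadata concrete, here is a small Python sketch. It uses a heavily trimmed, hypothetical version of the status record above (only a handful of keys, shortened text), and the helper name `split_content` is an assumption for illustration.

```python
import json

# Trimmed, hypothetical status record; the real one carries many more keys.
status = {
    "id": 12296272736,
    "text": "An early look at Annotations",
    "created_at": "Fri Apr 16 17:55:46 +0000 2010",
    "favorited": False,
    "user": {"screen_name": "twitterapi", "location": "San Francisco, CA"},
    "source": "web",
}


def split_content(record, content_key="text"):
    """Return (content, metadata): the message and everything that describes it."""
    content = record[content_key]
    metadata = {key: value for key, value in record.items() if key != content_key}
    return content, metadata


content, metadata = split_content(status)

# How many characters of metadata travel with each character of content.
ratio = len(json.dumps(metadata)) / len(content)
```

Even in this cut-down record the serialized metadata is several times longer than the message it describes.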

Why Metadata matters
Collection and use of metadata has been known to be controversial when viewed out of context of the content it carries.
Electronic Frontier Foundation, 30 December 2013
• They know you rang a phone sex service at 2:24 am and spoke for 18 minutes. But they don’t know what you talked about.
• They know you called the suicide prevention hotline from the Golden Gate Bridge. But the topic of the call remains a secret.
• They know you spoke with an HIV testing service, then your doctor, then your health insurance company in the same hour. But they don’t know what was discussed.

Types of Metadata
Library of Congress states that metadata consists of
• Descriptive Metadata
• Administrative Metadata, and
• Structural Metadata

Descriptive Metadata And how it is applied through classification

Thinking about Classification
• Classification is the ordering of entities (things or concepts) into groups or classes on the basis of their similarity
• an activity that we do every day
• metadata and controlled vocabularies are tools that can be used for classification

Thinking about Classification
How many words can you memorize in 20 seconds?
analyst brake market stapler seat investor calculators scissors engine pedal dashboard pen backers marker tape profit starter ruler prospects traders alternator

Survey question #1 What level of structured authoring are you using in your enterprise today?

Thinking about Classification
1. Filter out all of the noise
brake analyst calculator seat tape scissors trader dashboard profit market engine alternator pen starter stapler pedal investor backer marker ruler prospect

Thinking about Classification
2. Break into smaller groupings
analyst brake backer calculator dashboard seat scissors pen engine investor marker profit market starter tape ruler prospect alternator stapler pedal trader

Thinking about Classification
3. Organize words by similarities
analyst market backer tape scissors prospect profit trader pen ruler investor stapler calculator marker alternator pedal dashboard brake engine seat starter

Thinking about Classification
4. Classify and label groups
• Stock market: analyst, market, backer, prospect, profit, trader, investor
• Office supplies: tape, scissors, pen, ruler, stapler, calculator, marker
• Car parts: alternator, pedal, dashboard, brake, engine, seat, starter

Thinking about Classification
How well did you do?
Stock market    Office supplies    Car parts
analyst         stapler            brake
market          calculator         seat
trader          scissors           dashboard
investor        pen                engine
backer          marker             alternator
profit          tape               starter
prospect        ruler              pedal
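
The grouping and labelling steps of the exercise can be sketched in a few lines of Python. The lookup table below is a hypothetical stand-in for the human judgement about similarity and covers only some of the words; `classify` is an illustrative name, not part of any tool mentioned in this deck.

```python
from collections import defaultdict

# Hypothetical lookup standing in for a person's judgement of similarity.
CLASS_OF = {
    "analyst": "Stock market",
    "market": "Stock market",
    "trader": "Stock market",
    "stapler": "Office supplies",
    "scissors": "Office supplies",
    "pen": "Office supplies",
    "brake": "Car parts",
    "engine": "Car parts",
    "pedal": "Car parts",
}


def classify(words):
    """Group words under their class label; unknown words go to 'Unclassified'."""
    groups = defaultdict(list)
    for word in words:
        groups[CLASS_OF.get(word, "Unclassified")].append(word)
    return dict(groups)


groups = classify(["analyst", "brake", "market", "stapler", "pen", "engine"])
```

Once the labels exist, recalling three class names and their members is far easier than recalling an unordered list, which is the point of the exercise.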

Thinking about Classification
Now how many words can you memorize in 20 seconds?
Vegetables    Computer parts    Instruments
peas          hard drive        violin
endive        sound card        harp
carrots       monitor           piano
spinach       mouse             trumpet
celery        processor         cello
broccoli      flash drive       flute
tomato        keyboard          guitar

Survey Question #2 At what stage are you in with your current content project?

Controlled vocabularies
• Some metadata requires a classification, controlled list of values or terms to define it, for example:
  • Film rating: G, PG, 14A, 18A, R, A
  • eBay seller location:
• Control is exercised over modifications to the list
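
A controlled list is straightforward to enforce in software. The sketch below uses the film ratings listed above; the function name `validate` and the error message are assumptions for illustration.

```python
# Film ratings as listed on the slide; the tuple is fixed so changes are deliberate.
FILM_RATINGS = ("G", "PG", "14A", "18A", "R", "A")


def validate(value, vocabulary=FILM_RATINGS):
    """Return value unchanged if it is in the vocabulary, otherwise raise ValueError."""
    if value not in vocabulary:
        raise ValueError(f"{value!r} is not in the controlled vocabulary {vocabulary}")
    return value
```

Keeping the list in one place means a new rating is added by changing the vocabulary, not by letting authors type whatever they like.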

Controlled vocabularies
• Controlled vocabularies defined
  • A list of terms
  • All terms in a controlled vocabulary must have an unambiguous, nonredundant definition. (Source: ANSI/NISO Z39.19-2005)

Bridging boundaries
Which term is “right”?
• Accessible parking spaces
• Disabled permit parking
• Accessible permit parking
• Handicapped parking
• Disabled parking spaces
• Designated disabled parking spaces

Towards a common vocabulary
• Accessible parking spaces
• Disabled permit parking
• Accessible permit parking
• Handicapped parking
• Disabled parking spaces
• Designated disabled parking spaces
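
One common way to move towards a shared vocabulary is to map every variant to a single preferred term. The Python sketch below assumes, purely for illustration, that "Accessible parking spaces" is the preferred term; the slides do not say which term was chosen, and the names `PREFERRED`, `VARIANTS`, and `normalize` are hypothetical.

```python
# Assumption for illustration only: the slides do not name the preferred term.
PREFERRED = "Accessible parking spaces"

VARIANTS = [
    "Accessible parking spaces",
    "Disabled permit parking",
    "Accessible permit parking",
    "Handicapped parking",
    "Disabled parking spaces",
    "Designated disabled parking spaces",
]

# Case-insensitive lookup from any known variant to the preferred term.
LOOKUP = {variant.lower(): PREFERRED for variant in VARIANTS}


def normalize(term):
    """Return the preferred term for a known variant, or None if it is unknown."""
    return LOOKUP.get(term.strip().lower())
```

Authors and searchers can keep using the term they know while the system stores and retrieves content under one label.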

Managing Controlled Vocabularies

Types of Classification Schemes
• Subject
  • Identify content topics
• Organization Structure
  • Depicts business units
• Functional
  • Defined by business processes

Subject Taxonomies
• Describes the topic of the resource
• Structured from broad to narrow / general to specific
• Often stable over time

Subject Classification
Source: http://popchartlab.com/products/the-very-many-varieties-of-beer

Organization Classification
• Shows business unit relationships
• Can be used to identify:
  • Ownership of content
  • Maintenance responsibilities
  • A person’s place in the organization
• Often changes frequently

Organizational Classification

Functional Classification
• Describes the breakdown of business processes
• Function – Activity – Task
• Stable in nature unless new processes or functions are introduced

Functional Classification
Source: http://www.iskouk.org/conf2009/papers/milne_ISKOUK2009.pdf

Taxonomies
• Types of taxonomies
  • Lists
  • Trees
  • Hierarchies and polyhierarchies
  • Matrices, and
  • System maps

Taxonomy Types • List style taxonomy

Taxonomy Types • Simple tree style taxonomy

Taxonomy Types • Classical hierarchical style taxonomy

Taxonomy Types • Polyhierarchical style taxonomy
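
In a polyhierarchy a term may sit under more than one broader term. The following Python sketch models that with a mapping from each term to its list of broader terms; the instrument terms and the function name `ancestors` are hypothetical examples, not taken from the slide's diagram.

```python
# Hypothetical terms: each term maps to its broader (parent) terms.
BROADER = {
    "piano": ["keyboard instruments", "string instruments"],
    "keyboard instruments": ["instruments"],
    "string instruments": ["instruments"],
    "instruments": [],
}


def ancestors(term, broader=BROADER):
    """Return every broader term reachable from term, following all parents."""
    found = set()
    stack = list(broader.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(broader.get(parent, []))
    return found
```

A strict tree would force "piano" under a single parent; here a search on either parent still finds it.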

Taxonomy Types • Matrix style taxonomy • With 3 facets
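
A matrix (faceted) taxonomy describes each item along several independent dimensions. The Python sketch below assumes three hypothetical facets (subject, audience, format) and a few invented records; none of these names come from the slide.

```python
# Hypothetical records, each classified along three independent facets.
RECORDS = [
    {"title": "Brake service guide", "subject": "Car parts",
     "audience": "Technician", "format": "PDF"},
    {"title": "Brake safety overview", "subject": "Car parts",
     "audience": "Driver", "format": "HTML"},
    {"title": "Stapler catalogue", "subject": "Office supplies",
     "audience": "Buyer", "format": "PDF"},
]


def filter_by_facets(records, **facets):
    """Return the records whose facet values match every requested facet."""
    return [
        record for record in records
        if all(record.get(name) == value for name, value in facets.items())
    ]


pdf_car_parts = filter_by_facets(RECORDS, subject="Car parts", format="PDF")
```

Because the facets are independent, any combination can be used to narrow a result set without a predefined path through a tree.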

Taxonomy Types • System map style taxonomy

Administrative Metadata For managing the content

Administrative metadata
• Information about the metadata record itself – its creation, modification, relationship to other records, etc.
• Audit trails may capture the date and time when a file’s title was changed.
• Common subsets of administrative metadata are:
  • Rights Management: metadata that deals with intellectual property rights
  • Preservation: information needed to archive / preserve a resource
Source: Understanding Metadata – NISO 2004
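
The audit trail example can be sketched as follows. This is a minimal Python illustration, assuming a hypothetical `Record` class with a `rename` method; the field names in each audit entry are likewise invented for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Record:
    """Hypothetical content record that logs every change to its title."""
    title: str
    audit_trail: list = field(default_factory=list)

    def rename(self, new_title, changed_by):
        # Capture who changed the title, the old and new values, and when (UTC).
        self.audit_trail.append({
            "field": "title",
            "old": self.title,
            "new": new_title,
            "changed_by": changed_by,
            "changed_at": datetime.now(timezone.utc).isoformat(),
        })
        self.title = new_title


record = Record("Draft report")
record.rename("Final report", changed_by="rob")
```

The author only supplies the new title; the system fills in the administrative metadata around the change.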

Separation of Status Metadata
• Much of the administrative metadata is applied automatically by the system
• Other administrative metadata may live with the workflow rather than the record itself

Structural Metadata Defining the structure of a resource

About Structural Metadata
• Describe the structure of a resource
• Book
• Document
• Website
• Table of contents
• Site map
• Internal structure

What is XML?
• XML (eXtensible Markup Language) is an open standard for the exchange of information
  • first published in 1996 by W3C
  • to encode electronic documents readable by
    • human, and
    • machine
  • for a multitude of applications ranging from
    • corporate financial reporting applications, to
    • Microsoft Word

XML is Everywhere
XML defines meaningful data structures for documents and data. It is a human-readable file format used to power
• manufacturing assembly lines
• medical devices
• military applications, and
• many other things.
XML is the language of the Web. It enables smart phones and web browsers.

What are markup languages?
• pre-date desktop publishing and the Internet
• tell computers how to handle data
  • such as how to render electronic content on a page
• categorized as either
  • presentation, or
  • semantic markup

Presentation markup
• With electronic presentation markup, we mark up the paragraph and italicize the citation for publication
• This is typical of web pages using hypertext markup (HTML)
<p><i>The Cancer Journal: The Journal of Principles &amp; Practice of Oncology</i> provides an integrated view of modern oncology across <i>all</i> disciplines.</p>
The Cancer Journal: The Journal of Principles & Practice of Oncology provides an integrated view of modern oncology across all disciplines.

Semantic markup
• With semantic markup, we mark up the content to describe the meaning of the text
• Publishing stylesheets interpret the meaning from the markup and apply appropriate styles specific to the publishing context
<intro><cite>The Cancer Journal: The Journal of Principles &amp; Practice of Oncology</cite> provides an integrated view of modern oncology across <em>all</em> disciplines.</intro>
The Cancer Journal: The Journal of Principles & Practice of Oncology provides an integrated view of modern oncology across all disciplines.
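
To illustrate how a stylesheet might interpret semantic markup differently per publishing context, here is a Python sketch using the standard library's `xml.etree.ElementTree`. The `STYLES` table, the two channel names, and the `render` function are assumptions for illustration; production publishing typically relies on XSLT or a dedicated toolchain rather than code like this.

```python
import html
import xml.etree.ElementTree as ET

SOURCE = (
    "<intro><cite>The Cancer Journal: The Journal of Principles &amp; Practice "
    "of Oncology</cite> provides an integrated view of modern oncology across "
    "<em>all</em> disciplines.</intro>"
)

# Hypothetical style rules: semantic element -> (prefix, suffix) for each channel.
STYLES = {
    "html": {"intro": ("<p>", "</p>"), "cite": ("<i>", "</i>"), "em": ("<b>", "</b>")},
    "text": {"intro": ("", ""), "cite": ('"', '"'), "em": ("*", "*")},
}

# Text must be escaped for HTML output but passed through for plain text.
ESCAPE = {"html": html.escape, "text": lambda text: text}


def render(element, channel):
    """Recursively apply the channel's style rules to an element and its children."""
    before, after = STYLES[channel].get(element.tag, ("", ""))
    escape = ESCAPE[channel]
    parts = [before, escape(element.text or "")]
    for child in element:
        parts.append(render(child, channel))
        parts.append(escape(child.tail or ""))
    parts.append(after)
    return "".join(parts)


root = ET.fromstring(SOURCE)
html_output = render(root, "html")
text_output = render(root, "text")
```

The same source yields an italicized citation on the web and a quoted one in plain text; the author never chooses either style.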

Semantic markup
• Using semantic markup, we can
  • disambiguate content
  • search based on meaning
  • connect to other content, and
  • reuse or substitute new text.
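
Searching on meaning becomes a simple query once the markup says what the text is. The Python sketch below finds every citation in a small, hypothetical document; the sample content, the second title, and the function name `find_citations` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <cite> marks publication titles, <em> marks emphasis.
DOC = (
    "<doc>"
    "<intro><cite>The Cancer Journal</cite> covers <em>all</em> disciplines.</intro>"
    "<intro>See also <cite>Hypothetical Review</cite>.</intro>"
    "</doc>"
)


def find_citations(xml_text):
    """Return the text of every <cite> element, in document order."""
    root = ET.fromstring(xml_text)
    return ["".join(cite.itertext()) for cite in root.iter("cite")]


citations = find_citations(DOC)
```

With presentation markup both the citation and the emphasized word would be plain italics, and this query could not tell them apart.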

Multi-Channel publishing
• Supports complex, multi-channel publishing to many common output formats
• Add new formats or styles easily

Intelligent Content
• Content that is
  • not limited to one
    • purpose
    • technology, or
    • output
  • structurally rich and semantically aware, making it
    • discoverable
    • reusable
    • reconfigurable, and
    • adaptable.

Questions?
Rob Hanna, ECMs
+1 (289) 290-4337
www.linkedin.com/in/singlesourceror
rob@precisioncontent.com