Part 3 A-2: Document & Subject Analysis • Documents • Subjects • Facets
Document (content) analysis…
This is the most general form of analysis. There are few restrictions on what you can represent as content, and Taylor helps us understand the diversity of what we can call content:
• Topics (what is discussed and communicated?)
• Names (who is mentioned?)
• Time (is a time period or point in time mentioned?)
• Form (does the form itself communicate content, e.g., inventory chart, annual report, white paper?)
Subject analysis…
• More specific than content analysis
• Here we want to know specifically what the document is about. What is the subject of the document?
• Wilson talks to us about this. What does he say in Chapter 5?
Subject analysis as a contextual process…
Context considerations include:
• Users (patients vs. medical practitioners, etc.)
• Uses (developing egg substitutes, learning how to cook)
• The document itself (the “text” of a document, intended audience, uses, etc.)
• Institution (public library, corporate intranet, etc.)
• Information systems context
The notion of “warrant”…
Warrant is “the authority that is used to justify decisions about what is included in a system” (Clare Beghtol).
Types of warrant:
• Literary warrant
• User warrant
• Scholarly warrant
• Cultural warrant
• Structural warrant
Literary and user warrant…
• Literary warrant: terms or organization reflect or are taken directly from the resources themselves; this includes dictionaries, encyclopedias, etc. on a topic
• User (aka use, enquiry) warrant: terms or organization reflect use; user terminology may (or may not) be taken directly from logs of system use or from personal interactions with users
Scholarly, cultural & structural warrant…
• Scholarly warrant: terms or organization reflect the opinions of a panel of human experts
• Cultural warrant: terms or organization derived from cultural practice or understanding; for example, Dewey and LCSH reflect American/Western cultural bias; Colon Classification reflects Indian/Eastern cultural bias (this can also be partly a function of literary warrant…)
• Structural warrant: indexing terms or organization made up or improvised to provide for subject collocation; used particularly in classification schemes
Structural warrant…
Without structural warrant:
Vaults
    brick
    glass
    plastic
    steel
    stone
    tile
With structural warrant:
Vaults
    Masonry vaults**
        brick
        stone
        tile
    Non-masonry vaults**
        glass
        plastic
        steel
**Don’t appear in documents
Analysis in terms of concepts and terms…
• It is best to think of the analysis process as determining appropriate concepts to enable selecting terms.
• Concept: “A unit of thought” [ISO]
• At the analysis stage, you are selecting concepts (abstractions, in a sense) that you will represent later using formal subject terms.
• At this stage, you don’t know whether the concepts you identify will be represented by single or multiple terms; that depends on the vocabulary used, which defines the available terms.
Analysis and indexing…
• Subject analysis: a technique used to determine the “subject(s)” and disciplinary context exemplified by a document
• Subject indexing: a technique through which subject terms (words, taxonomic categories, or notation) are added to a document representation to describe the subject content of a document
Patrick Wilson’s method…
1. Identify the cast of characters
2. Use one or all of these methods:
• Purposive Way: the authorial intent (what was the author’s purpose?). Mai observes a variant: the user’s purpose or need (a form of utility)
• Figure-Ground Way: aspects of the document that stand out and their associated background
• Counting References Way: noting frequently cited aspects as “important”
• Appeal to Unity Way: determining what makes the document coherent and cohesive
Facet Analysis
Facet analysis…
• Definition of facet analysis: breaking down the subject of a document into component parts and locating terms (foci) within them
• Facets: the component parts of a subject
• Facets represent “fundamental categories”
• Fundamental categories are the only types of ideas that can manifest in all subjects (so say the designers of classification schemes and indexes that use such schemes)
Fundamental categories…
Three schools of thought:
• Kaiser was the first to propose the use of facets, in 1911. He had 3 [1]:
– Concretes
– Processes
– Place (e.g., Country)
• Ranganathan (1950s-1972) had 5 (PMEST):
– Personality
– Matter (i.e., Material or Property)
– Energy
– Space
– Time
Fundamental categories…
Three schools of thought (cont.):
• Other folks said there could be more
• British thesaurus designers talk about at least 15
Example of facets… for “automobiles”
• Color: Red, Blue, Green, Yellow, Black
• Brand: BMW, Volvo, Saab
• Type: Sedan, SUV, Wagon
• Transmission: Stick, Automatic
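The automobile facets above can be sketched as data to show why facets help search and display. This is a minimal illustration only: the four records, the lowercase field names, and the helper functions `filter_by_facets` and `facet_counts` are hypothetical and do not come from the slides; only the facet names and values do.

```python
from collections import Counter

# Hypothetical records; facet names and values follow the automobile slide.
CARS = [
    {"color": "Red", "brand": "BMW", "type": "Sedan", "transmission": "Stick"},
    {"color": "Blue", "brand": "Volvo", "type": "SUV", "transmission": "Automatic"},
    {"color": "Green", "brand": "Saab", "type": "Wagon", "transmission": "Automatic"},
    {"color": "Black", "brand": "Volvo", "type": "Wagon", "transmission": "Stick"},
]


def filter_by_facets(records, **selected):
    """Keep only records whose value matches every selected facet focus."""
    return [
        r for r in records
        if all(r.get(facet) == focus for facet, focus in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each focus of a single facet."""
    return Counter(r[facet] for r in records)


wagons = filter_by_facets(CARS, type="Wagon")
volvo_wagons = filter_by_facets(CARS, type="Wagon", brand="Volvo")
brand_counts = facet_counts(CARS, "brand")
```

Each facet is narrowed independently, so selections can be combined in any order. The counts per focus are what a faceted interface would typically show next to each value.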
Ranganathan and fundamental properties… (PMEST)
With subjects of research literature:
• Personality: What is the substance of the resource? What is the main subject (minus MEST below)?
• Matter: Are there concepts of a certain material or property of the main subject of the resource?
• Energy: Are there concepts of a certain action on the main subject of the resource?
• Space: Are the resources about a certain place?
• Time: Are the resources about a certain time or period of time?
Ranganathan and fundamental properties… (PMEST)
With subjects of research literature, category and value for each facet:
• Time: 2006
• Space: Madras
• Energy: Prevention
• Matter: Disease
• Personality: Rice Plant
Facets reinterpreted…
• Information architecture and “faceted” OPACs (Endeca, for example) have reinterpreted facets.
• Some call this “relational search,” where they break apart all aspects of the data and group them into “categories”
• Others consider a facet to be any attribute where the values are under authority control (e.g., controlled subjects = yes; uncontrolled keywords = no)
Facet analysis…
We will learn that this analysis can be quite specific and, as a consequence, quite powerful for search and display. The more specific we get, the more P’s and M’s and E’s and S’s and T’s we get. We can specify very, very long strings of facets.
EXAMPLE OF FACETS (WITH NOTATION):
Prevention of Disease in the Rice Plant in Madras in 2006
• Time: 2006 (P06)
• Space: Madras (4411)
• Energy: Prevention (5)
• Matter: Disease (4)
• Personality: Rice Plant (381)
Notation (PMEST order… more later):
• 381;4:5.4411’P06
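As a rough sketch, the assembly of that notation string can be expressed in a few lines of Python. The connecting symbols below are read off this one example (";" before Matter, ":" before Energy, "." before Space, and an apostrophe before Time), with a straight apostrophe standing in for the slide's quotation mark. The function name `build_notation` and the dictionary layout are hypothetical, and this is not a full account of Colon Classification rules.

```python
# Connecting symbols as they appear in the slide's example.
# Personality comes first and takes no leading symbol in that example.
CONNECTORS = {"matter": ";", "energy": ":", "space": ".", "time": "'"}
PMEST_ORDER = ["personality", "matter", "energy", "space", "time"]


def build_notation(foci):
    """Join the facet codes in PMEST order, prefixing each with its symbol."""
    parts = []
    for facet in PMEST_ORDER:
        code = foci.get(facet)
        if code is None:
            continue  # a subject need not use every facet
        parts.append(CONNECTORS.get(facet, "") + code)
    return "".join(parts)


# Codes taken from the slide: Rice Plant, Disease, Prevention, Madras, 2006.
rice = {
    "time": "P06",
    "space": "4411",
    "energy": "5",
    "matter": "4",
    "personality": "381",
}
notation = build_notation(rice)
```

The input dictionary is deliberately given in the slide's Time-to-Personality order to show that the citation order is imposed by the scheme and not by the order in which the analyst happens to list the facets.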
Deciphering notation…
J381,4;4;0c7*:5;3*:7*;5.4411.e50c’N67’el
FACET ANALYSIS
Notation [facet label] - concept:
• J [BF] - Agriculture
• 381 [1P1] - Rice Plant
• ,4 [1P2] - Stem
• ;4 [1M1] - Disease
• ;0c7* [1M2] - Virulence
• :5 [1E] - Prevention
• ;3* [2M1] - Chemicals
• :7* [2E] - Distribution
• ;5 [3M1] - Sprayer
• .4411 [S1] - Madras
• .e50c [S2] - Cauveri Delta
• ’N67 [T1] - 1967
• ’el [T2] - Dry Period
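The deciphering step can be sketched as a small tokenizer that splits the notation at its connecting symbols. Several assumptions apply: the spaces in the slide's string are treated as extraction artifacts and removed, a straight apostrophe stands in for the slide's quotation marks, the symbol-to-facet mapping is inferred from the labels on the slide, and `split_notation` is a hypothetical helper, not part of any classification tool.

```python
import re

# Facet implied by each connecting symbol, inferred from the slide's labels.
SYMBOL_TO_FACET = {
    ",": "Personality",
    ";": "Matter",
    ":": "Energy",
    ".": "Space",
    "'": "Time",
}


def split_notation(notation):
    """Split a notation string into (facet, code) pairs, left to right."""
    head = re.match(r"([A-Z]+)([^,;:.']*)", notation)
    if head is None:
        raise ValueError("expected the notation to start with a class letter")
    pairs = [("BF", head.group(1))]  # "BF" is the slide's label for the leading class
    if head.group(2):
        # The first focus follows the class letter with no connecting symbol.
        pairs.append(("Personality", head.group(2)))
    rest = notation[head.end():]
    for symbol, code in re.findall(r"([,;:.'])([^,;:.']+)", rest):
        pairs.append((SYMBOL_TO_FACET[symbol], code))
    return pairs


pairs = split_notation("J381,4;4;0c7*:5;3*:7*;5.4411.e50c'N67'el")
for facet, code in pairs:
    print(f"{facet:12} {code}")
```

This recovers the facet of each segment but not the round numbers in labels such as [2M1] or [3M1]; those depend on where each Energy facet closes a round, which the sketch does not track.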
End Part 3 A-2: Document & Subject Analysis