Indexing Systems
Aranilla, Jomar; Callejo, Melissa Ann; Cantara, Ma. Raizza; Mendoza, Cristian

Coordinate Indexes
• Allow terms to be combined or coordinated.
• Two or more single index terms are combined to create a new class.
• A classic example of a coordinate index is the marginal-hole punched card system, in which index terms are represented by notching out holes in known positions around the edge of the card.

Individual Indexed Terms → New Class
• Black + Polecats + Texas → Black Polecats of Texas
• Pine + Trees → Pine Trees
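
The combination of single terms into a new class can be sketched as a set intersection. This is only an illustration: the `postings` table, its document numbers, and the `coordinate` helper are assumptions made up for the example and do not come from the slides.

```python
# Hypothetical postings: each single index term maps to the set of
# document numbers indexed under it (all numbers are invented).
postings = {
    "Black": {1, 2, 5},
    "Polecats": {2, 5, 7},
    "Texas": {2, 3, 7},
    "Pine": {4, 6},
    "Trees": {4, 8},
}


def coordinate(*terms):
    """Return the documents that carry every one of the given terms."""
    sets = [postings.get(term, set()) for term in terms]
    if not sets:
        return set()
    return set.intersection(*sets)


# New class "Black Polecats of Texas" = Black AND Polecats AND Texas
black_polecats_of_texas = coordinate("Black", "Polecats", "Texas")
# New class "Pine Trees" = Pine AND Trees
pine_trees = coordinate("Pine", "Trees")

print(sorted(black_polecats_of_texas))
print(sorted(pine_trees))
```

Each new class is simply the set of documents shared by all the coordinated terms, so adding a term can only narrow the class.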

Types of Coordinate Indexes
• Pre-coordinate Indexes: the record or document creator designates the search terms from an already authoritative source. Library of Congress Subject Headings (LCSH) is one such source.
• In this system, documents are searched under the same terms that the indexer originally assigned to them, without any further manipulation of terms at the time of searching.

Examples
· Chain indexing by S. R. Ranganathan
· PRECIS (Preserved Context Index System) by Derek Austin
· POPSI (Postulate-based Permuted Subject Indexing) by G. Bhattacharyya
· SLIC (Selective Listing in Combination) by J. R. Sharp

Advantages of Pre-Coordinate System
• It eliminates the need for sophisticated search logic.
• It requires no special features in its physical format. Almost all printed indexes reflecting pre-coordinate indexing principles are hard copy.
• Its principles are applicable to a limited extent in online or offline computer-based information retrieval systems.

• Pre-coordinate indexes have also found some application in subject indexes to library catalogues and in the shelf arrangement of book stock. They are also found in abstracting and indexing journals, national bibliographies, and indexes to journals.
• Single or multiple entries present certain advantages at the search stage: a number of searches can be conducted simultaneously.

Limitations
• The multidimensional character of the subject matter is forced into a one-dimensional representation, which then makes it necessary to repeat the index entry in some way, for example by rotation of the terms.
• Relationships among topics are built once and for all into the system vocabulary, or into the index entries formed from its components by the indexer. They are non-manipulative.
• A multiple-access approach is possible only if the document is entered several times in the index by duplicating the citation.

• Pre-coordinate indexes are also criticized on the ground that even extensive duplication of entries does not provide true multidimensional retrieval capability for multidimensional subject matter.
• Efficient approaches to information retrieval demand systems that permit the free combination of classes and the terms representing them.
• A number of ways have been suggested to provide multiple approaches to retrieval in pre-coordinate indexes without complete permutation of index terms.
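
The rotation of terms mentioned under the limitations can be sketched as follows. This is a minimal illustration, not any particular published system: the function names, the " : " separator, and the sample document are assumptions made for the example.

```python
def rotated_entries(terms):
    """Yield one index entry per term, each led by a different term."""
    for i in range(len(terms)):
        yield terms[i:] + terms[:i]


def build_index(documents):
    """Return sorted (entry, citation) pairs, duplicating each citation
    once per rotation so the document is reachable under every term."""
    index = []
    for citation, terms in documents.items():
        for entry in rotated_entries(terms):
            index.append((" : ".join(entry), citation))
    return sorted(index)


# Hypothetical document with three pre-coordinated terms.
documents = {"Doc 1": ["Black", "Polecats", "Texas"]}

for entry, citation in build_index(documents):
    print(f"{entry}  ->  {citation}")
```

Rotation yields one entry per term (three here), whereas complete permutation of the same three terms would need six entries. The citation is still repeated in every entry, which is the duplication cost the slides describe.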

Post-coordinate Indexes
• The coordination of index terms is done after the index file has been compiled, that is, at the time of searching.
• Examples of post-coordinate indexing systems:
· Uniterm system of Taube, dating from about 1951
· Peek-a-boo, by Batten in England and Cordonnier in France, by 1940
· Edge-notched card system by Calvin Mooers
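
As a rough sketch of coordination at search time, a peek-a-boo style file can be modelled with one card per term, where each hole position stands for a document number and stacking cards leaves open only the positions punched in all of them. The card contents and the helper names `punch`, `superimpose`, and `read_holes` are assumptions for illustration; Python integers stand in for the physical cards.

```python
def punch(document_numbers):
    """Return a term card as an integer with one bit set per document."""
    card = 0
    for number in document_numbers:
        card |= 1 << number
    return card


def superimpose(*cards):
    """Stack cards: a hole survives only where every card is punched."""
    if not cards:
        return 0
    result = cards[0]
    for card in cards[1:]:
        result &= card
    return result


def read_holes(card):
    """List the document numbers whose positions are still open."""
    return [n for n in range(card.bit_length()) if (card >> n) & 1]


# Hypothetical term cards (document numbers are invented).
term_cards = {
    "Black": punch([1, 2, 5]),
    "Polecats": punch([2, 5, 7]),
    "Texas": punch([2, 3, 7]),
}

hits = superimpose(term_cards["Black"], term_cards["Polecats"], term_cards["Texas"])
print(read_holes(hits))
```

The index file itself stores only single terms. Any combination can be formed later by choosing which cards to stack, which is the point of post-coordination.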

SIMILARITIES
· In input preparation, the subject content has to be analyzed and then the standardized term has to be identified.
· In both types, the terms have to be coordinated.
· Both systems involve the arrangement of the indexed cards in some logical order.
DIFFERENCES
· Differences in access points
· Differences in arrangement
· Differences in browsability
· Differences in search time

Classified Indexes
• A classified index has its contents arranged systematically by classes or subject headings.
• It is arranged in a hierarchy of related topics, working down to the specific.

Example of Classified Index
• Teaching Children Mathematics.pdf

Faceted Indexes
• Faceting is done on indexed rather than stored values. This is because the primary use for faceting is drill-down into a subset of hits resulting from a query, so the chosen facet value is used to construct a filter query that literally matches that value in the index.
• Faceting is used when an entity possesses three or more characteristics or dimensions that would require, or would benefit from, classification.

When indexing a document to which categories were added, information on these categories is added to the search index in two locations:
• Category Tokens are added to the document for each category attached to that document. These categories can be used at search time for drill-down.
• A special Category List Token is added to each document, containing information on all the categories that were added to this document. This can be used at search time for facet accumulation.
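
The two locations described above can be imitated with a toy in-memory model. This is a sketch under stated assumptions, not the API of any search library: the sample documents, the category strings, and the names `category_tokens`, `category_lists`, `accumulate_facets`, and `drill_down` are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical documents with invented categories.
documents = {
    1: {"text": "teaching fractions", "categories": ["Subject/Mathematics", "Level/Primary"]},
    2: {"text": "teaching algebra", "categories": ["Subject/Mathematics", "Level/Secondary"]},
    3: {"text": "teaching reading", "categories": ["Subject/Language", "Level/Primary"]},
}

text_index = defaultdict(set)       # word -> document ids (ordinary search)
category_tokens = defaultdict(set)  # category -> document ids (for drill-down)
category_lists = {}                 # document id -> all its categories (for accumulation)

for doc_id, doc in documents.items():
    for word in doc["text"].split():
        text_index[word].add(doc_id)
    for category in doc["categories"]:
        category_tokens[category].add(doc_id)
    category_lists[doc_id] = tuple(doc["categories"])


def accumulate_facets(hits):
    """Count how many hit documents fall under each category."""
    counts = Counter()
    for doc_id in hits:
        counts.update(category_lists[doc_id])
    return counts


def drill_down(hits, category):
    """Keep only the hits that carry the chosen category token."""
    return hits & category_tokens.get(category, set())


hits = text_index["teaching"]
counts = accumulate_facets(hits)
narrowed = drill_down(hits, "Level/Primary")
print(dict(counts))
print(sorted(narrowed))
```

The per-document category list is read once per hit to build the counts shown to the user, while the per-category token acts as a filter once the user picks a facet value.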

• Because faceting fields are often specified to serve two purposes, human-readable text and drill-down query value, they are frequently indexed differently from fields used for searching and sorting:
· They are often not tokenized into separate words.
· They are often not mapped into lower case.
· Human-readable punctuation is often not removed (other than double-quotes).
· There is often no need to store them, since stored values would look much like indexed values and the faceting mechanism is used for value retrieval.

Example
• If I had an "author" field with a list of authors, such as:
Schildt, Herbert; Wolpert, Lewis; Davies, P.
• I might want to index the same data differently in three different fields:
1. For searching: tokenized, case-folded, punctuation-stripped:
• schildt / herbert / wolpert / lewis / davies / p

2. For sorting: untokenized, case-folded, punctuation-stripped:
• schildt herbert wolpert lewis davies p
3. For faceting: primary author only, using a solr.StringField:
• Schildt, Herbert
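
The three treatments of the "author" value can be reproduced with a short sketch. It only imitates the text transformations in plain Python and is not Solr configuration; the function names, and the assumption that authors are separated by semicolons with the primary author first, are illustrative.

```python
import re


def strip_punctuation(text):
    """Remove every character that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", "", text)


def search_tokens(value):
    """For searching: tokenized, case-folded, punctuation-stripped."""
    return strip_punctuation(value).lower().split()


def sort_key(value):
    """For sorting: one untokenized, case-folded, punctuation-stripped string."""
    return " ".join(search_tokens(value))


def facet_value(value):
    """For faceting: the primary author only, left exactly as written."""
    return value.split(";")[0].strip()


authors = "Schildt, Herbert; Wolpert, Lewis; Davies, P."

print(" / ".join(search_tokens(authors)))
print(sort_key(authors))
print(facet_value(authors))
```

The facet value keeps its capitalization and comma because it doubles as the label shown to the user and as the literal value matched by the drill-down filter.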

Chain Indexes
• A chain is a string of terms organized in a particular sequence based on the classification scheme that the chain adopts. The sequence of terms is pre-coordinated.
• Ranganathan is credited with the invention of chain indexing, an economical system of providing access to the terms in classification schedules without replicating the hierarchical structure of the classification in the alphabetical index.

• Ranganathan introduced chain indexes as a part of his Colon Classification, which became popular in British libraries (Ranganathan 1933). The argument for chain indexes is that the system uses both the classification terminology and its structure. It moves systematically through the hierarchy and reveals the weaknesses and strengths in the classification structure as one works with it.
• For example, a user can readily see subordinate classes that are not truly species of a genus. It draws together the many aspects of a subject that are otherwise dispersed in a classification scheme.
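
A simplified sketch of how a chain can be turned into alphabetical index entries is given below. It assumes the common textbook procedure of starting at the most specific link and dropping one link at a time, and it leaves out refinements such as skipping links that would not be sought. The class notation ("T", "T1", ...) and the subject terms are invented for the example and are not taken from Colon Classification.

```python
def chain_index_entries(chain):
    """Build alphabetical index entries from a classification chain.

    `chain` is a list of (class_number, term) pairs ordered from the
    most general link to the most specific one. Each entry leads with
    one link, qualified by its superordinate links in reverse order,
    and points to that link's class number.
    """
    entries = []
    for depth in range(len(chain), 0, -1):
        links = chain[:depth]
        heading = " : ".join(term for _, term in reversed(links))
        class_number = links[-1][0]
        entries.append((heading, class_number))
    return sorted(entries)


# Hypothetical chain with invented notation.
chain = [
    ("T", "Technology"),
    ("T1", "Agriculture"),
    ("T12", "Field crops"),
    ("T123", "Cereals"),
]

for heading, class_number in chain_index_entries(chain):
    print(f"{heading}  {class_number}")
```

A chain of four links produces only four entries, and the hierarchy itself stays in the classified sequence rather than being repeated in the alphabetical index.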