Introduction to CNIDR’s Isite
Jim Fullton - MCNC/CNIDR
Archie Warnock - A/WWW Enterprises
MCNC/CNIDR & A/WWW Enterprises

What is Isite?
• A freely available implementation of the Z39.50 search/retrieval protocol
• It includes a Unix-based server, a WWW gateway, a command-line client and a sophisticated text search engine
• ftp://ftp.cnidr.org/pub/NIDR.tools/Isite
• http://vinca.cnidr.org/software/Isite.html

What is Isearch?
• Isearch is the successor to freeWAIS
• Isearch is a sophisticated full-text search and retrieval system
• Isearch is a component of Isite, an implementation of the NISO standard protocol Z39.50 for information search and retrieval
• ftp://ftp.cnidr.org/pub/NIDR.tools/Isearch
• http://vinca.cnidr.org/software/Isearch.html

System Components - I
• Iindex, the text indexer: builds a searchable version of the document collection
  - Implements fast word-based searching
  - Document parser: recognizes the start and end of individual documents
  - Field parser: recognizes the start and end of fields within individual documents
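A minimal sketch of the three jobs listed above, assuming a hypothetical plain-text collection in which documents are separated by a line containing only %% and each field is written as a "name: value" line. The names ParseDocuments, ParseFields and BuildIndex are illustrative, not Iindex's own; the resulting inverted index maps each lower-cased word to the (document, field) pairs where it occurs, which is what makes word-based and fielded lookups fast.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout (not Iindex's actual format): documents are separated by
// a line containing only "%%", and each field is a line "name: value".
using Posting = std::pair<int, std::string>;  // (document id, field name)
using InvertedIndex = std::map<std::string, std::set<Posting>>;

// Document parser: split the collection text into individual documents.
std::vector<std::string> ParseDocuments(const std::string& collection) {
  std::vector<std::string> docs;
  std::istringstream in(collection);
  std::string line, current;
  while (std::getline(in, line)) {
    if (line == "%%") {  // end of one document
      if (!current.empty()) docs.push_back(current);
      current.clear();
    } else {
      current += line + "\n";
    }
  }
  if (!current.empty()) docs.push_back(current);
  return docs;
}

// Field parser: split one document into (field name, field text) pairs.
std::vector<std::pair<std::string, std::string>> ParseFields(const std::string& doc) {
  std::vector<std::pair<std::string, std::string>> fields;
  std::istringstream in(doc);
  std::string line;
  while (std::getline(in, line)) {
    const auto colon = line.find(':');
    if (colon == std::string::npos) continue;  // not a field line
    fields.emplace_back(line.substr(0, colon), line.substr(colon + 1));
  }
  return fields;
}

// Indexer: record every lower-cased word with the document and field it came from.
InvertedIndex BuildIndex(const std::string& collection) {
  InvertedIndex index;
  const auto docs = ParseDocuments(collection);
  for (int id = 0; id < static_cast<int>(docs.size()); ++id) {
    for (const auto& [field, text] : ParseFields(docs[id])) {
      std::string word;
      for (char c : text + " ") {  // the trailing space flushes the last word
        if (std::isalnum(static_cast<unsigned char>(c))) {
          word += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        } else if (!word.empty()) {
          index[word].insert(Posting{id, field});
          word.clear();
        }
      }
    }
  }
  return index;
}
```

Looking up a word is then a single map access, and a fielded search keeps only the postings whose field name matches.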

System Components - II
• Isearch, the search engine: searches a document collection based on a user-supplied query
  - Command line
    - Primarily used for testing
  - WWW gateway (using CGI)
    - End-user search interface using forms
  - Z39.50 gateway
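The WWW gateway receives the user's form input through CGI. This hypothetical sketch (not Isite's gateway code) decodes a form submitted with the GET method, whose fields arrive URL-encoded in the QUERY_STRING environment variable; the form field names used in the example (query, field) are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Decode one application/x-www-form-urlencoded component:
// '+' stands for a space and "%XY" for the byte with hex value XY.
std::string UrlDecode(const std::string& s) {
  std::string out;
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (s[i] == '+') {
      out += ' ';
    } else if (s[i] == '%' && i + 2 < s.size() &&
               std::isxdigit(static_cast<unsigned char>(s[i + 1])) &&
               std::isxdigit(static_cast<unsigned char>(s[i + 2]))) {
      out += static_cast<char>(std::stoi(s.substr(i + 1, 2), nullptr, 16));
      i += 2;  // skip the two hex digits
    } else {
      out += s[i];  // ordinary character, or a malformed escape kept as-is
    }
  }
  return out;
}

// Split "name1=value1&name2=value2" into decoded name/value pairs.
std::map<std::string, std::string> ParseQueryString(const std::string& qs) {
  std::map<std::string, std::string> params;
  std::size_t start = 0;
  while (start <= qs.size()) {
    std::size_t end = qs.find('&', start);
    if (end == std::string::npos) end = qs.size();
    const std::string item = qs.substr(start, end - start);
    if (!item.empty()) {
      const std::size_t eq = item.find('=');
      if (eq == std::string::npos) {
        params[UrlDecode(item)] = "";
      } else {
        params[UrlDecode(item.substr(0, eq))] = UrlDecode(item.substr(eq + 1));
      }
    }
    start = end + 1;
  }
  return params;
}
```

For example, ParseQueryString("query=sparse+matri%2A&field=title") yields query = "sparse matri*" and field = "title". In a real CGI program the input string would come from std::getenv("QUERY_STRING").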

Isearch Capabilities
• Fast full-text search
  - US AIDS Patent Collection: can search ~250,000 patents in under 1 second
• Fielded search
  - Can restrict searches to title, author, abstract and other fields
• Relevance ranking
  - Search “hits” are assigned scores and sorted
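The slide does not say how Isearch computes its scores. As one common possibility, this sketch uses a textbook tf-idf weight: each occurrence of a query word counts for more when the word appears in fewer documents. The formula, the names RankHits and TermCounts, and the in-memory document representation are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory representation: for each document, word -> count.
using TermCounts = std::map<std::string, int>;

struct Hit {
  int doc;       // document number
  double score;  // relevance score; higher is better
};

// Score each document containing at least one query word with a textbook
// tf-idf weight, tf * log(1 + N / df), then sort hits by descending score.
// The formula is an assumption; Isearch's own scoring may differ.
std::vector<Hit> RankHits(const std::vector<TermCounts>& docs,
                          const std::vector<std::string>& query) {
  const double n = static_cast<double>(docs.size());
  std::vector<Hit> hits;
  for (int d = 0; d < static_cast<int>(docs.size()); ++d) {
    double score = 0.0;
    for (const auto& word : query) {
      auto it = docs[d].find(word);
      if (it == docs[d].end()) continue;
      int df = 0;  // number of documents containing the word (at least 1 here)
      for (const auto& other : docs) df += static_cast<int>(other.count(word));
      score += it->second * std::log(1.0 + n / df);
    }
    if (score > 0.0) hits.push_back({d, score});
  }
  // stable_sort keeps equally scored hits in document order.
  std::stable_sort(hits.begin(), hits.end(),
                   [](const Hit& a, const Hit& b) { return a.score > b.score; });
  return hits;
}
```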

Isearch Capabilities
• Word truncation
  - A search for “matri*” matches “matrix” and “matrices”
• Boolean functions
  - AND, OR and ANDNOT combinations of different fields
• Customized presentation of results
• Phrase searching (coming soon)
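A sketch of how truncation and the Boolean operators can work over an inverted index, assuming (hypothetically) a sorted std::map from each word to the set of matching document ids. Right truncation becomes a range scan starting at the prefix, and AND, OR and ANDNOT become set intersection, union and difference; the operand sets can come from searches on different fields. All names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>

using DocSet = std::set<int>;
using Index = std::map<std::string, DocSet>;  // word -> ids of documents containing it

// Right truncation ("matri*"): std::map keeps words sorted, so every word with
// the given prefix lies in one contiguous range starting at lower_bound(prefix).
DocSet Truncated(const Index& index, const std::string& prefix) {
  DocSet result;
  for (auto it = index.lower_bound(prefix);
       it != index.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it) {
    result.insert(it->second.begin(), it->second.end());
  }
  return result;
}

DocSet And(const DocSet& a, const DocSet& b) {  // documents in both sets
  DocSet out;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
  return out;
}

DocSet Or(const DocSet& a, const DocSet& b) {  // documents in either set
  DocSet out;
  std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                 std::inserter(out, out.end()));
  return out;
}

DocSet AndNot(const DocSet& a, const DocSet& b) {  // in a but not in b
  DocSet out;
  std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                      std::inserter(out, out.end()));
  return out;
}
```

For example, the query “matri*” ANDNOT “sparse” is AndNot(Truncated(index, "matri"), index.at("sparse")).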

Isearch Customization
• What’s needed to customize Isearch?
  - Isearch is written in C++
  - Documents are C++ objects: data and procedures
    - SGML, HTML and other document types are already available
  - Object technology allows code reuse: you customize only where a new document type differs from existing objects
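The reuse described above can be sketched with ordinary C++ inheritance. The class and method names here are hypothetical, not Isearch's actual document-type API: a base type supplies default behavior, and a derived HTML type overrides only how the brief record is built, inheriting everything else.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical base document type; names are illustrative, not Isearch's API.
class DocType {
 public:
  virtual ~DocType() = default;
  virtual std::string Name() const { return "TEXT"; }
  // Brief record shown in a hit list: by default, the first line.
  virtual std::string BriefRecord(const std::string& doc) const {
    return doc.substr(0, doc.find('\n'));
  }
  // Full record: by default, the whole document unchanged.
  virtual std::string FullRecord(const std::string& doc) const { return doc; }
};

// Derived type: only the parts that differ are overridden.
class HtmlDocType : public DocType {
 public:
  std::string Name() const override { return "HTML"; }
  // Use the contents of <title>...</title> as the brief record.
  std::string BriefRecord(const std::string& doc) const override {
    const std::string open = "<title>", close = "</title>";
    auto start = doc.find(open);
    if (start == std::string::npos) return DocType::BriefRecord(doc);  // inherited fallback
    start += open.size();
    const auto end = doc.find(close, start);
    if (end == std::string::npos) return DocType::BriefRecord(doc);
    return doc.substr(start, end - start);
  }
  // FullRecord is not overridden: the base-class version is reused as-is.
};
```

Calling FullRecord through a pointer to an HtmlDocType runs the inherited base-class code unchanged.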

Isearch Customization
• What’s needed to make arbitrary documents searchable?
  - Code to parse documents
  - Code to parse fields
  - Code to build brief and full result records
  - Yes, it requires programming
  - But many of these procedures can be derived from existing ones

Introduction to Z39.50
• Developed for search and retrieval
• Networked, client/server environment
• Tested by working information scientists (Z39.50 Implementors Group)
• Commercial & public domain support (Isite from CNIDR)
• http://www.ds.internic.net/z3950.html

Attribute Sets
• Attributes define how the query is specified
  - Use: field names
  - Relation: comparisons
  - Position: location in field
  - Structure: word/phrase/key/etc.
  - Truncation: left/right/none/etc.
  - Completeness: subfield/field
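In the Bib-1 attribute set each of these six attribute types has a number (Use = 1, Relation = 2, Position = 3, Structure = 4, Truncation = 5, Completeness = 6), and a query term carries a list of (type, value) pairs. This sketch renders such a term in a prefix notation modeled on the Prefix Query Format (PQF) of the YAZ toolkit; the struct and function names are hypothetical. The example values are standard Bib-1 ones: Use 4 is title, Use 1003 is author, Relation 3 is equal, and Truncation 1 is right truncation.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// The six Bib-1 attribute types, numbered as in the Bib-1 attribute set.
enum class AttrType { Use = 1, Relation = 2, Position = 3, Structure = 4,
                      Truncation = 5, Completeness = 6 };

// A query term plus the attributes that say how to match it (hypothetical struct).
struct AttributedTerm {
  std::vector<std::pair<AttrType, int>> attributes;  // (type, value)
  std::string term;
};

// Render as "@attr type=value ... term", modeled on YAZ's PQF.
std::string ToPrefixNotation(const AttributedTerm& t) {
  std::string out;
  for (const auto& [type, value] : t.attributes) {
    out += "@attr " + std::to_string(static_cast<int>(type)) + "=" +
           std::to_string(value) + " ";
  }
  return out + t.term;
}
```

A title search for “matri*” thus becomes @attr 1=4 @attr 5=1 matri.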

Attributes & Element Sets u Supported F BIB-1 Attribute Sets GILS GEO F STAS

Attributes & Element Sets u Supported F BIB-1 Attribute Sets GILS GEO F STAS u Element Sets define retrievable sets of use attributes F Brief record F Full record F Summary record (GEO) MCNC/CNIDR & A/WWW Enterprises
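A sketch of how a server might apply an element set when presenting a record, assuming records are held as field-name/value maps. "B" and "F" are the conventional Z39.50 element set names for brief and full records; the field lists, and the "S" summary set, are hypothetical stand-ins rather than the GEO profile's actual definitions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Record = std::map<std::string, std::string>;  // field name -> value

// Hypothetical element-set definitions: which fields each named set returns.
const std::map<std::string, std::vector<std::string>> kElementSets = {
    {"B", {"title"}},
    {"S", {"title", "abstract", "bounding"}},
};

// Build the record to return for the requested element set.
Record Present(const Record& full, const std::string& elementSet) {
  if (elementSet == "F") return full;  // full record: every field
  Record out;
  const auto set = kElementSets.find(elementSet);
  if (set == kElementSets.end()) return out;  // unknown set (a real server would send a diagnostic)
  for (const auto& field : set->second) {
    const auto it = full.find(field);
    if (it != full.end()) out.insert(*it);  // fields absent from the record are skipped
  }
  return out;
}
```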

Record Syntaxes
• Z39.50 allows specification of a “Preferred Record Syntax” for results
  - SUTRS (unstructured text)
  - HTML
  - USMARC
  - GRS-1 (tagged, generalized syntax)
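A sketch of record-syntax selection under a hypothetical server policy: honor the client's preferred syntax when it is supported, otherwise fall back to SUTRS (a server could instead return a diagnostic). Only the two text-based syntaxes are rendered here; USMARC and GRS-1 have their own structured encodings and are left out. All names are illustrative.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

enum class RecordSyntax { SUTRS, HTML, USMARC, GRS1 };

using Record = std::map<std::string, std::string>;  // field name -> value

// Use the preferred syntax if supported; otherwise fall back to plain text.
// The fallback policy is an assumption for this sketch.
RecordSyntax ChooseSyntax(RecordSyntax preferred, const std::set<RecordSyntax>& supported) {
  return supported.count(preferred) ? preferred : RecordSyntax::SUTRS;
}

// Render a record in one of the two text-based syntaxes this sketch handles.
std::string Render(const Record& r, RecordSyntax syntax) {
  assert(syntax == RecordSyntax::SUTRS || syntax == RecordSyntax::HTML);
  const bool html = syntax == RecordSyntax::HTML;
  std::string out = html ? "<dl>\n" : "";
  for (const auto& [name, value] : r) {
    // No HTML escaping of field values, for brevity.
    out += html ? "<dt>" + name + "</dt><dd>" + value + "</dd>\n"
                : name + ": " + value + "\n";
  }
  if (html) out += "</dl>\n";
  return out;
}
```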

Profiles - GEO and Otherwise
• Profiles define allowed attributes and element sets
• Usually domain-specific: ATS-1, GILS, WAIS, GEO, Digital Collections, Museum Collections
• Supported by external agreement between client & server (currently)
  - i.e., a GEO client talks to a GEO server
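Because a profile is essentially a list of what a client may ask for, checking a request against it is straightforward. This hypothetical sketch models a profile as a set of allowed Use attribute values plus a set of allowed element set names; the example profile below is invented for illustration and is not GILS or GEO.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical profile: which Use values and element sets a client may request.
struct Profile {
  std::string name;
  std::set<int> useAttributes;        // allowed Use (type 1) attribute values
  std::set<std::string> elementSets;  // allowed element set names
};

// Hypothetical summary of an incoming request.
struct Request {
  std::vector<int> useAttributes;  // Use values appearing in the query
  std::string elementSet;          // element set requested for results
};

// True when every part of the request falls inside the profile.
bool Conforms(const Profile& p, const Request& r) {
  if (p.elementSets.count(r.elementSet) == 0) return false;
  for (int use : r.useAttributes) {
    if (p.useAttributes.count(use) == 0) return false;
  }
  return true;
}
```

For instance, with Profile p{"example", {4, 1003, 1016}, {"B", "F"}}, a brief-record title search conforms, while a request for element set "S" does not.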

FGDC Enhancements
• Search engine (Iindex/Isearch)
  - Field types (text, numeric, date, others)
  - Search in nested fields
  - Search in numeric fields
  - Date & date-range searching
  - Spatial searching
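Once fields are typed, spatial and date-range searches reduce to numeric comparisons. This sketch assumes bounding boxes given as west/east/north/south coordinates in decimal degrees (in the spirit of FGDC metadata's bounding coordinates) that do not cross the 180th meridian, and dates stored as YYYYMMDD integers; both representations and the function names are assumptions.

```cpp
#include <cassert>

// Bounding coordinates in decimal degrees; assumes west <= east
// (boxes that do not cross the 180th meridian).
struct BoundingBox {
  double west, east, north, south;
};

// Two boxes overlap when their longitude ranges and latitude ranges both overlap.
bool Overlaps(const BoundingBox& a, const BoundingBox& b) {
  return a.west <= b.east && b.west <= a.east &&
         a.south <= b.north && b.south <= a.north;
}

// With dates stored as YYYYMMDD numbers (an assumption), a date-range
// search is an ordinary inclusive numeric comparison.
bool InDateRange(long date, long from, long to) {
  assert(from <= to);
  return from <= date && date <= to;
}
```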

FGDC Enhancements
• Z39.50 implementation (ZDist)
  - Support for GEO attributes & element sets
  - GRS-1 record syntax
  - Support for additional (non-Isearch) search engines
  - Syntax to support nested queries
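Supporting additional search engines suggests an engine-neutral interface between the Z39.50 layer and whatever engine answers the query. The sketch below is hypothetical and is not ZDist's actual design: the server maps a Bib-1 Use value (4 = title, 1003 = author) to a field name and calls an abstract SearchEngine, here backed by a toy map-based engine.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical engine-neutral interface: the Z39.50 layer depends only on this,
// so Isearch or any other engine can be plugged in behind it.
class SearchEngine {
 public:
  virtual ~SearchEngine() = default;
  // Ids of records whose named field contains the term.
  virtual std::set<int> Search(const std::string& field, const std::string& term) = 0;
};

// (field, term) -> ids of matching records.
using Postings = std::map<std::pair<std::string, std::string>, std::set<int>>;

// A toy map-backed engine standing in for a non-Isearch engine.
class MapEngine : public SearchEngine {
 public:
  explicit MapEngine(Postings postings) : postings_(std::move(postings)) {}
  std::set<int> Search(const std::string& field, const std::string& term) override {
    const auto it = postings_.find({field, term});
    return it == postings_.end() ? std::set<int>{} : it->second;
  }

 private:
  Postings postings_;
};

// Server-side handler: translate a Bib-1 Use value to a field name, then delegate.
std::size_t HitCount(SearchEngine& engine, int use, const std::string& term) {
  static const std::map<int, std::string> kUseToField = {{4, "title"}, {1003, "author"}};
  const auto field = kUseToField.find(use);
  if (field == kUseToField.end()) return 0;  // unsupported Use (a real server would send a diagnostic)
  return engine.Search(field->second, term).size();
}
```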

Outstanding Issues
• User interface
  - What fields are searchable, and how does the user indicate them?
  - How complex can the geographic queries be? Bounding box only? Complex regions?