LIS 618 lecture 1 Thomas Krichel 2004 02

LIS 618 lecture 1 Thomas Krichel 2004-02-01

structure of talk • Recap on Boolean (aurally) • Before online searching • Working with DIALOG – Overview – Search command • Boolean exercise (on the fly)

before a search I • What is the purpose of the query? – brief overview – comprehensive search • What perspective on the topic is required? – scholarly – technical – business – popular

before search II • What type of information does the patron want? – full text – bibliographic – directory – numeric • Are there any known sources? – authors – journals – papers – conferences

before search III • What are the language restrictions? • What, if any, are the cost restrictions? • How current do the data need to be? • How much of each record is required?

concept analysis • This is the art/science of taking the topic to search for and developing facets. Example: “Internet filtering in Libraries” – Internet filter – Libraries – Controversy, not technical issues • We may also need to think about the aim of the search.

search aims • a known needle in a known haystack • a known needle in an unknown haystack • an unknown needle in an unknown haystack • any needle in a haystack • the sharpest needle in a haystack • most of the sharpest needles in a haystack

search aims • all the needles in a haystack • affirmation of no needles in a haystack • things like needles in a haystack • is there a new needle in the haystack? • where are the haystacks? • needles, haystacks, anything

types of searches • known-item searches • negative searches • selective dissemination of information • topical or subject searches • passage searching, where the user is only interested in part of the item

search strategies I • Building block approach – Do a number of elementary searches – Combine the resulting sets with Boolean operators • This is what I did in the example in the previous lecture • Works only with the Boolean model
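
The building block approach can be sketched with plain Python sets. This is only an illustration: the toy index and its record numbers are made up, and a real system such as DIALOG keeps its own indexes.

```python
# Toy inverted index: term -> set of record numbers (made-up data).
index = {
    "internet": {1, 2, 3, 5},
    "filter": {2, 3, 7},
    "library": {3, 5, 7, 8},
}

# Elementary searches, one per facet.
s1 = index["internet"]
s2 = index["filter"]
s3 = index["library"]

# Combine the resulting sets with Boolean operators.
internet_and_filter = s1 & s2        # internet AND filter
result = internet_and_filter & s3    # ... AND library

print(sorted(result))
```

Each elementary set stays available, so a facet can be swapped or dropped without redoing the other searches.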

search strategies II • Snowballing approach – Start with a very specific query – Think of other terms that can be added to get more results – Stop when a reasonable number of results is achieved. • Not sure this really works well in practice.
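
One possible reading of snowballing, as a rough sketch: start from the intersection of very specific terms, then add further terms with OR until there are enough results. The function name `snowball`, the threshold `enough` and the toy index are all assumptions made for illustration.

```python
# Toy inverted index: term -> set of record numbers (made-up data).
index = {
    "internet": {1, 2, 3},
    "filter": {2, 3, 4, 9},
    "filtering": {5},
    "blocking": {3, 6},
    "censorship": {6, 7},
}

def snowball(index, start_terms, extra_terms, enough):
    """Start specific (AND of start_terms), then OR in extra terms one by one."""
    result = set.intersection(*(index[t] for t in start_terms))
    added = []
    for term in extra_terms:
        if len(result) >= enough:
            break  # a reasonable number of results has been reached
        result |= index.get(term, set())
        added.append(term)
    return result, added

result, added = snowball(
    index,
    ["internet", "filter"],
    ["filtering", "blocking", "censorship"],
    enough=4,
)
print(sorted(result), added)
```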

search strategies III • The successive fraction approach is the opposite of the snowballing approach – First search for a broad concept – Then repeat the query by adding various limiting factors. • Can work well if the IR system allows you to repeat and edit queries. • But queries can become unwieldy.
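
A small sketch of successive fractions, assuming records are simple dictionaries. The fields `year` and `lang`, the sample records and the helper `search` are hypothetical; the point is only that each repetition of the query carries one more limiting condition.

```python
# Made-up records for illustration.
records = [
    {"id": 1, "text": "internet filtering in public libraries", "year": 2003, "lang": "en"},
    {"id": 2, "text": "internet access in schools", "year": 1998, "lang": "en"},
    {"id": 3, "text": "filtrado de internet en bibliotecas", "year": 2002, "lang": "es"},
    {"id": 4, "text": "internet filtering software review", "year": 2001, "lang": "en"},
]

def search(recs, *conditions):
    """Return the ids of the records that satisfy every condition."""
    return [r["id"] for r in recs if all(cond(r) for cond in conditions)]

def broad(r):
    return "internet" in r["text"]

def recent(r):
    return r["year"] >= 2000

def english(r):
    return r["lang"] == "en"

step1 = search(records, broad)                   # broad concept first
step2 = search(records, broad, recent)           # add a date limit
step3 = search(records, broad, recent, english)  # add a language limit

print(step1, step2, step3)
```

The growing argument list in each call also shows how such queries become unwieldy.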

search strategies IV • Most specific facet first – Conduct concept analysis – Look for the most specific facet – Search that first, add others later • Presupposes that you have done a decent concept analysis.

two steps in DIALOG • step one: select databases (aka files) to look at • step two: perform searches on the selected databases • You may wonder why one does not have one single step like in a search engine. Discuss. • today we concentrate on the second step

working on selected files • We assume that we have selected a database that we know, and we look at the search interface on that database. • The database selection process is a bit more complicated; it is covered next week. • First, let us log in and look at the command prompt. • Then we select the first database (file) with the begin command.

the ‘begin’ command • As its name suggests, usually the first command. • "begin number, …" selects the files with the given numbers. • Once they are selected they can be searched. • Now select the ERIC database: "begin 1" • "begin 1" can be abbreviated as "b 1"

substeps in the second step • Identify search terms • Use Dialog basic commands to conduct a search • View records online or print the results

the 's' (select) command • Once we have issued the "begin" command to select a database, we issue the "s" command on that database. • "s query_expression" where query_expression is a query expression. • This will search the index of the selected database in full-text view for the query issued. • It will not find any of the following: "an and by for from of the to with". They are stop words.
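
To see what stop words do to a query, here is a minimal sketch that drops them before searching. The stop word list is the one given on the slide; the function `searchable_terms` is a made-up helper, not a DIALOG feature.

```python
# Stop words as listed on the slide.
STOP_WORDS = {"an", "and", "by", "for", "from", "of", "the", "to", "with"}

def searchable_terms(text):
    """Lower-case the text, split on whitespace and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]

print(searchable_terms("The history of the book"))
```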

query expression • A query expression contains search terms expressed in special ways – You can truncate search terms. – You can build an elementary expression by putting several keywords together. This is achieved by DIALOG's connectors. – You can combine several expressions with the use of Boolean operators • We will cover these in turn now.

truncation of terms I • Open Truncation – "select path?" retrieves all words that begin with path: paths, pathos, pathway, pathology • Controlled-Length Truncation – "select path??" retrieves the root and up to two additional characters: paths, pathos

truncation of terms II • Embedded Character truncation can be used for variant spellings: – "select organi?ation" -> organization, organisation – "select fib??board" -> fiberboard, fibreboard • This truncation feature is also useful for searching for unusual plural forms: – "select wom?n" -> woman, women • Apparently you can also do prefixes by putting the ? at the beginning. – "?mobile" -> automobile, metamobile
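
The truncation rules can be mimicked with regular expressions. The sketch below is an approximation under stated assumptions, not DIALOG's actual matching: patterns are written without internal spaces, a single trailing ? is open truncation, a trailing run of n question marks (n of 2 or more) allows up to n further characters, each embedded ? stands for exactly one character, and a leading ? allows any prefix. The function names are made up.

```python
import re

def truncation_to_regex(pattern):
    """Translate a truncated term into a compiled regular expression."""
    lead, body, trail = re.fullmatch(r"(\?*)(.*?)(\?*)", pattern).groups()
    parts = []
    if lead:
        parts.append(r"\w*")                       # prefix truncation
    for ch in body:
        # an embedded ? stands for exactly one character
        parts.append(r"\w" if ch == "?" else re.escape(ch))
    if len(trail) == 1:
        parts.append(r"\w*")                       # open truncation
    elif len(trail) > 1:
        parts.append(r"\w{0,%d}" % len(trail))     # controlled length
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern, word):
    """True if the whole word fits the truncated term."""
    return truncation_to_regex(pattern).fullmatch(word) is not None

for pattern, word in [("path?", "pathology"), ("path??", "pathos"),
                      ("path??", "pathway"), ("wom?n", "women"),
                      ("?mobile", "automobile")]:
    print(pattern, word, matches(pattern, word))
```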

use of connectors • Connectors are used to put several words together. • One instance where this is useful is when you have words that on their own mean different things. • For example "mate" is a herbal beverage consumed in South America. Looking for mate on the Internet retrieves a lot of singles' pages.

example: terms related to "mate" • What other terms could be used? – matear (drink mate) – matero (mate drinker) – cebar (prepare mate) – cebador (mate preparer) – yerba (mate herb) – bombilla (mate straw)

connectors I • '(W)' requires terms to appear directly one after the other, in the given order, e.g. 'yerba(W)mate?' matches "yerba mate". • '(iW)', where i is an integer, means the first term is followed by the second with at most i words in between, e.g. 'ceba?(3W)mate?' matches "cebar un maravilloso mate" but not "cebador guapo mirando un buen mate"

connectors II • '(N)' requires terms to be next to each other in either order, e.g. 'yerba(N)mate?' matches "yerba mate" or "mate yerba". • '(iN)', where i is an integer, means the terms are at most i words apart in either order, e.g. 'ceba?(3N)mate?' matches "cebar mate" or "matear con la cebadora". • '(S)' searches for the occurrence of connected terms in the same paragraph.
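
The word-distance connectors can be tried out on the slide's own examples with a short sketch. It assumes that text is split on whitespace, that a trailing ? is open truncation, and that "at most i words" counts the words in between. The function names `term_matches` and `near` are hypothetical, and '(S)' is not covered.

```python
def term_matches(term, word):
    """A trailing '?' means open truncation; otherwise the word must be equal."""
    if term.endswith("?"):
        return word.startswith(term[:-1])
    return word == term

def near(text, first, second, gap=0, ordered=True):
    """True if the two terms occur with at most `gap` words between them.

    ordered=True mimics (iW): `first` must come before `second`.
    ordered=False mimics (iN): either order is accepted.
    """
    words = text.lower().split()
    for i, w in enumerate(words):
        if not term_matches(first, w):
            continue
        for j, v in enumerate(words):
            if i == j or not term_matches(second, v):
                continue
            distance = j - i if ordered else abs(j - i)
            if 1 <= distance <= gap + 1:
                return True
    return False

print(near("cebar un maravilloso mate", "ceba?", "mate?", gap=3))
print(near("cebador guapo mirando un buen mate", "ceba?", "mate?", gap=3))
print(near("matear con la cebadora", "ceba?", "mate?", gap=3, ordered=False))
```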

using Boolean operators • In your query, you can combine several expressions with Boolean operators • Example: "S LIBRARY(W)SCHOOL? AND DISTANCE(W)EDUCATION" • But I usually do not issue such fancy queries.

executing several searches • Several searches can be done sequentially, and the result sets are saved by the system. • Each time, the system assigns a set number, Si. • These can be combined in Boolean expressions, e.g. 's S1 or S2 and S3' • Remember that Boolean operations are set-theoretic!

Boolean operators on sets • When using Booleans, be aware that "and" has higher precedence than "or". • Thus "a or b and c" is not the same as "(a or b) and c"; it is "a or (b and c)". • Use parentheses when in doubt.
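
The precedence rule can be checked on small sets. Python's set operators happen to follow the same convention (& binds tighter than |), so the sketch below uses them as stand-ins for "and" and "or"; the sets themselves are made up.

```python
a = {1, 2}
b = {2, 3}
c = {3, 4}

unparenthesised = a | b & c   # read as a or (b and c)
right_grouped = a | (b & c)
left_grouped = (a | b) & c

print(sorted(unparenthesised))  # same records as right_grouped
print(sorted(left_grouped))     # a different, smaller set
```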

DS (display sets) • This command can be executed any time to review the sets that have been formed since the last B (begin) command. • This can be useful to review your search history.
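
How set numbers accumulate and can be reviewed may be easier to see in a toy model. The class `Session`, its methods and the listing format are invented for illustration; they do not reproduce DIALOG's actual output.

```python
class Session:
    """Toy search session that numbers result sets S1, S2, ..."""

    def __init__(self):
        self.sets = {}  # label -> (description, records)

    def select(self, description, records):
        """Store a result set and return its label."""
        label = f"S{len(self.sets) + 1}"
        self.sets[label] = (description, frozenset(records))
        return label

    def records(self, label):
        return self.sets[label][1]

    def display_sets(self):
        """List every set formed so far: label, size, description."""
        return [f"{label}  {len(recs):>4}  {desc}"
                for label, (desc, recs) in self.sets.items()]

session = Session()
s1 = session.select("internet", {1, 2, 3, 5})
s2 = session.select("filter?", {2, 3, 7})
s3 = session.select("S1 and S2", session.records(s1) & session.records(s2))

for line in session.display_sets():
    print(line)
```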

the target command • "target set", where set is a search result set, creates a subset of the "statistically most relevant results" in the original set. • I have not seen details about how this subset is computed. • A new result set is formed.

display: the type command • "type set/format/range" • set is a result set • format is a format • range can be – start-end, where start is the record number to start at and end is the record number to end at – all
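
The three-part shape of the type command can be taken apart with a small parser. This is a sketch that assumes the range is written either as start-end or as all; `parse_type` is a hypothetical helper, not part of DIALOG.

```python
def parse_type(command):
    """Split 'type set/format/range' into (set, format, range)."""
    verb, _, argument = command.strip().partition(" ")
    if verb.lower() != "type":
        raise ValueError("not a type command")
    result_set, fmt, rng = argument.split("/")
    if rng.lower() == "all":
        records = "all"
    else:
        start, end = rng.split("-")
        records = (int(start), int(end))
    return result_set, fmt, records

print(parse_type("type S1/3/1-5"))
print(parse_type("type S2/full/all"))
```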

standard delivery formats • 2 – full record except abstract • 3 or medium – citation • 5 or long – full except full text • 6 or free – title and DIALOG number • 8 or short – title plus indexing terms; useful to find other indexing terms • 9 or full – everything • KWIC or K – keywords in context

options for delivery • I once tried to email results to myself, to no avail. • You can save the HTML of the search results in the browser. • You can print the results within the browser.

http://openlib.org/home/krichel Thank you for your attention!

• to do: set up consistent notation