Search Engine Interfaces: search engine modus operandi

  • Slides: 12

The basics: what’s a search engine?
  • Search engines are special websites designed to find information stored on other sites
  • Most have the following capabilities:
      • Search the Internet based on important words
      • Keep an index of the words they find and where they were found
      • Allow users to look for words or combinations of words in that index

There’s a lot of sites out there…
  • Indeed (thousands upon thousands nowadays)
  • The first search engine was Archie (“archive” without the “v”), which indexed FTP archives. Later, after the rise of Gopher, came…
      • Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives)
      • Jughead (Jonzy’s Universal Gopher Hierarchy Excavation And Display)

There’s a lot of sites out there…
  • Wandex - 1993: the first search engine for the Web
  • WebCrawler - 1994 (let users search for any word in any page; revolutionary then, standard now)
  • Lycos - 1994 (Carnegie Mellon University)
  • Many others came after: Excite, Infoseek, Inktomi, Northern Light, AltaVista, Yahoo!…
  • Google launched in 1998 and rose to popularity around 2000 because of its innovative PageRank system

How does it work?
  • The pieces of a search engine:
      • A ‘spider’ or ‘crawler’
          • Software “robots” that go out and visit pages on the web and build lists of the words they find on each page
      • An index
          • The words that are gathered are indexed (by a method determined by the particular search engine)
      • A search
          • Usually accompanied by Boolean logic
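The three pieces above (spider, index, search) can be sketched in a few lines. This is only an illustrative toy, not how any real engine is built: the `PAGES` dictionary stands in for the web, and every name in it (`a.example`, `crawl`, `build_index`, `search_and`, `search_or`) is made up for this example.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("gopher archives and web search", ["b.example"]),
    "b.example": ("web crawler builds a search index", ["c.example"]),
    "c.example": ("gopher menus", []),
}

def crawl(start):
    """Spider: visit pages breadth-first, yielding (url, words on the page)."""
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        yield url, text.lower().split()
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)

def build_index(start):
    """Index: map each word to the set of URLs where it was found."""
    index = defaultdict(set)
    for url, words in crawl(start):
        for word in words:
            index[word].add(url)
    return index

def search_and(index, *terms):
    """Search with Boolean AND: pages containing every term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()

def search_or(index, *terms):
    """Search with Boolean OR: pages containing at least one term."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))

index = build_index("a.example")
print(sorted(search_and(index, "web", "search")))
print(sorted(search_or(index, "gopher", "crawler")))
```

A real spider would fetch pages over HTTP and a real index would also record word positions; both are left out here to keep the sketch short.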

Example: Google
  • Claim to fame: the PageRank system
  • Uses multiple spiders (initially 3 at once)
  • Spiders take note of:
      • Words on the page and where they were found
  • The index consists of every “significant” word on each page
      • Google excludes the articles ‘a’, ‘an’, and ‘the’
  • Each page that is indexed is weighted according to the PageRank system (a link analysis algorithm that assigns a numerical weight)
  • Searching
      • When a user performs a search, Google retrieves from its index all of the pages that contain those keywords and sorts them according to their assigned PageRank
      • Ideally the first several sites listed will match your search criteria
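To make "a link analysis algorithm that assigns a numerical weight" concrete, here is a minimal sketch of the textbook PageRank iteration. It is not Google's production ranking: the link graph, the page names, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the pages it links to; returns page -> weight.

    Assumes every link target also appears as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline weight.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its current weight among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its weight over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph for illustration.
LINKS = {
    "home": ["docs", "blog"],
    "docs": ["home"],
    "blog": ["home", "docs"],
    "orphan": ["home"],
}
ranks = pagerank(LINKS)

# Pages that matched the keywords are sorted by their weight, highest first.
matches = ["blog", "docs", "home"]
print(sorted(matches, key=ranks.get, reverse=True))
```

Pages that many well-weighted pages link to end up with the highest weight, which is why "home" comes out ahead of "docs" and "blog" in this toy graph.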

Example: Ask (formerly Ask Jeeves)
  • Claim to fame: the ExpertRank algorithm (formerly Teoma)
  • Uses multiple spiders
  • Spiders take note of:
      • Words on the page and where they were found (same as Google)
  • The index consists of every “significant” word on each page
  • Uses link analysis like Google
      • Each page is then also analyzed to determine its popularity among pages that are considered “experts” on the topic of the search. This is called subject-specific popularity.
  • Searching: natural language search (or subject-specific search)
      • When a user performs a search, Ask finds the keywords in its index, figures out the topics (known as ‘clusters’) and the experts on those topics, and then finds the most popular results among those experts
      • This leads to a unique “editorial flavor” to searching (www.ask.com)
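The idea of subject-specific popularity can be illustrated with a toy scoring function that counts only the links coming from expert pages on the topic. This is a guess at the general shape of the idea, not Ask's actual ExpertRank algorithm; the function name, the link data and the choice of experts are all hypothetical.

```python
def subject_popularity(links, experts, candidates):
    """For each candidate page, count how many expert pages link to it."""
    scores = {page: 0 for page in candidates}
    for expert in experts:
        # set() so that an expert linking twice to a page counts once.
        for target in set(links.get(expert, [])):
            if target in scores:
                scores[target] += 1
    return scores

# Hypothetical link data for a search about owls.
LINKS = {
    "birding-society.example": ["owl-guide.example", "hawk-guide.example"],
    "raptor-journal.example": ["owl-guide.example"],
    "celebrity-news.example": ["hawk-guide.example", "hawk-guide.example"],
}
# Pages assumed to be "experts" on the topic cluster of the search.
EXPERTS = {"birding-society.example", "raptor-journal.example"}

scores = subject_popularity(
    LINKS, EXPERTS, ["owl-guide.example", "hawk-guide.example"]
)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Links from the non-expert page are ignored, so the result reflects popularity among experts rather than popularity on the Web as a whole.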

Notable others: AltaVista and Lycos
  • The AltaVista search engine indexes every word on the page, even insignificant articles such as ‘a’, ‘an’, and ‘the’.
  • The Lycos search engine “is said to” index around 100 of the most frequently used words on the page, as well as each word in the first 20 lines of text.
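The indexing policies mentioned so far differ only in which words of a page they keep. The sketch below contrasts three of them; the function names and the sample page are made up, and the code mirrors the slide descriptions rather than any engine's real implementation.

```python
from collections import Counter

ARTICLES = {"a", "an", "the"}

def tokenize(text):
    """Split text into lowercase words with surrounding punctuation removed."""
    tokens = (word.strip(".,!?;:\"'()").lower() for word in text.split())
    return [token for token in tokens if token]

def index_every_word(text):
    """Keep every word, articles included."""
    return set(tokenize(text))

def index_significant_words(text):
    """Keep every word except the articles 'a', 'an', and 'the'."""
    return set(tokenize(text)) - ARTICLES

def index_frequent_and_leading(text, top_n=100, first_lines=20):
    """Keep the top_n most frequent words plus every word in the first lines."""
    frequent = {word for word, _ in Counter(tokenize(text)).most_common(top_n)}
    leading = set(tokenize("\n".join(text.splitlines()[:first_lines])))
    return frequent | leading

# Hypothetical three-line page.
PAGE = "The owl is a bird.\nThe owl hunts at night.\nAn owl eats mice."

print(sorted(index_every_word(PAGE)))
print(sorted(index_significant_words(PAGE)))
# Small limits so the effect is visible on a three-line page.
print(sorted(index_frequent_and_leading(PAGE, top_n=1, first_lines=1)))
```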

So many options…
  • Google is the most used search engine on the Internet today (around 50% of queries go through it)
  • However, there are more efficient ways to search…
  • Ask.com’s subject-specific searching much better reflects the way the Web is set up (in subject-specific clusters). However, because of the complexity of its algorithm, the search results it produced were inferior to those of competitors such as Google with its PageRank system.
  • Only recently has Ask begun to cut into the search engine market share (it is still way behind Google, Yahoo, and MSN) by reducing how closely the keywords must match the results (from 100% to about 95%). This yields more search results and puts Ask in a better position to compete for market share.

By the numbers…
  • Below: Popularity (as of 12/07)
  • Right: Timeline of major launches

Search engines of the future…
  • Two types of searching: Navigational Search and Research Search
      • Navigational search: the user uses the search engine as a tool to navigate to a particular intended document
      • Research search: the user provides the search engine with a phrase intended to denote an object about which the user is trying to gather or research information
  • Rather than use ranking algorithms such as Google's PageRank to predict relevancy, Semantic Search uses semantics, or the science of meaning in language, to produce highly relevant search results.
  • The goal is to deliver the information queried by a user rather than have a user sort through a list of loosely related keyword results.

Semantic Searching
  • Contingent upon correct semantic markup and on searching over richly structured data (e.g., XML and RDF)
  • The goal is to deliver the information queried by the user rather than have a user sort through a list of loosely related keyword results.
  • Examples: www.hakia.com and www.PowerSet.com
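Searching over structured data can be illustrated with RDF-style (subject, predicate, object) triples. The sketch below is a plain-Python stand-in, not a real RDF store or query language: the predicate names (`type`, `launched`, `developedAt`) and the `match` helper are invented for the example, while the facts come from the timeline earlier in this deck.

```python
# RDF-style statements as (subject, predicate, object) triples.
TRIPLES = [
    ("Wandex", "type", "SearchEngine"),
    ("Wandex", "launched", "1993"),
    ("WebCrawler", "type", "SearchEngine"),
    ("WebCrawler", "launched", "1994"),
    ("Lycos", "type", "SearchEngine"),
    ("Lycos", "launched", "1994"),
    ("Lycos", "developedAt", "Carnegie Mellon University"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        triple
        for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]

# "Which search engines launched in 1994?" answered from the data itself,
# rather than by returning pages that merely contain the keywords.
launched_1994 = sorted(s for s, _, _ in match(TRIPLES, predicate="launched", obj="1994"))
print(launched_1994)
```

The query returns the answer directly, which is the contrast with keyword search that the bullets above describe.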