Internet Research Search Engines & Subject Directories

Search engines
• Search engines are the means by which most people search the Web.
• Common examples are Google, AltaVista, and Direct Hit.

Yet they don’t search the Internet
• A search engine does not actually search the Web during your search.
• It searches its own index instead.
• It’s a three-step process.

1) Bots index words
• Search engines continually send out hundreds of “robots” or “bots” (also called “spiders” or “crawlers”).
• Bots visit web sites, read them word by word, and then index those words.

2) A database is created
• The words the bots gather are stored in a database of Web sites, indexed by word.
• These databases can be huge, with millions of links.

3) The interface gives you access
• Using the keywords you give it, a search engine then searches its own current index.
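The three steps can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is built: the page texts, the example.com URLs, and the function names (`tokenize`, `build_index`, `search`) are all made up for the example. The point is that a query is answered from the stored index, never by visiting the Web at search time.

```python
from collections import defaultdict

# Step 1 stand-in: hypothetical page texts, as if a bot had already fetched them.
PAGES = {
    "http://example.com/a": "Search engines index words from web pages",
    "http://example.com/b": "A subject directory is organized by category",
    "http://example.com/c": "Bots read pages word by word and index the words",
}

PUNCTUATION = ".,!?\"'()"

def tokenize(text):
    """Split text into lowercase words with surrounding punctuation removed."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def build_index(pages):
    """Step 2: build a word -> set-of-URLs database (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Step 3: answer a query from the index alone; pages must contain every keyword."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)
print(sorted(search(INDEX, "index words")))
```

Because `search` only reads `INDEX`, a page that changed after the bot's last visit would still be matched on its old wording, which is why results can lag behind the live Web.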

Interfaces are based on rankings
• Search engines return results based on a ranking system.
• Ranking is the order in which files are listed when they are retrieved.

The ranking system is secret
• These systems are proprietary and often “secret.”
In general:
• AltaVista ranks web pages higher if your search terms are found in the first few words of the page.
• Google ranks by document “popularity” with other similar searches.
• Direct Hit ranks by the length of time other users spent at the site.
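To make the idea of a ranking rule concrete, here is a toy Python sketch of the first rule above: pages where the search term appears earlier are listed first. It is an illustration only. The real ranking systems are proprietary, and the page names, texts, and function names here are invented for the example.

```python
def first_position(text, term):
    """Return the 0-based word position of term in text, or None if it is absent."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    term = term.lower()
    return words.index(term) if term in words else None

def rank_by_early_match(pages, term):
    """List matching pages, earliest occurrence of the term first."""
    hits = []
    for name, text in pages.items():
        position = first_position(text, term)
        if position is not None:
            hits.append((position, name))
    hits.sort()  # smaller position means the term appears earlier
    return [name for position, name in hits]

# Hypothetical pages; none of this is real search-engine data.
DOCS = {
    "page-1": "Our company history, awards, and a note about gardening",
    "page-2": "Gardening tips for beginners",
    "page-3": "Weekend gardening ideas and tools",
    "page-4": "Cooking with seasonal vegetables",
}

print(rank_by_early_match(DOCS, "gardening"))
```

Swapping the sort key for a link count or a time-on-site figure would give a sketch of the other two rules; the list of matching pages stays the same and only the order changes.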

Not even half the Web
• With all of this software and sophistication, even the best search engines cover only 40-50% of the Web.
• And they miss much else on the Internet.

Bots hit and miss
Bots miss:
• XML pages, PDF files
• Dynamically created HTML pages
• Frames-based pages
• New pages or recently updated text
Some say the Invisible Web is 500 times larger than the Web.

Subject Directories
• A subject directory is also a database of web sites and references.
• But a subject directory is organized not by keywords but by category or subject.

Yahoo!
• Yahoo! is the most popular subject directory.
• www.about.com takes the idea a step further with subject guides for selected topics.

Subjects are organized by people.
• Information is selected, organized, and cataloged by a person, not software.
• You can usually be more confident that the search results will make sense.

You get an index of sites.
• Subject directories usually do not provide you with ranked web sites.
• Instead, you will get a broad index related to your topic, divided further by subheadings.
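A subject directory can be pictured as a tree of categories rather than a word index. The Python sketch below assumes a tiny hand-built directory; the categories, the example.org URLs, and the function names (`browse`, `outline`) are hypothetical and stand in for what a human editor would compile.

```python
# A hand-curated directory: category -> subheading -> list of sites.
# Every entry here is a made-up placeholder.
DIRECTORY = {
    "Science": {
        "Astronomy": ["http://example.org/planets", "http://example.org/telescopes"],
        "Biology": ["http://example.org/cells"],
    },
    "Recreation": {
        "Gardening": ["http://example.org/roses"],
    },
}

def browse(directory, path):
    """Follow a list of category names down the tree and return what is found there."""
    node = directory
    for name in path:
        node = node[name]
    return node

def outline(node, depth=0):
    """Return the directory as indented lines: headings first, then sites."""
    lines = []
    if isinstance(node, dict):
        for name in sorted(node):
            lines.append("  " * depth + name)
            lines.extend(outline(node[name], depth + 1))
    else:
        for url in node:
            lines.append("  " * depth + url)
    return lines

print("\n".join(outline(DIRECTORY)))
print(browse(DIRECTORY, ["Science", "Astronomy"]))
```

Note that nothing here is scored or ranked: the reader narrows a topic by choosing headings, and the sites under a subheading appear in whatever order the editor listed them.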

Use for early searching.
• Use a subject directory early in your search process to learn about your subject.
• You will get fewer links of higher quality.
• When your questions become more specific, switch to a search engine.