Search Engines: What Are They?
- Slides: 18
Search Engines
What Are They?
- Four components
  - A database of references to webpages
  - An indexing robot that crawls the WWW
  - An interface
    - Enables users to submit queries
    - Displays results
  - An information retrieval system
- Each search engine is unique, but most work in much the same way
Database
- Where the user's query is matched
- Contains only essential parts of pages
- Only includes pages that were indexed
- Search engines are always out of date
Web Crawler
- A robot that follows links
- Records data it finds
  - Words in the webpage
  - Metadata
  - ALT attributes in IMG tags
- Robot Exclusion Protocol
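
The crawler behaviour listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular search engine works: the page and the robots.txt rules are in-memory strings, and `PageRecorder`, `crawl_page`, `ExampleBot`, and the example.com URLs are hypothetical names chosen for the example. Only standard-library modules are used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and page, kept in memory so nothing is fetched.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

SAMPLE_HTML = """<html><head><title>Home</title>
<meta name="description" content="A demo page">
</head><body>
<p>Search engines crawl pages.</p>
<img src="logo.png" alt="Site logo">
<a href="/about.html">About</a>
<a href="/private/notes.html">Notes</a>
</body></html>"""


class PageRecorder(HTMLParser):
    """Records words, metadata, ALT text, and links from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # absolute URLs found in <a href=...>
        self.words = []      # lower-cased words from the page text
        self.metadata = {}   # <meta name=... content=...> pairs
        self.alt_texts = []  # ALT attributes of <img> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.metadata[attrs["name"].lower()] = attrs["content"]
        elif tag == "img" and attrs.get("alt"):
            self.alt_texts.append(attrs["alt"])

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl_page(html, base_url, robots, agent="ExampleBot"):
    """Parse one page; return the recorder and the links robots.txt allows."""
    recorder = PageRecorder(base_url)
    recorder.feed(html)
    to_follow = [url for url in recorder.links if robots.can_fetch(agent, url)]
    return recorder, to_follow


robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())  # Robot Exclusion Protocol rules

recorder, to_follow = crawl_page(SAMPLE_HTML, "http://example.com/", robots)
print(to_follow)
print(recorder.metadata, recorder.alt_texts)
```

A real crawler would also fetch each allowed link, remember which pages it has already visited, and skip script and style content; those parts are left out to keep the sketch short.
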
Search Engine Interfaces
- Gathers input from users
- Presents results from the IR system
  - Often in ranked order
Search Engine Interfaces
- Input
  - User requirements
    - Search expression, search limits, search type
  - Presentation style
    - Presentation format
Search Engine Interfaces
- Output
  - Results
  - Descriptions
  - Clusters
Search Term Matching
- Trying to find a match in the database
- Two main methods
  - Keyword searching
    - Matching single terms, computing cosine
  - Concept-based searching
    - Examining clusters of words
    - Attempt to determine meaning of query and find records related to that meaning
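
To make "computing cosine" concrete, here is a minimal sketch of keyword matching with cosine similarity over raw term counts. The function names (`term_vector`, `cosine`, `rank`) and the sample documents are assumptions for illustration; production systems typically weight terms (for example by rarity) rather than using raw counts.

```python
import math
from collections import Counter


def term_vector(text):
    """Count how often each lower-cased word occurs in the text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (score, name) pairs, best match first."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(text)), name)
              for name, text in documents.items()]
    return sorted(scored, reverse=True)


# Hypothetical mini-database of two records.
documents = {
    "crawler": "a web crawler follows links",
    "recipes": "cooking pasta recipes",
}
for score, name in rank("web crawler", documents):
    print(f"{name}: {score:.3f}")
```

A score near 1 means the query and the record use the same words in similar proportions; a score of 0 means they share no terms at all.
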
Basic IR Features
- Boolean operators
  - AND, OR, NOT, grouping
- Extended operators
  - NEAR, ADJACENT, (")
- Stop word deletion
- Stemming
- Searching in fields (e.g. host)
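
Several of these features fit in one small sketch: an inverted index built with stop word deletion and a deliberately crude stemmer, queried with Boolean operators expressed as Python set operations. The stop word list, the suffix rules in `stem`, and the sample documents are all assumptions made for the example; real engines use larger stop lists and proper stemmers such as Porter's.

```python
# Illustrative stop word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}


def stem(word):
    """Crude suffix stripping, for illustration only."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Lower-case, strip punctuation, drop stop words, then stem."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return {stem(w) for w in words if w and w not in STOP_WORDS}


def build_index(documents):
    """Map each stemmed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in documents.items():
        for term in index_terms(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def lookup(index, word):
    """Documents containing the (stemmed) word; empty set if none."""
    return index.get(stem(word.lower()), set())


documents = {
    1: "Search engines index the web",
    2: "Crawlers follow links in pages",
    3: "Indexing pages is slow",
}
index = build_index(documents)
all_docs = set(documents)

both = lookup(index, "index") & lookup(index, "pages")        # index AND pages
either = lookup(index, "links") | lookup(index, "web")        # links OR web
without = lookup(index, "index") - lookup(index, "pages")     # index AND NOT pages
not_pages = all_docs - lookup(index, "pages")                 # NOT pages
print(both, either, without, not_pages)
```

Grouping comes for free from Python's parentheses, for example `(lookup(index, "links") | lookup(index, "web")) & lookup(index, "search")`.
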
Ranked Output
- Most SEs produce ranked lists by applying simple rules:
  - Early words are more important
  - Title is very important
  - Frequency of occurrence matters for some
  - Infrequent words matter more
  - Modification date
- Google is different:
  - PageRank™ method based on popularity
  - Links as money
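
The "links as money" idea can be illustrated with a simplified power-iteration sketch of PageRank: each page splits its current rank evenly among the pages it links to. This is a textbook-style approximation, not Google's production algorithm; the damping factor of 0.85, the iteration count, and the four-page link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page "pays" its rank out in equal parts per outgoing link.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A, B, and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page C ends up on top because three pages "pay" into it, and A comes second because it receives all of C's outgoing rank.
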
Googlebombing
- Google spoofed from the lecture list
  - First hit from 1992
  - Official Google Blog explanation
What about the Invisible Web?
- Also known as the Deep Web
- Documents that are on the WWW but not indexed by search engines
  - Some are available only by submitting forms
  - Some are not generally accessible (in subnets)
  - Some are not in (X)HTML format
The Invisible Web Isn't So Invisible Anymore…
- More search engines parse non-(X)HTML now than before
- Because of awareness of the problem, companies are making more content available using
  - Stable URLs
  - Robot-friendly sitemaps
- But much content is still not indexed
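
As an illustration of a robot-friendly sitemap, the sketch below builds a minimal XML sitemap in the sitemaps.org format from a list of stable URLs. The `build_sitemap` helper and the example.com URLs are hypothetical; a real sitemap can also carry optional fields such as a last-modified date, omitted here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/catalog/item-42",
])
print(sitemap_xml)
```

Listing form-generated or database-backed pages under stable URLs in such a file gives a crawler a way to reach content it could not discover by following links alone.
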
But, there's still plenty of important yet invisible docs
- How to find them?
- Many of them are in databases
  - Use database tools from the U.'s library
  - Especially for research articles
- No one search engine covers everything
  - Use multiple search engines or a metacrawler
  - dogpile is the most famous
Search Engines: A Summary of Practical Advice
How To Succeed With SEs
- As a surfer:
  - If you don't know what you are looking for
    - Use multiple SEs, or a meta-crawler
    - Search within results
  - If you do know what you are looking for
    - Use multiple SEs, or a meta-crawler
    - Use Boolean expressions or search within results
    - Consider specialized engines
How To Succeed With SEs
- As a creator:
  - HTML level
    - Always use ALT attributes with <IMG>, etc.
    - Avoid frames
  - Make it easier to index
    - Don't expect SEs to find your pages
    - Make links between your pages
    - Use metadata
      - Informal: <meta name="description" …>
      - Formal: Dublin Core and others
  - Increase your pages' popularity
    - Don't use systematic reciprocal linking: rings, exchanges, lists
    - The PageRank™ a page passes to each page it links to is inversely proportional to its outdegree
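
The HTML-level advice above can be checked mechanically. The sketch below is a hypothetical helper (`IndexabilityChecker` and `check_page` are names invented for this example) that flags images without ALT text, use of frames, and a missing description meta tag. It treats an empty `alt=""` as missing, which is a simplification, since purely decorative images may legitimately use an empty ALT.

```python
from html.parser import HTMLParser


class IndexabilityChecker(HTMLParser):
    """Collects a few signals that affect how well a page can be indexed."""

    def __init__(self):
        super().__init__()
        self.images_missing_alt = 0
        self.uses_frames = False
        self.has_description = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and not attrs.get("alt"):
            self.images_missing_alt += 1
        elif tag in ("frame", "frameset"):
            self.uses_frames = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            # Only counts if the description actually has content.
            self.has_description = bool(attrs.get("content"))


def check_page(html):
    """Return a list of human-readable problems found in the page."""
    checker = IndexabilityChecker()
    checker.feed(html)
    problems = []
    if checker.images_missing_alt:
        problems.append(f"{checker.images_missing_alt} image(s) without ALT text")
    if checker.uses_frames:
        problems.append("page uses frames")
    if not checker.has_description:
        problems.append("no description meta tag")
    return problems


# Hypothetical page with one good and one bad image and no description.
sample = """<html><head><title>Shop</title></head><body>
<img src="a.png" alt="Red bicycle">
<img src="b.png">
</body></html>"""
for problem in check_page(sample):
    print(problem)
```
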
How To Succeed With SEs
- As a creator (cont.)
- For surfers:
  - Use <meta name="description" …>
  - Don't expect surfers to start at top of your hierarchy
    - Don't rely on a hierarchy
    - Include a context map near the top of each page
    - Don't use frames
    - Think through dynamic content implications
    - Stickiness… is for another day