Search Engines
![Search Engines](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-1.jpg)
Search Engines
![What Are They?](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-2.jpg)
What Are They?
- Four components
  - A database of references to webpages
  - An indexing robot that crawls the WWW
  - An interface
    - Enables users to submit queries
    - Displays results
  - An information retrieval system
- Each is unique, but they are mostly the same
![Database](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-3.jpg)
Database
- Where the user's query is matched
- Contains only essential parts of pages
- Only includes pages that were indexed
- Search engines are always out of date
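As a rough illustration of a database that keeps only the essential parts of pages, the sketch below builds a tiny inverted index mapping each word to the pages that contain it. The page URLs, the tokenizer, and the function names are assumptions made for this example; the slides do not describe any particular engine's storage format.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def lookup(index, word):
    """Return the URLs recorded for a single query word (empty set if none)."""
    return index.get(word.lower(), set())


# Hypothetical pages, for illustration only.
pages = {
    "http://example.org/a": "Search engines crawl the web",
    "http://example.org/b": "A crawler follows links",
}
index = build_index(pages)
print(sorted(lookup(index, "crawl")))
```

Only pages passed to `build_index` can ever be returned, which mirrors the point that a search engine only knows about pages it has indexed.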
![Web Crawler](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-4.jpg)
Web Crawler
- A robot that follows links
- Records data it finds
  - Words in the webpage
  - Metadata
  - ALT attributes in IMG tags
- Robot Exclusion Protocol
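A well-behaved crawler consults a site's robots.txt before fetching a page. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `ExampleBot` user agent, and the URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it
# from the site root (e.g. http://example.org/robots.txt).
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def allowed(url, agent="ExampleBot"):
    """Return True if the given user agent may fetch the URL."""
    return parser.can_fetch(agent, url)


print(allowed("http://example.org/index.html"))
print(allowed("http://example.org/private/notes.html"))
```

The protocol is advisory: it tells cooperative robots what to skip, but it does not technically prevent access.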
![Search Engine Interfaces](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-5.jpg)
Search Engine Interfaces
- Gathers input from users
- Presents results from the IR system
  - Often in ranked order
![Search Engine Interfaces: Input](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-6.jpg)
Search Engine Interfaces
- Input
  - User requirements
    - Search expression, search limits, search type
  - Presentation style
    - Presentation format
![Search Engine Interfaces: Output](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-7.jpg)
Search Engine Interfaces
- Output
  - Results
  - Descriptions
  - Clusters
![Search Term Matching](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-8.jpg)
Search Term Matching
- Trying to find a match in the database
- Two main methods
  - Keyword searching
    - Matching single terms, computing cosine similarity
  - Concept-based searching
    - Examining clusters of words
    - Attempts to determine the meaning of the query and find records related to that meaning
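Keyword searching with a cosine measure can be sketched as follows: the query and each document become term-count vectors, and documents are ordered by the cosine of the angle between their vector and the query's. Raw term counts and whitespace tokenization are simplifying assumptions here; real engines typically apply weighting schemes that the slides do not specify.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent text as a bag of lowercase words with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document ids sorted from most to least similar to the query."""
    q = term_vector(query)
    scores = {doc_id: cosine(q, term_vector(text)) for doc_id, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical documents, for illustration only.
docs = {
    "d1": "web crawler follows links",
    "d2": "cooking pasta at home",
}
print(rank("web crawler", docs))
```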
![Basic IR Features](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-9.jpg)
Basic IR Features
- Boolean operators
  - AND, OR, NOT, grouping
- Extended operators
  - NEAR, ADJACENT, (")
- Stop word deletion
- Stemming
- Searching in fields (e.g. host)
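Boolean operators map naturally onto set operations over the pages recorded for each word, and stop word deletion simply skips very common words while indexing. The sketch below assumes a tiny hand-picked stop list and made-up documents; it does not cover NEAR, ADJACENT, stemming, or field search.

```python
STOP_WORDS = {"the", "a", "of", "and"}  # tiny illustrative list, not a standard one


def index_docs(docs):
    """Build word -> set of document ids, skipping stop words."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            if word not in STOP_WORDS:
                index.setdefault(word, set()).add(doc_id)
    return index


def postings(index, term):
    """Documents containing the term (empty set if it was never indexed)."""
    return index.get(term.lower(), set())


# Hypothetical documents, for illustration only.
docs = {
    1: "the web crawler",
    2: "the web database",
    3: "a database of links",
}
index = index_docs(docs)

web_and_database = postings(index, "web") & postings(index, "database")  # AND
web_or_links = postings(index, "web") | postings(index, "links")         # OR
web_not_database = postings(index, "web") - postings(index, "database")  # AND NOT

print(web_and_database, web_or_links, web_not_database)
```

Grouping corresponds to ordinary parentheses around these set expressions.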
![Ranked Output](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-10.jpg)
Ranked Output
- Most SEs produce ranked lists by applying simple rules:
  - Early words are more important
  - Title is very important
  - Frequency of occurrence matters for some
  - Infrequent words matter more
  - Modification date
- Google is different:
  - PageRank™ method based on popularity
  - Links as money
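The "links as money" idea can be sketched with the simplified textbook formulation of PageRank: each page repeatedly splits its current score evenly among the pages it links to. The damping factor of 0.85, the fixed iteration count, and the three-page graph are assumptions for this example, and this is not a description of Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to; every
    link target is assumed to also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page's score is split evenly across its outgoing links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph, for illustration only.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Page C ends up highest here because it receives links from both other pages, one of which links nowhere else.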
![Googlebombing](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-11.jpg)
Googlebombing
- Google spoofed from the lecture list first hit from 1992
  - Official Google Blog explanation
![What about the Invisible Web?](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-12.jpg)
What about the Invisible Web?
- Also known as the Deep Web
- Documents that are on the WWW but not indexed by search engines
  - Some are available only by submitting forms
  - Some are not generally accessible (in subnets)
  - Some are not in (X)HTML format
![The Invisible Web Isn't So Invisible Anymore…](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-13.jpg)
The Invisible Web Isn't So Invisible Anymore…
- More search engines parse non-(X)HTML now than before
- Because of awareness of the problem, companies are making more content available using
  - Stable URLs
  - Robot-friendly sitemaps
- But much content is still not indexed
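One common form of robot-friendly sitemap is an XML file listing a site's stable URLs. The sketch below generates a minimal one with Python's standard `xml.etree.ElementTree`, using the sitemaps.org namespace; the URLs are made up, and optional fields such as last-modified dates are left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical stable URLs, for illustration only.
sitemap_xml = build_sitemap([
    "http://example.org/",
    "http://example.org/articles/42",
])
print(sitemap_xml)
```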
![But, there's still plenty of important yet invisible docs](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-14.jpg)
But, there's still plenty of important yet invisible docs
- How to find them?
  - Many of them are in databases
  - No one search engine covers everything
- Use database tools from the U.'s library
  - Especially for research articles
- Use multiple search engines or a metacrawler
  - dogpile is the most famous
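The core idea of a metacrawler, combining ranked results from several engines, can be sketched as a round-robin merge that drops duplicates. This is only one possible merging strategy with invented result lists; it is not a description of how dogpile or any other real metasearch service works.

```python
def merge_results(*ranked_lists):
    """Interleave several ranked result lists, keeping each URL once."""
    merged, seen = [], set()
    longest = max((len(results) for results in ranked_lists), default=0)
    for position in range(longest):
        for results in ranked_lists:
            if position < len(results) and results[position] not in seen:
                seen.add(results[position])
                merged.append(results[position])
    return merged


# Hypothetical results from two engines, for illustration only.
engine_one = ["a.example", "b.example", "c.example"]
engine_two = ["b.example", "d.example"]
print(merge_results(engine_one, engine_two))
```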
![Search Engines: A Summary of Practical Advice](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-15.jpg)
Search Engines: A Summary of Practical Advice
![How To Succeed With SEs: As a surfer](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-16.jpg)
How To Succeed With SEs
- As a surfer:
  - If you don't know what you are looking for
    - Use multiple SEs, or a meta-crawler
    - Search within results
  - If you do know what you are looking for
    - Use multiple SEs, or a meta-crawler
    - Use Boolean expressions or search within results
    - Consider specialized engines
![How To Succeed With SEs: As a creator](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-17.jpg)
How To Succeed With SEs
- As a creator:
  - HTML level
    - Always use ALT attributes with <IMG>, etc.
    - Avoid frames
  - Make it easier to index
    - Don't expect SEs to find your pages
    - Make links between your pages
    - Use metadata
      - Informal: <meta name="description" …>
      - Formal: Dublin Core and others
  - Increase your pages' popularity
    - Don't use systematic reciprocal linking: rings, exchanges, lists
    - The PageRank™ a page passes along each link is inversely proportional to its outdegree
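To see why ALT attributes and a description meta tag help indexing, here is a small sketch of how a robot might pull them out of a page with Python's standard `html.parser`. The sample HTML and the class name are invented for this example; real crawlers record far more than this.

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collect the description meta tag and image ALT texts from a page."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")
        elif tag == "img" and attributes.get("alt"):
            self.alt_texts.append(attributes["alt"])


# Hypothetical page, for illustration only.
page = (
    '<html><head>'
    '<meta name="description" content="Course notes on search engines">'
    '</head><body><IMG SRC="logo.png" ALT="Site logo"></body></html>'
)
extractor = MetadataExtractor()
extractor.feed(page)
print(extractor.description, extractor.alt_texts)
```

An image without an ALT attribute contributes nothing to `alt_texts`, so a text-based index has no words to associate with it.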
![How To Succeed With SEs: As a creator (cont.)](https://slidetodoc.com/presentation_image_h/e4bf1553571407bfa9de7484049ee438/image-18.jpg)
How To Succeed With SEs
- As a creator (cont.)
- For surfers:
  - Use <meta name="description" …>
  - Don't expect surfers to start at top of your hierarchy
    - Don't rely on a hierarchy
    - Include a context map near the top of each page
    - Don't use frames
    - Think through dynamic content implications
    - Stickiness… is for another day