INFORMATION RETRIEVAL UNITIII WEB SEARCH ENGINE Web search
INFORMATION RETRIEVAL UNIT-III WEB SEARCH ENGINE Web search overview Web structure User Problems Paid Placement SEO
SEARCH ENGINE Extract information for the user Searches the Web Search Engine Tool Software System Uses the keywords to search WWW
TYPES Crawler Based Survey and categorize web pages Spiders, Crawlers, Robots or Bots. Identify the seed (url of home page) Crawling Indexing Calculating Relevancy Retrieving the Result Ex: Google. Bing Yahoo!Baidu Duck. Go, AOL and Ask. Yandex
TYPES Human powered directories also referred as open directory system depends on human based activities for listings. Site owner submits a short description of the site to the directory along with category it is to be listed. Submitted site is then manually reviewed and added in the appropriate category or rejected for listing. Keywords entered in a search box will be matched with the description of the sites. Ex: Yahoo! Directory and DMOZ
TYPES Hybrid Search Engine Combination of both crawler-based results and directory results Meta Search Engines metasearch engines search multiple search engines and aggregate the results take the results from SE combine them into one large listing Meta crawler
WEB STRUCTURE Website : Own structure/Layout/Template Web Site linking various pages with hyperlinks Links are URL string Bow-tie structure of Web Strongly Connected Component SCC a user can navigate from one of them to the other and back by clicking on links embedded in the pages encountered
WEB STRUCTURE IN, contains pages that have a directed path of links leading to the SCC left bow might be either new pages that have not yet been linked to, or older web pages that have not become popular enough to become part of the SCC. The right bow, called OUT, contains pages that can be reached from the SCC by following a directed path of links
WEB STRUCTURE A web page in Tubes has a directed path from IN to OUT bypassing the SCC A page in Tendrils can either be reached from IN or leads into OUT The pages in disconnected are not even weakly connected to the SCC
USER : WEB SEARCH User Interface Can interact with Search Engine thro UI how to provide a sequence of words for the search. Not aware of input requirement of the search engine Ex: case sensitive 85% of users only look at the first page of the result Solution Better User Interface Should be provided by Search Engine
Searching Guidelines Specify the words clearly, page title, date, and country If looking for a company, institution, or organization, try to guess the URL by using the www prefix followed by the name, and then (. com, . edu, . org, . gov, or country code). Use www. researchindex. com to search research papers. Ex: IEEE explore For broad queries : Web directories
PAID PLACEMENT Online advertising Done via links to products Links appear next to keyword search results Internet advertising Paid placement advertising Sellers bid payments to a search engine A group of advertisers who bid more than the rest are selected for placement Positions of placemen : order of bids Higher Bid : Top
Search engine separates its query results list into two parts: (i) an organic list free unbiased results search engine‘s ranking algorithm (ii) a sponsored list, is paid for by advertising. pay per click (PPC), cost per click (CPC), payment is made by the advertiser each time a user clicks on the link in the sponsored listing.
ORGANIC SEARCH VS PAID SEARCH the list of links that appear below the ads are known as "organic results. “ Paid search accounts are those that companies have paid to appear the top of search results
SEARCH ENGINE OPTIMIZATION SEO set of activities Used to increase number of desirable visitors to our site via Search Engine activities may include changes to our text and HTML code formatting text or document
SEO Search Engine Optimization is the process of improving the visibility of a website on organic ("natural" or unpaid) search engine result pages (SERPs), SEO : Two Areas On-page optimization off-page optimization
On-Page Factors Title tags<title> Header tags<h 1> ALT image tags Content, (Body text)<body> Hyperlink text Keyword frequency and density Off-Page Factors Anchor text Link Popularity (―votes for your site) – adds credibility
ORGANIC SEARCH LISTING Crawler : Collects the Listings SE‘s index: Stores the copy of web Pages SE : Indexes the Pages based on data SEO Techniques: White Hat SEO Black Hat SEO
WHITE HAT SEO Ethical SEO or Organic SEO. Ranking without breaking the rules Follows SE guidlines Offering quality content and services Keywords analysis BLACK HAT SEO Unethical
- Slides: 29