Quad Search: A novel metasearch engine
(http://cheetah.csd.auth.gr/~lakritid)
Leonidas Akritidis (1), George Voutsakelis (2), Dimitrios Katsaros (1, 2), Panayiotis Bozanis (2)
(1) Data Engineering Lab, Dept. of Informatics, Aristotle Univ., Thessaloniki, Hellas
(2) Computer & Communication Engineering Dept., Univ. of Thessaly, Volos, Hellas
11th Panhellenic Conference of Informatics, Patras, Hellas, 18-20/05/2007

Introduction
Single Search Engines
• Maintenance of a document database
• Low Web coverage
• Medium scalability
• Paid listings
Metasearch Engines
• Effortless invocation of multiple search engines
• No document database
• Increased Web coverage
• Improved retrieval effectiveness

Metasearch Engines
Metasearch engines use the document databases that their component search engines maintain.

Rank Aggregation
What is Rank Aggregation?

Rank Aggregation Methods
• Unweighted Borda Count
• Spearman's Footrule
• Kendall's Tau
• Markov Chains

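To make the first method in the list concrete, here is a minimal sketch of an unweighted Borda count over a few ranked lists. The slides do not say how Quad Search implements it; the scoring convention (the top item of a list with s entries earns s points, absent items earn 0) and the names `borda_count` and `lists` are assumptions chosen for illustration.

```python
from collections import defaultdict


def borda_count(ranked_lists):
    """Merge ranked lists with an unweighted Borda count.

    Assumed convention: in a list with s entries the first item earns s
    points, the second s - 1, and so on; an item missing from a list
    earns nothing from that list.
    """
    scores = defaultdict(int)
    for ranking in ranked_lists:
        size = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += size - position
    # Highest total first; ties are broken alphabetically so the output
    # is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))


lists = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
print(borda_count(lists))  # ['b', 'a', 'c', 'd']
```

Here "b" collects 2 + 3 + 3 = 8 points and "a" collects 3 + 2 + 1 = 6, so "b" is ranked first even though "a" tops one of the lists.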
KE Method: Description
Each result is called a candidate.
Each candidate receives a score (weight) computed from the following quantities:
• r(i): the candidate's rank in the i-th engine
• n: the number of the candidate's appearances
• m: the number of invoked search engines
• k: the length of the top-k list

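The scoring formula itself did not survive in this transcript. The sketch below assumes the form given in the authors' published description of the KE algorithm, W = (sum of r(i)) / (n^m * (k/10 + 1)^n), where a lower weight means a better position. Treat both the formula and the helper names `ke_weight` and `ke_rank` as assumptions rather than as Quad Search's actual code.

```python
def ke_weight(ranks, m, k):
    """Score one candidate under the assumed KE form; lower is better.

    ranks: the candidate's rank r(i) in each engine that returned it
    m:     number of invoked search engines
    k:     length of each top-k list
    """
    n = len(ranks)  # number of appearances
    return sum(ranks) / (n ** m * (k / 10 + 1) ** n)


def ke_rank(top_k_lists, k):
    """Merge the engines' top-k lists by ascending KE weight."""
    m = len(top_k_lists)
    ranks = {}
    for results in top_k_lists:
        for position, url in enumerate(results[:k], start=1):
            ranks.setdefault(url, []).append(position)
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(ranks, key=lambda url: (ke_weight(ranks[url], m, k), url))


lists = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
print(ke_rank(lists, k=3))  # ['b', 'a', 'c', 'd']
```

Under this assumed form the denominator grows quickly with n, so a candidate returned by many engines is favoured over one returned by a single engine at a similar rank.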
Antispam Version of the KE Method
We say that a search engine has been spammed by a page when it ranks the page too highly with respect to the other pages, in the view of a typical user.
We try to constrain this phenomenon with the Antispam version of the KE Method, described by the following pseudocode:
1. Find the items that appear in more than half of the result lists (let the number of these items be c)
2. Apply the KE Method to these items
3. Position them in the results list, starting at rank 1
4. Apply the KE Method to the rest of the items
5. Position them in the results list, starting at rank c+1

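The five steps above can be sketched as follows. Two assumptions are made here: "more than half" is read as "returned by more than half of the invoked engines", and the KE weight uses the form from the authors' published description, W = (sum of r(i)) / (n^m * (k/10 + 1)^n), which is not preserved on the slides. The names `ke_weight` and `antispam_ke_rank` are hypothetical.

```python
def ke_weight(ranks, m, k):
    """Assumed KE weight of one candidate; lower is better."""
    n = len(ranks)
    return sum(ranks) / (n ** m * (k / 10 + 1) ** n)


def antispam_ke_rank(top_k_lists, k):
    """Rank majority items first (steps 1-3), then the rest (steps 4-5)."""
    m = len(top_k_lists)
    ranks = {}
    for results in top_k_lists:
        for position, url in enumerate(results[:k], start=1):
            ranks.setdefault(url, []).append(position)

    def by_weight(urls):
        return sorted(urls, key=lambda u: (ke_weight(ranks[u], m, k), u))

    # Step 1: items returned by more than half of the engines.
    majority = [u for u in ranks if len(ranks[u]) > m / 2]
    rest = [u for u in ranks if len(ranks[u]) <= m / 2]
    # Steps 2-5: the majority group fills ranks 1..c, the rest follow.
    return by_weight(majority) + by_weight(rest)


engine_1 = ["spam"] + [f"a{i}" for i in range(8)] + ["good"]  # good at rank 10
engine_2 = [f"b{i}" for i in range(8)] + ["good"]             # good at rank 9
engine_3 = ["c0"]
merged = antispam_ke_rank([engine_1, engine_2, engine_3], k=10)
print(merged[0])  # good
```

With the assumed weight, "spam" (rank 1 in one engine) would score 1/2 = 0.5 and beat "good" (ranks 9 and 10 in two engines, 19/32 ≈ 0.59) under the plain KE ordering. The antispam grouping moves "good" to rank 1 because it is the only item returned by a majority of the engines.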
Quad Search's Architecture
[Schematic diagram of Quad Search's architecture]

User Interface: Features
Quad Search's user interface is friendly and simple in order to ensure:
• Short download times
• Compatibility with all major browsers
• Convenient usage
For this reason, we avoided using:
• Large graphics files
• JavaScript and AJAX
• Flash presentations

User Interface (Search Hints)
We developed this part of Quad Search to provide:
• Detailed information about all its features
• Explanations of simple and complex operations
• Many helpful examples

Quad Bot (1): Description
Quad Bot is responsible for result retrieval. It consists of the following sub-modules:
• Input Validator: performs security checks
• Query Dispatcher: submits the query to the component search engines simultaneously
• Result Collector: gathers the engines' responses
• Result Validator: performs multiple conversions on the collected data

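The dispatch-and-collect step described above can be sketched with a thread pool. This is not Quad Bot's actual code: the engines are stand-in functions invented for the example, `dispatch` is a hypothetical name, and the timeout handling is one possible design. Only the Query Dispatcher and Result Collector roles are covered.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def dispatch(query, engines, timeout):
    """Send the query to every engine at once; keep what returns in time.

    engines maps a name to a callable that takes the query and returns
    a list of URLs (a stand-in for a real engine request).
    """
    pool = ThreadPoolExecutor(max_workers=len(engines))
    futures = {pool.submit(fetch, query): name
               for name, fetch in engines.items()}
    done, _pending = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on engines that are still busy
    collected = {}
    for future in done:
        if future.exception() is None:  # skip engines that failed
            collected[futures[future]] = future.result()
    return collected


# Hypothetical stand-in engines used only for this example.
def fast_engine(query):
    return [f"http://fast.example/{query}"]


def slow_engine(query):
    time.sleep(0.5)
    return [f"http://slow.example/{query}"]


def broken_engine(query):
    raise ConnectionError("engine unreachable")


engines = {"fast": fast_engine, "slow": slow_engine, "broken": broken_engine}
results = dispatch("quad", engines, timeout=0.1)
print(results)
```

Only the engine that answers within the timeout contributes results; the slow and the failing engine are dropped without delaying the response.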
Quad Bot (2): Architecture
[Diagram of Quad Bot's architecture]

Web Search APIs: What is a Web Search API?
API stands for Application Programming Interface. It is a programming tool supplied by the provider of a large-scale application.
A Web Search API is used to retrieve results from major search engines.
Disadvantages
• Inaccurate results compared to the "mother" engine
• Limit on queries per day
• Registration IDs required
• Limit on queries per registration ID
Quad Search does not make use of Search APIs.

Engine Bombing
Definition
Engine bombing occurs when multiple results from the same domain enter the presented results list.
Many metasearch engines suffer from engine bombing.
Engine Bombing Protection
Quad Search supports a feature that limits the number of results coming from the same domain.

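A per-domain cap of this kind can be sketched in a few lines. The slides do not describe how Quad Search identifies a domain, so this example simply assumes the host part of the URL; `limit_per_domain` and its parameter names are hypothetical.

```python
from urllib.parse import urlparse


def limit_per_domain(urls, max_per_domain):
    """Keep at most max_per_domain results per host, preserving order."""
    kept = []
    seen = {}
    for url in urls:
        # Assumption: the host (netloc) stands in for the domain, so
        # "www.a.example" and "a.example" count as different domains.
        domain = urlparse(url).netloc.lower()
        seen[domain] = seen.get(domain, 0) + 1
        if seen[domain] <= max_per_domain:
            kept.append(url)
    return kept


ranked = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/1",
    "http://a.example/3",
]
print(limit_per_domain(ranked, max_per_domain=2))
```

Because the input is already ranked, the results that survive for each domain are its highest-ranked ones.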
Results Filtering: Provided Filters
• Antispam Filter: applies the antispam version of the KE Method
• Ranking Algorithm Selector: Quad Search provides an option to determine how the collected results will be ranked
• Engine Bombing Protection

Advanced Web Search: Advanced Search Filters
• File Type Selector: the user can search for files of a specific format (PDF, DOC, XLS and PPT)
• Language Filter: Quad Search can return documents written in a specified language
• Domain Filter: the user can search within a given domain, or exclude a domain from a search
• Date Filter: returns results updated in the past 3, 6, or 12 months

Web Search Options
Quad Search lets the user set options that will be used in future searches. Some of these options are:
1. Connection timeout: how long Quad Search should wait for a search engine to respond
2. The number of candidates to be collected per component engine
3. The number of results to be displayed per result page
4. Whether the results will be opened in a new browser window

Results Presentation (1)
Classic View: the results are displayed in the classic way.
Array View: the results are displayed in a ranked array, so the user can compare the results and their rankings more easily.

Results Presentation (2): Results Page
The results page is highly customizable.
[Screenshot of the results page]

Scientific Search: General Features
• Quad Search is capable of searching for scientists, authors and/or published articles
• Google Scholar provides the required data
• Quad Search collects the data and produces statistics and charts

H-Index: Definition
The h-index is an index for quantifying the scientific productivity of physicists and other scientists based on their publication record.
A scientist has index h if h of their Np papers have at least h citations each, and the other (Np - h) papers have no more than h citations each.
Quad Search computes the h-index when the user searches for authors.

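The definition above translates directly into a short computation over a list of per-paper citation counts. This is a minimal sketch, not Quad Search's code; `h_index` is a hypothetical name.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations."""
    ordered = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ordered, start=1):
        if count >= position:
            h = position
        else:
            break  # counts only decrease from here on
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
```

In the example, four papers have at least 4 citations each and the remaining paper has 3, so h = 4.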
Scientific Search Options
The scientific search part of Quad Search offers a variety of options that can be stored and used in future searches. The user can define:
• The results' language
• The results' subject area (biology, chemistry, physics, engineering, medicine, etc.)
• The number of results to be displayed per page
• Whether the results will be opened in the current window or in a new one

Extra Features: Charts
The user can visually check the number of citations per paper for a specified author. This feature is available for "Author Searches".

Extra Features: Excluding Papers
When a user performs an "Author Search", Quad Search transfers all results from Google Scholar (or its cache).
Some of these articles may not belong in the calculations (e.g. the h-index).
The user can exclude such papers from the calculations by deselecting the appropriate checkbox.

Future Work: Our plans for Quad Search
• Support for extra ranking algorithms (e.g. Markov chains)
• Geography-aware search for news
• News search with RSS feeds
• Wide personalization (users, profiles, topics of interest, stored multimedia and user-defined customization)
• Image and video searches
• Searches in P2P networks (eDonkey, Gnutella, etc.)
• Torrent searches

Concluding Remarks
• In this session, we presented a pair of rank aggregation algorithms, the KE Method and its antispam version
• We introduced some new parameters, such as the number of top-k lists in which a page appears and the total number of exploited search engines
• We also presented a novel metasearch engine, Quad Search
• Quad Search offers a wide variety of new features for web search, such as the ranking algorithm selector and the engine bombing protection
• Quad Search also provides options for searching for scientific articles, and computes statistics such as the h-index