NeuroSearch: A Specialised Search Engine for Neuroscience

NeuroSearch: A Specialised Search Engine for Neuroscience Web Pages
Fatma Y. ELDRESI (MPhil)
Systems Analysis / Programming Specialist, AGOCO
Part-time lecturer, University of Garyounis
Fatmaeldresi@hotmail.com

Contents
• Introduction
• Components of NeuroSearch and its architecture
• Implementation
• Software lifecycle: (1) WebCrawler Engine, (2) Indexer Engine, (3) Query Engine, (4) Re-Crawler Engine (specialised crawler)
• Challenges
• Testing
• Conclusions

Introduction
What is a search engine?
A server, or a collection of servers, dedicated to indexing internet web pages, storing the results, and returning lists of pages that match particular queries.
Conventional search engines generate their indexes in different ways:
• Google uses a spider
• Yahoo uses a directory
• NeuroSearch uses a spider together with the advance knowledge

Introduction (cont.): why is a specialised search engine needed?
Defining the problem
• The Web has no centralised organisation and holds a huge, mixed collection of information.
• It is updated continuously and has no standard format.
• Pages are extensively linked.
Establishing standard measures of relevance is therefore very challenging. In addition:
(1) users often struggle to choose relevant keywords;
(2) professionals sometimes fail in their searches and get disappointing results, because the retrieved pages are sometimes (A) unrelated to, or (B) different from, what they are looking for.
The objective
• Create a specialised search engine (i.e. one that uses advance knowledge) to read web documents
• Index and update all the content on the local server
• Answer queries from the local database
• Update the system at regular intervals

Components of NeuroSearch
It has two components:
1. Search/Crawler Engine
2. Query Engine

Components explained
• Query Engine: Retriever (query engine)
• Crawler Engine: Spider, Indexer, Re-crawler

NeuroSearch architecture model
[Diagram: users, the search engine interface, the Query Engine, the Indexer, the Re-Crawler, the WebCrawler, and the World Wide Web]

Implementation and case study
• The database was created using Access DB.
• All parts of NeuroSearch were implemented in Java and SQL.

NeuroSearch database
[Diagram labels: Advance Knowledge data, WebCrawler data, Re-crawler data, Indexer data, Query data; data type TEXT]

The advance knowledge
Case study: neuroscience (vision)
Phase 1: NeuroSearch uses advance knowledge about neuroscience (vision) as a case study.
Phase 2: Data mining is then applied to this domain knowledge of vision to construct keywords and the relations between them.
Phase 3: This knowledge is stored in the database and categorised by number; related knowledge is categorised too and stored in the database as a data network.

Software lifecycle
The Crawler Engine consists of:
1. WebCrawler/Spider Engine
2. Indexer Engine
3. Re-Crawler (specialised)

WebCrawler (Spider)
1) This is a general web crawler that can download any kind of web page.
2) It fetches a URL, retrieves all of its web pages, and saves them on the local drive.
3) The WebCrawler has to pass through the proxy firewall (i.e. on the Newcastle University LAN) before downloading any web site.
4) The crawler performs a breadth-first search: it collects a list of all the links on the current page before it follows any of them to a new page.
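The breadth-first strategy described above can be sketched in Java roughly as follows. This is a minimal illustration, not the NeuroSearch code: the class and method names are hypothetical, and an in-memory map of page to outgoing links stands in for downloading a page and extracting its links.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Illustrative only: BreadthFirstCrawl and crawlOrder are hypothetical names.
class BreadthFirstCrawl {

    // Returns the pages in the order a breadth-first crawler would visit them.
    // 'links' is an assumed stand-in for "fetch the page and extract its links".
    static List<String> crawlOrder(String seed, Map<String, List<String>> links, int maxPages) {
        Set<String> seen = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        List<String> visited = new ArrayList<>();
        seen.add(seed);
        frontier.add(seed);

        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String page = frontier.remove();
            visited.add(page);
            // Collect every link on the current page before following any of them.
            for (String next : links.getOrDefault(page, List.of())) {
                if (seen.add(next)) { // add() returns false for URLs already queued
                    frontier.add(next);
                }
            }
        }
        return visited;
    }
}
```

The FIFO queue is what makes the traversal breadth-first: all links found on one page are queued before any page found later is visited. The `maxPages` limit is an added safeguard for the sketch, not something the slides mention.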

WebCrawler: real challenges
Challenges 1 and 2: connecting to the WWW and accessing private websites.
Solution 1: The crawler first has to connect its socket to the proxy server, and then connect through that socket to the WWW.
Solution 2: The GET method. A straightforward socket connection sends GET with just the file name; in this case, however, the GET command has to carry the full URL.
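The difference between the two forms of GET can be illustrated with a small Java sketch. The class name, method names, and the host `example.org` are assumptions made for illustration; the slides do not show the request text NeuroSearch actually sent.

```java
// Illustrative only: ProxyRequest and its methods are hypothetical names.
class ProxyRequest {

    // Request for a socket connected straight to the web server:
    // the GET line carries only the path of the file.
    static String direct(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n\r\n";
    }

    // Request for a socket connected to an HTTP proxy:
    // the GET line carries the full URL so the proxy knows where to forward it.
    static String viaProxy(String host, String path) {
        return "GET http://" + host + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n\r\n";
    }
}
```

In use, the string from `viaProxy` would be written to a `java.net.Socket` opened to the proxy's host and port rather than to the target site.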

Indexer Engine
1) First, it searches the web page using its advance knowledge. The page is deleted if it is not related to the case study subject.
2) If the page is related to the case study subject (neuroscience), the indexer collects the following information from the document:
3) all the keywords it contains, how many times each is repeated, the title, and the contents. These are saved in the database for later display in the query results and for other calculations.
4) The ranking method
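The filtering and counting steps of the indexer might look like the Java sketch below. It is a sketch under stated assumptions: the names are hypothetical, keywords are treated as single words, and a page counts as related when it contains at least one advance-knowledge keyword. The slides do not give the real relevance test or the ranking method, so neither is reproduced here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: IndexerSketch and its methods are hypothetical names.
class IndexerSketch {

    // Counts how often each advance-knowledge keyword occurs in the page text.
    // Keywords that never occur are left out of the result.
    static Map<String, Integer> keywordCounts(String text, List<String> keywords) {
        String[] words = text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            String k = keyword.toLowerCase(Locale.ROOT);
            int n = 0;
            for (String w : words) {
                if (w.equals(k)) {
                    n++;
                }
            }
            if (n > 0) {
                counts.put(k, n);
            }
        }
        return counts;
    }

    // Assumed rule: the page is related if at least one keyword occurs in it.
    static boolean isRelated(Map<String, Integer> counts) {
        return !counts.isEmpty();
    }
}
```

A page for which `isRelated` returns false would be discarded; otherwise the counts, together with the title and contents, would be written to the index database.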

Query Engine
• It has an interface that accepts keywords from the user.
• It gives the user two choices: display only the most relevant results, or display the whole result set, which includes the related results.
• It searches for the query keywords in the index database and returns the results in HTML format.
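One possible reading of the two display choices is sketched below in Java. The sketch assumes that "most relevant" means pages matching the query keyword itself, and that the whole result set additionally includes pages matching related keywords from the advance knowledge. The in-memory list stands in for the index database, ordering by keyword count is an assumption, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: QuerySketch, Entry, search and toHtml are hypothetical names.
class QuerySketch {

    // One row of an assumed index table: a page and how often it contains a keyword.
    static final class Entry {
        final String url;
        final String keyword;
        final int count;

        Entry(String url, String keyword, int count) {
            this.url = url;
            this.keyword = keyword;
            this.count = count;
        }
    }

    // Returns matching rows, highest keyword count first. A page indexed under
    // several matching keywords appears once per keyword in this simple sketch.
    static List<Entry> search(List<Entry> index, String keyword,
                              List<String> related, boolean includeRelated) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : index) {
            boolean direct = e.keyword.equalsIgnoreCase(keyword);
            boolean viaRelated = includeRelated
                    && related.stream().anyMatch(r -> r.equalsIgnoreCase(e.keyword));
            if (direct || viaRelated) {
                hits.add(e);
            }
        }
        hits.sort((a, b) -> Integer.compare(b.count, a.count));
        return hits;
    }

    // Renders the hits as a plain HTML list (no escaping, for brevity).
    static String toHtml(List<Entry> hits) {
        StringBuilder sb = new StringBuilder("<ul>");
        for (Entry e : hits) {
            sb.append("<li><a href=\"").append(e.url).append("\">")
              .append(e.url).append("</a> (").append(e.count).append(")</li>");
        }
        return sb.append("</ul>").toString();
    }
}
```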

Query result: this is indeed an edge compared with other conventional search engines.

Re-Crawling
1) The re-crawler is a WebCrawler specialised in whatever subject has been created in the advance knowledge in the database; it works by reading URLs from the index database using SQL.
2) Its interface allows special users to decide whether to continue crawling a website or cancel it.
3) This part of the software aims to update the index whenever a new link is found. This makes it easier to search and crawl websites related to any advance-knowledge subject.

Testing phase
The test phase compares the first 10 ranked results of each NeuroSearch query with the first 10 results of the same query on another search engine, such as Google.
Keyword categories:
• general keywords
• specific keywords
• abbreviation keywords
• abbreviation & combined keywords
20 tests for each category
Total of 1000 tests

Testing (cont.)
Ranking query test results for general keywords
Table 1 (Query 1): Ranking query test results for the general keyword "Eye"
Search engine: Google (rank, keyword repeated); NeuroSearch (rank, keyword repeated, related keyword repeated, quality/percentage), for the first 10 results
1 0 0 0 10 1 3 53 3 37%
2 10 1 3 51 3 27%
3 0 0 0 10 1 3 37 3 36%
4 0 0 0 10 1 3 37 3 33.6%
5 0 0 0 10 1 3 34 3 36.7%
6 0 0 0 10 1 3 29 3 38.4%
7 0 0 0 10 1 3 28 3 38.1%
8 0 0 0 10 1 3 28 3 38%
9 0 0 0 10 1 3 28 3 24.9%
10 0 10 1 3 28 3 13.8%
Average % 10% 100%

Testing (cont.)
Chart 1: Average keyword performance in the category-based test results for Google
Chart 2: Average keyword performance in the category-based test results for NeuroSearch

Analysing the search engines' ranking results by category
Table 4: Average ranking performance of the engines in the category-based query test results

Analysing the average ranking performance of the engines in the category-based query test results
t-test: used to compare the mean scores of two groups on the same variable.
Result analysis: the p value is below .05, which indicates that NeuroSearch has a statistically significantly higher mean score in the ranking results for all categories (100) than Google (52.35).
Result analysis: the negative t values reflect the direction of the difference between the two engines: the Google mean scores are lower than the NeuroSearch mean scores.
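For reference, a t statistic for comparing two group means can be written as below. The slides do not say which t-test variant was used, so the unequal-variance (Welch) form is an assumption, and the subscripts G (Google) and N (NeuroSearch) are labels introduced here.

```latex
t = \frac{\bar{x}_G - \bar{x}_N}{\sqrt{\dfrac{s_G^2}{n_G} + \dfrac{s_N^2}{n_N}}}
```

Here \bar{x} is a group's mean score, s^2 its sample variance, and n its number of observations. With the groups in this order, t is negative whenever the Google mean is below the NeuroSearch mean, so the sign records only which group was subtracted from which.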

Visual representation
Chart 3: Average category-based ranking performance of the engines
Chart 4: Average keyword-based engine performance in the documents of the category-based query test results

Conclusion
Although NeuroSearch uses a simple algorithm to judge page quality compared with other conventional search engines, it proves to be very powerful in obtaining relevant results, particularly if its advance knowledge is built by a specialist (domain knowledge), e.g. in oil, medicine, or the arts.

Ready for Questions!!!