Search Engines Introduction

Search Engine Overview

User
- What am I looking for? Identification of the information need
- What question do I ask? Query formulation

Intermediary
- What is the searcher looking for? Discovery of the user's information need
- How should the question be posed? Query representation
- Where is the relevant information? Query-document matching

Information
- What data to collect? Collection development
- What information to index? Indexing/representation
- How to represent it? Data structure

Data side (building the searchable index):
1. Document collection, e.g., spider/crawler
2. Document indexing: term indexing (tokenizing, stopword removal and stemming) and term weighting
3. Searchable index

Search side (answering a query):
0. Query
1. Query indexing
2. Document ranking
3. Result display, returning the search results to the user
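The document-collection step (spider/crawler) can be sketched as a breadth-first traversal of links starting from seed URLs. This is a minimal sketch under stated assumptions: the `WEB` dictionary is a hypothetical in-memory stand-in for the Web, and `fetch` is any function that returns `(text, links)` for a URL or `None`. A real spider would issue HTTP requests, parse links out of HTML, and respect robots.txt and rate limits, none of which is shown here.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP and extract links from HTML.
WEB = {
    "a.example/": ("Information retrieval seminars", ["a.example/models", "b.example/"]),
    "a.example/models": ("Retrieval Models and Information Retrieval", ["a.example/"]),
    "b.example/": ("Information Model", ["c.example/"]),
    "c.example/": ("Unrelated page", []),
}


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl from seed URLs; returns {url: text} for collected pages."""
    collected = {}
    frontier = deque(seeds)   # URLs waiting to be fetched, in FIFO order
    seen = set(seeds)         # URLs already queued, so each is fetched once
    while frontier and len(collected) < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:      # unreachable or unknown URL: skip it
            continue
        text, links = page
        collected[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return collected


docs = crawl(["a.example/"], WEB.get, max_pages=3)
print(list(docs))  # ['a.example/', 'a.example/models', 'b.example/']
```

The `max_pages` budget (a hypothetical parameter) stops the crawl before `c.example/` is reached; breadth-first order means pages closer to the seeds are collected first.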

Search Engine: Data

Document collection
- Select target data sources, e.g., a domain, a corpus, the WWW
- Harvest data, e.g., data entry, data import, spider/crawler

Document indexing
- Select indexing sources (index terms), e.g., metadata, keywords, content
- Extract indexing terms, e.g., tokenization, stopword removal and stemming
- Assign term weights, e.g., tf-idf, Okapi (BM25)

"The frequency of word occurrence in an article furnishes a useful measurement of word significance." In other words, the words that occur in a document can be used to analyze its content, and a word's frequency of occurrence serves as a criterion for measuring its importance as a subject term.

Luhn, H. P. (1958). The automatic creation of literature abstracts. IBM Journal of Research and Development, 2, 159-165.
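Luhn's idea that frequency signals significance underlies tf-idf weighting, which also discounts terms that occur in many documents. The sketch below uses raw term frequency times idf = log(N / df), one common textbook variant among several; the function name `tf_idf` is illustrative, and the three documents are the indexed terms from the indexing-process example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each term in each document by tf * idf.

    docs: {doc_id: list of index terms}.
    tf  = raw count of the term in the document.
    idf = log(N / df), where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))  # count each term at most once per document
    weights = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        weights[doc_id] = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
    return weights


docs = {
    "D1": ["information", "retrieval", "seminar"],
    "D2": ["retrieval", "model", "information", "retrieval"],
    "D3": ["information", "model"],
}
w = tf_idf(docs)
print(w["D1"]["information"])          # 0.0
print(round(w["D2"]["retrieval"], 3))  # 0.811
```

"information" occurs in every document, so its idf (and weight) is zero: it cannot distinguish one document from another. "retrieval" occurs twice in D2 but only in two of the three documents, so it keeps a positive weight.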

Search Engine: Indexing Process

Documents (text) → tokenization → tokens → token selection → selected tokens → token normalization → sequential index → term weighting → inverted index

Example documents:
D1: Information retrieval seminars
D2: Retrieval Models and Information Retrieval
D3: Information Model

Tokens (lowercased):
D1: information, retrieval, seminar(s)
D2: retrieval, model(s), and, information, retrieval
D3: information, model

Sequential index (stop word "and" removed, plurals normalized, term frequencies counted per document):
D1: information 1, retrieval 1, seminar 1
D2: information 1, model 1, retrieval 2
D3: information 1, model 1

Inverted index (term frequency in each document):

| Index term | D1 | D2 | D3 |
|---|---|---|---|
| wd1 (information) | 1 | 1 | 1 |
| wd2 (model) | 0 | 1 | 1 |
| wd3 (retrieval) | 1 | 2 | 0 |
| wd4 (seminar) | 1 | 0 | 0 |
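The indexing steps above can be sketched in a few lines that reproduce the example's sequential and inverted indexes. This is a minimal sketch under stated assumptions: `STOP_WORDS` is a tiny illustrative list, and `normalize` is a crude plural-stripping rule standing in for a real stemmer such as Porter's; both names are hypothetical.

```python
import re
from collections import Counter, defaultdict

STOP_WORDS = {"a", "an", "and", "is", "of", "the", "what"}  # tiny illustrative list


def normalize(token):
    """Crude plural stripping, a stand-in for a real stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def index_terms(text):
    tokens = re.findall(r"[a-z0-9]+", text.lower())      # tokenization (lowercased)
    tokens = [t for t in tokens if t not in STOP_WORDS]   # token selection: drop stop words
    return [normalize(t) for t in tokens]                 # token normalization


def build_index(docs):
    # Sequential index: document -> term frequencies.
    sequential = {d: Counter(index_terms(text)) for d, text in docs.items()}
    # Inverted index: term -> {document: term frequency}.
    inverted = defaultdict(dict)
    for doc_id, counts in sequential.items():
        for term, tf in counts.items():
            inverted[term][doc_id] = tf
    return sequential, dict(inverted)


docs = {
    "D1": "Information retrieval seminars",
    "D2": "Retrieval Models and Information Retrieval",
    "D3": "Information Model",
}
seq, inv = build_index(docs)
for term in sorted(inv):
    print(term, inv[term])
# information {'D1': 1, 'D2': 1, 'D3': 1}
# model {'D2': 1, 'D3': 1}
# retrieval {'D1': 1, 'D2': 2}
# seminar {'D1': 1}
```

Storing only non-zero entries (postings) is the usual design choice: the zeros in the term-document matrix are implied by a document's absence from a term's postings.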

Search Engine: Search

- Query indexing
  - Tokenization
  - Stopword removal and stemming
  - Term weighting
- Document ranking
  - Query-document matching
  - Document score computation
- Result display
  - Content, e.g., titles and snippets
  - Layout, e.g., grouped by category
  - Toppings, e.g., related searches

Example query: "What is information retrieval?"
Q: information 1, retrieval 1

| Index term | D1 | D2 | D3 |
|---|---|---|---|
| wd1 (information) | 1 | 1 | 1 |
| wd2 (model) | 0 | 1 | 1 |
| wd3 (retrieval) | 1 | 2 | 0 |
| wd4 (seminar) | 1 | 0 | 0 |

Each document's score is the sum, over query terms, of the query weight times the document weight; for example, D2 = 1×1 + 1×2 = 3.

| Rank | Doc ID | Score |
|---|---|---|
| 1 | D2 | 3 |
| 2 | D1 | 2 |
| 3 | D3 | 1 |
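Query indexing and document ranking for this example can be sketched as an inner product between query and document term weights. This is a minimal sketch under stated assumptions: `INDEX` hard-codes the term-document matrix above as postings, `STOP_WORDS` is an illustrative list, and `rank` is a hypothetical helper, not a specific engine's scoring function.

```python
import re
from collections import Counter, defaultdict

# Term-document matrix from the example, stored as postings (raw term frequencies).
INDEX = {
    "information": {"D1": 1, "D2": 1, "D3": 1},
    "model": {"D2": 1, "D3": 1},
    "retrieval": {"D1": 1, "D2": 2},
    "seminar": {"D1": 1},
}
STOP_WORDS = {"a", "and", "is", "of", "the", "what"}  # illustrative only


def query_terms(query):
    """Query indexing: tokenize, lowercase, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def rank(query, index):
    """Score documents by sum(query weight * document weight) over shared terms."""
    q_weights = Counter(query_terms(query))  # "What is information retrieval?" -> 1 each
    scores = defaultdict(int)
    for term, q_weight in q_weights.items():
        for doc_id, d_weight in index.get(term, {}).items():
            scores[doc_id] += q_weight * d_weight
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(rank("What is information retrieval?", INDEX))
# [('D2', 3), ('D1', 2), ('D3', 1)]
```

Only documents that share at least one term with the query receive a score, which is what the inverted index makes cheap: the loop touches the postings of the query terms and nothing else.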

[Figure: screenshot of a 2015 search results page with result regions numbered 1-14]

Query: 정보검색 (Information Retrieval), 2015

Result categories:
1. Encyclopedia
2. Naver Books
3. Q&A DB (지식iN)
4. Magazine
5. Café
6. Blog
7. Book
8. Map
9. Website
10. Advertisement (파워링크)
11. Image
12. Webpage
13. Naver News Library
14. Video
15. Naver App Store
16. Naver Scholar
17. Naver Post
18. Naver Shopping
19. News
20. Naver Dictionary

Observations:
- Proprietary (Naver-specific) content
- Dynamic category order
- Toppings:
  - Search by category
  - Related searches
  - Popular searches (by category)

Query: 정보검색 (Information Retrieval), NAVER 2020 vs. 2015

| Category | NAVER 2020 | NAVER 2015 |
|---|---|---|
| Encyclopedia (지식백과) | 1 | 1 |
| Naver Dictionary (어학사전) | 2 | 20 |
| Website (웹사이트) | 3 | 9 |
| Advertisement (파워링크) | 4 | 10 |
| Naver Post (포스트) | 5 | 17 |
| Blog (블로그) | 6 | 6 |
| Video | 7 | 14 |
| Online Open Courses (온라인 공개 강좌) | 8 | - |
| Q&A DB (지식iN) | 9 | 3 |
| Café (카페) | 10 | 5 |
| Naver App Store (앱정보) | 10 | 15 |
| Naver Books (Naver 책) | 10 | 2 |
| Image | 10 | 11 |
| Book (full-text search) | - | 7 |
| Magazine | - | 4 |
| Map | - | 8 |
| Webpage | - | 12 |
| Naver News Library | - | 13 |
| Naver Scholar | - | 16 |
| Naver Shopping | - | 18 |
| News | - | 19 |

Query: 검색엔진 (Search Engine), NAVER 2020, compared with the query 정보검색 (Information Retrieval)

| Rank | NAVER 2020 (검색엔진, Search Engine) | NAVER 2020 (정보검색, Information Retrieval) |
|---|---|---|
| 1 | Advertisement (파워링크) | Encyclopedia (지식백과) |
| 2 | | Naver Dictionary (어학사전) |
| 3 | | Website (웹사이트) |
| 4 | | Advertisement (파워링크) |
| 5 | | Naver Post (포스트) |
| 6 | Advertisement (비즈사이트) | Blog (블로그) |
| 7 | | Video |
| 8 | | Online Open Courses (온라인 공개 강좌) |
| 9 | | Q&A DB (지식iN) |
| 10 | | Café (카페), Naver App Store (앱정보), Naver Books (Naver 책), Image |

Query: Information Retrieval, 2015

Result categories:
1. Webpage
2. Advertisement

Observations:
- Webpage-centric content
- Dynamic category order
- Toppings:
  - Search by category
  - Related searches

Query: Information Retrieval, 2020 (Google vs. NAVER)

Google 2020:
1. Wikipedia
2. Knowledge Panel
3. Answer Box
4. Webpage
5. Related Searches

NAVER 2020:
1. Encyclopedia (지식백과)
2. Naver Dictionary (어학사전)
3. Website (웹사이트)
4. Advertisement (파워링크)
5. Naver Post (포스트)
6. Blog (블로그)
7. Video
8. Online Open Courses (온라인 공개 강좌)
9. Q&A DB (지식iN)
10. Café, App Store, Books, Image

Top categories (subset):

| Google | NAVER |
|---|---|
| Image | Naver Dictionary (어학사전) |
| Video | Image |
| News | Blog |
| Books | News |
| Maps | Books |
| Shopping | Encyclopedia (지식백과) |
| Finance | Website |

Query: Search Engines, Google 2020 (source: Google SERP Features, by Overthink Group)

Google 2020, query "Search Engines" vs. query "Information Retrieval":
1. Wikipedia
2. Knowledge Panel
3. Answer Box
4. Disambiguation Box (Search Engines) / Webpage (Information Retrieval)
5. Webpage (Search Engines) / Related Searches (Information Retrieval)
6. Top Stories (News) (Search Engines)
7. Webpage (Search Engines)
8. Related Searches (Search Engines)

Knowledge Graph features:
- Knowledge Panel
- Answer Box
- Disambiguation Box
- Carousels
- Google Posts

Search Engine vs. Database vs. Directories

| Feature | Search Engine | Database | Directories |
|---|---|---|---|
| Corpus type | General | Specific | General/Specific |
| Data collection | Automatic (crawler/spider) | Manual (data entry/import) | Manual (classification) |
| Data quality | Not controlled | Controlled | Controlled |
| Data organization | None (bag-of-words) | Structured (relational) | Structured (hierarchical) |
| Query input | Text box | Field-specific, Boolean | Text box |
| Search result | Ranked documents | Unranked records | Ranked categories |
| Search index | Document text | Database tables | Category tree |
| Example | Google | Library search | curlie.org |
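The database column (field-specific Boolean query, unranked records) contrasts with the ranked retrieval sketched for the search engine. This is a minimal sketch under stated assumptions: the `Record` fields and the `CATALOG` entries are hypothetical, and `boolean_search` is an illustrative helper, not a particular library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    year: int
    subject: str


# Hypothetical catalog: structured records with manually entered fields.
CATALOG = [
    Record("Information Retrieval Basics", 2015, "computing"),
    Record("Database Design", 2018, "computing"),
    Record("Retrieval Models", 1998, "computing"),
    Record("Web Search and Retrieval", 2020, "computing"),
    Record("Korean Poetry", 2019, "literature"),
]


def boolean_search(records, title_word, min_year, subject):
    """Field-specific Boolean AND: title contains word AND year >= min AND subject matches.

    A record either satisfies every condition or it does not, so the result
    is an unranked set, returned here simply in catalog order.
    """
    word = title_word.lower()
    return [
        r for r in records
        if word in r.title.lower().split()
        and r.year >= min_year
        and r.subject == subject
    ]


hits = boolean_search(CATALOG, "retrieval", 2000, "computing")
print([r.title for r in hits])
# ['Information Retrieval Basics', 'Web Search and Retrieval']
```

Unlike the scored search engine example, there is no notion of one matching record being "more relevant" than another; the query either matches a record's fields or it does not.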