Information Retrieval (2003. 9. 24.)
- Slides: 41
Medical Information Retrieval (Information Retrieval), 2003. 9. 24., 최진욱
What is Information Retrieval?
- Finding the information you want
- Data retrieval vs. information retrieval
What is IR?
- Information retrieval is the science that deals with the representation, storage, and organization of information items, and with access to them.
NLM and Medline
- 10 million articles
- 3,500 journals since 1966
- PubMed, Internet Grateful Med
  - http://www.nlm.nih.gov
Technology areas related to IR
- Internet
- Search Engine
- Vocabulary System
- Information Modeling
- Filtering and Classification
- Natural Language Processing
- …
Internet
- A worldwide network that uses TCP/IP
- Public (not free, but open to everyone)
- Carrier of electronic mail
- Convenient for getting free software
- Terabytes of information
- Dynamic rerouting
Telephone network
Another network: usage fees?
Network in early stage
- ARPAnet (Department of Defense)
- Uses the TCP/IP communication protocol
TCP/IP
- Protocol
  - rules of behavior
  - Korea: prepare to stop
  - Germany: prepare to go
- TCP/IP
  - the two most widely used network protocols
  - part of some 100 protocols for connecting to a computer network
Internet Address
- 32 bits in total
- Split into a network address and a host address
- Each part is 8 to 24 bits long
Internet Classes
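The split of a 32-bit address into network and host parts can be sketched as follows. This is a minimal illustration that assumes the historical classful scheme (class A, B, C with 8, 16, and 24 network bits); the function name `classful_split` and the sample addresses are made up for the example and do not come from the slides.

```python
def classful_split(address: str) -> dict:
    """Split a dotted-quad IPv4 address into network and host parts by class."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")

    value = int.from_bytes(bytes(octets), "big")  # the full 32-bit number
    first = octets[0]
    if first < 128:        # leading bit 0
        cls, network_bits = "A", 8
    elif first < 192:      # leading bits 10
        cls, network_bits = "B", 16
    elif first < 224:      # leading bits 110
        cls, network_bits = "C", 24
    else:
        raise ValueError("class D/E addresses have no network/host split")

    host_bits = 32 - network_bits
    return {
        "class": cls,
        "network_bits": network_bits,
        "host_bits": host_bits,
        "network": value >> host_bits,              # upper bits
        "host": value & ((1 << host_bits) - 1),     # lower bits
    }


info = classful_split("10.1.2.3")  # hypothetical address
print(info["class"], info["network_bits"], info["host_bits"])
```

Modern networks use classless prefixes instead, so this only mirrors the classful layout the slides refer to.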
E-mail Address
- jinchoi@snu.ac.kr
URL
Hypertext
Finding information on the Internet
- Use a search engine
- Ask in a newsgroup
- Use a mailing list
Internet Search Engine
- www.yahoo.com
  - www.yahoo.co.kr
- www.altavista.com
  - www.altavista.co.kr
- www.excite.com
- www.lycos.com
  - www.lycos.co.kr
- www.dreamwiz.com
- www.naver.com
Internet Search Engine
AND search
- Search for Monet AND Renoir
- Search for +Monet +Renoir
- Search for Monet Renoir
  - "All the words" option
OR search
- Search for UPS U.P.S.
- Search for UPS OR U.P.S.
- Search for UPS U.P.S.
  - "Any of the words" option
- "foreign policy" (exact phrase) vs. foreign policy (two separate words)
NOT search
- Search for "bugs life" -ants
- Search for "bugs life" NOT ants
- Search for "bugs life" AND NOT ants
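The three operators map directly onto set operations over the documents that match each term. A minimal sketch, assuming a tiny made-up collection and a hypothetical helper `matching`; real engines parse the `+`, `-`, and quote syntax themselves, which is not modeled here.

```python
# Hypothetical four-document collection, for illustration only.
docs = {
    1: "monet and renoir painted in france",
    2: "monet painted water lilies",
    3: "a bugs life is a film about ants",
    4: "a bugs life soundtrack",
}


def matching(term_or_phrase: str) -> set:
    """IDs of documents whose text contains the word or exact phrase."""
    needle = f" {term_or_phrase.lower()} "
    # Pad with spaces so that only whole words match.
    return {doc_id for doc_id, text in docs.items() if needle in f" {text} "}


and_hits = matching("monet") & matching("renoir")     # Monet AND Renoir
or_hits = matching("monet") | matching("renoir")      # Monet OR Renoir
not_hits = matching("bugs life") - matching("ants")   # "bugs life" NOT ants

print(and_hits, or_hits, not_hits)
```

AND is intersection, OR is union, and NOT is set difference against the documents that matched the positive part of the query.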
Near Search
- Korea NEAR climate
  - Altavista (advanced search)
  - two terms within 10 words
- Korea NEAR climate
  - Lycos (advanced search)
  - two terms within 25 words
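A proximity operator can be sketched by comparing word positions. This assumes "within N words" means the positions of the two terms differ by at most N; the exact rule each engine applied is not stated in the slides, and the function name `near` is hypothetical.

```python
def near(text: str, term_a: str, term_b: str, window: int = 10) -> bool:
    """True if term_a and term_b occur within `window` words of each other."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Any pair of occurrences close enough is a match.
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


sentence = "korea has a temperate climate with four seasons"  # made-up text
print(near(sentence, "Korea", "climate"))            # default window of 10
print(near(sentence, "Korea", "climate", window=3))  # tighter window
```

Changing `window` from 10 to 25 reproduces the difference between the two engines listed above.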
USENET
- USENET system: a wide-ranging bulletin board system
- News group: a discussion group set up on USENET
- Diagram labels: news server in SNU, news server in Melbourne
Newsgroup Search Engine
Mailing list
- Example topics: computer & privacy; travel, weather; the White House; history of the Internet
- Automatic mailing programs
  - LISTSERV
  - Majordomo
IR Modeling
IR steps
- Text processing
- Indexing
  - inverted file
  - signature file
- Organization in DB
- Query processing
- Evaluation
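Of the two index types named above, the signature file is the less familiar one. A rough sketch of the superimposed-coding idea follows, assuming a 64-bit signature and three bits per word; the constants, the use of SHA-256, and all function names are choices made for this illustration, not something the slides specify.

```python
import hashlib

SIGNATURE_BITS = 64   # assumed signature width
BITS_PER_WORD = 3     # assumed number of bits each word sets


def word_signature(word: str) -> int:
    """Hash a word to a bit pattern with at most BITS_PER_WORD bits set."""
    sig = 0
    for k in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{k}:{word.lower()}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIGNATURE_BITS)
    return sig


def document_signature(text: str) -> int:
    """OR together the signatures of all words in the document."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def may_contain(doc_sig: int, word: str) -> bool:
    """False means the word is definitely absent; True means 'possibly present'."""
    ws = word_signature(word)
    return (doc_sig & ws) == ws


sig = document_signature("aspirin prevents heart attack")
print(may_contain(sig, "aspirin"))
```

A signature test never misses a word that is present, but it can report false positives, so candidate documents still have to be checked against their text.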
Information-Retrieval Process
- Diagram labels: Information Need, Content, Indexing, Database, Query Formulation, Retrieval, Query, Result, Evaluation, Refinement
Indexing Process
- Diagram labels: document; accent, spacing; stop word; noun group; stemming; automatic or manual indexing; structure recognition; full text; index terms
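Several of the stages named in the diagram can be chained into a small text-processing function. This is a simplified sketch: the stop-word list is a tiny made-up sample, `naive_stem` is a crude suffix stripper rather than a real stemming algorithm, and noun-group detection and structure recognition are left out.

```python
import re
import unicodedata

STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}  # illustrative only


def strip_accents(text: str) -> str:
    """Remove accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(word: str) -> str:
    """Strip a few common English suffixes (not a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(document: str) -> list:
    """Accents and spacing -> stop words -> stemming -> index terms."""
    text = strip_accents(document).lower()
    tokens = re.findall(r"[a-z0-9]+", text)  # also normalizes spacing
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(index_terms("Indexing the  résumés of patients"))
```

Each stage shrinks or normalizes the vocabulary, which is why the output terms no longer look exactly like the words in the document.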
Classification of IR models
- User task: Retrieval (ad hoc, filtering)
  - Classic models: Boolean, vector, probabilistic
    - Set theoretic: fuzzy, extended Boolean
    - Algebraic: generalized vector, latent semantic indexing, neural network
    - Probabilistic: inference network, belief network
  - Structured models: non-overlapping lists, proximal nodes
- User task: Browsing
  - Browsing: flat, structure guided, hypertext
Boolean Model
- A query can be written in disjunctive normal form
- $q = k_a \wedge (k_b \vee \neg k_c)$
- $q_{dnf} = (1, 1, 1) \vee (1, 1, 0) \vee (1, 0, 0)$
- Diagram labels: $K_a$, $K_b$, $K_c$
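The disjunctive normal form can be derived mechanically by trying every 0/1 assignment of the index terms. The sketch below assumes the query is ka AND (kb OR NOT kc), which is the reading consistent with the three conjunctive components (1, 1, 1), (1, 1, 0), (1, 0, 0) listed on the slide; the function names are hypothetical.

```python
from itertools import product


def dnf(query, n_terms: int) -> list:
    """All 0/1 assignments of the index terms for which `query` is true."""
    return [bits for bits in product((1, 0), repeat=n_terms) if query(*bits)]


def q(ka, kb, kc) -> bool:
    # Assumed query: ka AND (kb OR NOT kc)
    return bool(ka and (kb or not kc))


q_dnf = dnf(q, 3)
print(q_dnf)


def is_relevant(doc_pattern: tuple) -> bool:
    """In the Boolean model a document matches iff its term pattern is in the DNF."""
    return doc_pattern in q_dnf


print(is_relevant((1, 0, 0)), is_relevant((0, 1, 1)))
```

The Boolean model has no notion of partial match: a document either has one of the listed term patterns or it is not retrieved.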
Vector Model and Weight function
- K = {Sonata, 2000 cc, automatic transmission, white, …, kt}
- D1 = {20, 11, 5, …, 5}
- D2 = {20, 18, 12, 4, …, 5}
- D30 = {0, 20, 12, 3, …, 9}
- Diagram labels: Hyundai car, Samsung car, weight
- Terms are assumed to be mutually independent!
Boolean vs. Vector model
- Terms: Petroleum, Mexico, Oil, Texas, Refinery, Ship
- Boolean: (1 1 1 0)
- Vector: (2.8 1.6 3.5 3 3.1 1)
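Where the Boolean model only records whether a term is present, the vector model allows documents to be ranked by how close their weight vectors are to the query. A minimal sketch using cosine similarity, which is one common choice; the slides do not name a similarity measure, and the document and query weights below are invented for the example.

```python
import math


def cosine(u, v) -> float:
    """Cosine of the angle between two weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents: dict) -> list:
    """Document IDs ordered from most to least similar to the query."""
    return sorted(documents, key=lambda d: cosine(query, documents[d]), reverse=True)


# Hypothetical weights over three index terms.
documents = {
    "D1": [3.0, 0.0, 1.0],
    "D2": [0.5, 2.0, 0.0],
    "D3": [1.0, 1.0, 1.0],
}
query = [1.0, 0.0, 1.0]
print(rank(query, documents))
```

Because the score is graded, a document that contains only some of the query terms can still be retrieved, just lower in the ranking.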
Retrieval Issues
- Indexing
  - inverted file
- Ranking
  - ranking by relevance
  - ranking by chronology
- Display
- Diagram labels: (aspirin, prevention); (prevention, …); (aspirin, attack, heart)

Item          Attribute (Doc. #)
Aspirin       1, 5, 6, 9
Attack        3, 6, 7, 8
Heart         4, 6, 7, 10
Prevention    1, 2, 6, 9
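With the inverted file from the slide, a multi-term query such as (aspirin, prevention) reduces to intersecting sorted posting lists. The sketch below treats the query as a conjunction, which is an assumption; the function names are hypothetical, while the posting lists are the ones shown in the table.

```python
inverted_file = {
    "aspirin": [1, 5, 6, 9],
    "attack": [3, 6, 7, 8],
    "heart": [4, 6, 7, 10],
    "prevention": [1, 2, 6, 9],
}


def intersect(a: list, b: list) -> list:
    """Merge-style intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(*terms: str) -> list:
    """Documents containing every term; shortest lists are merged first."""
    if not terms:
        return []
    lists = sorted((inverted_file.get(t, []) for t in terms), key=len)
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result


print(and_query("aspirin", "prevention"))
print(and_query("aspirin", "attack", "heart"))
```

The merge walks each list once, so the cost grows with the length of the posting lists rather than with the size of the collection.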
Indexing with Inverted File
Radiology Results
- Patient no.: 27750177
- Report no.: 20022777035
- Exam code: RC 102
- Exam name: Brain CT (Pre contrast)
- Primary diagnosis: Infarction Of Posterior Cerebral Artery Territory
- Exam date: 2002-08-02
- Exam result: BRAIN MRI + MRA [Finding] Multiple UBOs in the PVWM on both sides suggest a possible underlying SVD. There is patchy high signal intensity in the left thalamus and occipital lobe, with the same finding on the FLAIR image. On T1WI, a portion of iso signal intensity …

Inverted file (significant words only)
WORD        D
abort       1, 5
aberrant    3
brain       7, 3
…           …
portion     4, 8
topology    5, 8
- Diagram labels: P, W
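Building such a word-to-document table from report text can be sketched as follows. The report snippets, document numbers, and stop-word list are made up for the illustration, and "significant words only" is approximated by dropping stop words, which is an assumption about how significance was decided.

```python
import re
from collections import defaultdict


def build_inverted_file(documents: dict, stop_words=frozenset()) -> dict:
    """Map each significant word to the sorted list of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in stop_words:
                postings[word].add(doc_id)
    # Sort words alphabetically and document numbers numerically.
    return {word: sorted(ids) for word, ids in sorted(postings.items())}


# Hypothetical report fragments keyed by document number.
reports = {
    3: "Brain MRI shows an aberrant vessel",
    4: "A portion of the lesion is iso signal",
    7: "Brain CT without contrast",
    8: "The enhancing portion is small",
}
index = build_inverted_file(reports, stop_words={"a", "an", "of", "the", "is"})
print(index["brain"], index["portion"])
```

Mixed-language reports like the one on the slide would need a tokenizer that also handles Korean text; the regular expression here only keeps ASCII letters and digits.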
Problems with word-based indexing
- Context: meaning is affected by the meaning of other words
  - high, blood, pressure
  - low pressure at high altitude
  - increase red blood cell
- Polysemy
  - lead vs. lead
- Synonymy
  - hypertension vs. high blood pressure
- Granularity
  - antibiotics, penicillin
- Focus of content
  - Key word vs. plain word
The End