An Automatic Retrieval System for Expert and Consumer Users



















An Automatic Retrieval System for Expert and Consumer Users
Rena Peraki, Euripides G. M. Petrakis, Angelos Hliaoutakis
Intelligent Systems Laboratory, www.intelligence.tuc.gr
Technical University of Crete (TUC), Chania, Crete, Greece

Problem Definition
• Medical information systems are designed for experts!
  – Use complex terms in their searches
  – Domain-specific answers
• Must also serve naive consumers
  – Do simple searches using natural language terms
  – Easy to read and comprehend information
• Investigate methods for the categorization of information by user profile
BIBE 2012, Larnaca, Cyprus

Current Practices
• Medscape, MedlinePlus, MedHunt rely on the manual translation and categorization of information for consumers
  – Slow, does not scale up for large collections
• In MEDLINE of the U.S. NLM, documents are indexed by experts and for experts only
  – No categorization by user profile
  – 10-12 MeSH terms per document (pathology, disease, treatment, drugs, etc.)
  – Over 15 million documents: slow!
  – Need to automate this process

Objectives
• Investigate methods for automatic document indexing in MEDLINE
• These index terms are subsequently used for filtering documents by user profile
• Main idea: categorize terms as either simple terms comprehensible to consumers or more involved terms suitable for experts

Resources
• Automatic indexing in MEDLINE:
  – MMTx [U.S. NLM]: MMTx focuses on UMLS rather than MeSH
  – AMTEx [DKE, 2009]: MeSH terms, faster and more accurate than MMTx
• Dictionaries for biomedical and health-related concepts
  – UMLS Metathesaurus, MeSH
• Dictionaries for general English words
  – WordNet, Specialist

MMTx (MetaMap Transfer)
• Developed by the U.S. NLM
• Maps text to UMLS Metathesaurus concepts
  – but MEDLINE indexing is based on MeSH
  – MeSH is a subset of the Metathesaurus
✓ Suffers from term overgeneration
✓ Unrelated terms added to the final candidate list
✓ The list must be cleaned up to keep only MeSH terms
✓ Topic drift

The AMTEx method [DKE 2009]
• Main idea:
✓ Initial term extraction based on a hybrid linguistic/statistical approach, the C/NC value
✓ Extracts general single and multi-word terms (noun phrases)
✓ Mainly multi-word terms: “heart disease”, “coronary artery disease”
✓ Extracted terms are validated against MeSH
✓ Faster than MMTx, with improved precision, while producing only about a fifth of MMTx's term output

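The slides name the C/NC value but do not give its formula. As a rough sketch, assuming the commonly cited C-value definition (a candidate's frequency is discounted by the average frequency of the longer candidates that contain it, then scaled by the log of its length in words), the ranking step might look like the following. The function names and toy frequencies are hypothetical, the NC-value context weighting is omitted, and this is not AMTEx's actual implementation.

```python
import math


def _contains(longer, shorter):
    """True if word list `shorter` occurs contiguously inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freq):
    """Score multi-word candidates with a simplified C-value.

    freq maps each candidate term (a space-separated string) to its
    corpus frequency. Single-word candidates score 0 because log2(1) = 0.
    """
    scores = {}
    for a, f_a in freq.items():
        words_a = a.split()
        # Other candidates that contain `a` as a nested word sequence
        containers = [b for b in freq
                      if b != a and _contains(b.split(), words_a)]
        if containers:
            # Discount by the average frequency of the containing terms
            nested = sum(freq[b] for b in containers) / len(containers)
            scores[a] = math.log2(len(words_a)) * (f_a - nested)
        else:
            scores[a] = math.log2(len(words_a)) * f_a
    return scores


# Toy frequencies, invented for illustration only
toy_freq = {
    "coronary artery disease": 4,
    "artery disease": 6,
    "heart disease": 5,
}
for term, score in sorted(c_value(toy_freq).items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In this toy run, “artery disease” is penalized because four of its six occurrences are inside “coronary artery disease”, which is the behavior that favors complete multi-word terms.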
Example
Input: Full text article
MEDLINE index terms: “Aged”, “Data Collection”, “Humans”, “Knee”, “Middle Aged”, “Osteoarthritis, Knee/complications”, “Osteoarthritis, Knee/diagnosis”, “Pain/classification”, “Pain/etiology”, “Prospective Studies”, “Research Support, Non-U.S. Gov’t”
MMTx terms: “osteoarthritis knee”, “retention”, “peat”, “rheumatology”, “acetylcholine”, “lysine acetate”, “potassium acetate”, “questionnaires”, “target population”, “selection bias”, “creativeness”, “reproduction”, “cohort studies”, “europe”, “couples”, “naloxone”, “sample size”, “arthritis”, “data collection”, “mail”, “health status”, “respondents”, “ontario”, “universities”, “dna”, “baseline survey”, “medical records”, “informatics”, “general practitioners”, “gender”, “beliefs”, “logistic regression”, “female”, “marital status”, “employment status”, “comprehension”, “surveys”, “age distribution”, “manual”, “occupations”, “manuals”, “persons”, “females”, “minority groups”, “incentives”, “business”, “ability”, “comparative study”, “odds ratio”, “biomedical research”, “pubmed”, “copyright”, “coding”, “longitudinal studies”, “immunoelectrophoresis”, “skin diseases”, “government”, “norepinephrine”, “social sciences”, “survey methods”, “tyrosine”, “new zealand”, “azauridine”, “gold”, “nonrespondents”, “cycloheximide”, “rheum”, “jordan”, “cadmium”, “radiopharmaceuticals”, “community”, “disease progression”, “history”
AMTEx terms: “health surveys”, “pain”, “review publication type”, “data collection”, “osteoarthritis knee”, “science”, “health services needs and demand”, “population”, “research”, “questionnaires”, “informatics”, “health”

Term & Document Categorization

New Vocabularies
• Vocabulary of General Terms (VGT): 105,675 general (WordNet) terms
• Vocabulary of Consumer Terms (VCT): 7,165 consumer (MeSH) terms
• Vocabulary of Expert Terms (VET): 16,719 expert (MeSH) terms

Document Categorization
• Documents are represented by vectors of terms extracted by AMTEx or MMTx, or assigned by human experts
• The more VET (VCT) terms a document contains, the higher the probability that it is suitable for experts (consumers)
  – E.g., a document with VET% = 0.62 has a 62% probability of being suitable for experts

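The slide does not spell out how VET% is computed. A minimal sketch, assuming it is simply the fraction of a document's index terms that appear in the expert vocabulary (and likewise VCT% for the consumer vocabulary), could look like this. The function name and the toy vocabulary memberships are invented for illustration.

```python
def profile_scores(doc_terms, vet, vct):
    """Return (VET%, VCT%) as fractions of the document's index terms.

    Assumption: each score is the share of the document's terms found
    in the corresponding vocabulary; the slides do not define it exactly.
    """
    terms = [t.lower() for t in doc_terms]
    if not terms:
        return 0.0, 0.0
    vet_pct = sum(t in vet for t in terms) / len(terms)
    vct_pct = sum(t in vct for t in terms) / len(terms)
    return vet_pct, vct_pct


# Toy vocabularies: which term is "expert" or "consumer" is made up here
toy_vet = {"osteoarthritis knee", "informatics", "questionnaires"}
toy_vct = {"pain", "health"}

doc = ["osteoarthritis knee", "pain", "health", "questionnaires", "informatics"]
vet_pct, vct_pct = profile_scores(doc, toy_vet, toy_vct)
print(vet_pct, vct_pct)
```

Under this reading, the toy document would be treated as more likely suitable for experts than for consumers, since three of its five terms fall in the expert vocabulary.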
Evaluation
• Precision and Recall measures: a good method has high values of both
• Datasets: OHSUMED: 348,566 MEDLINE abstracts that come with 64 queries and their relevant answers
• Ground truth: the set of MeSH index terms assigned to documents by experts

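For a single document, precision and recall of an extracted term set against the expert-assigned ground truth can be computed as below. This is a sketch using the standard set-based definitions; the function name and the example term sets are illustrative, and averaging over a collection is left out.

```python
def precision_recall(extracted, ground_truth):
    """Set-based precision and recall of extracted index terms."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    hits = len(extracted & ground_truth)
    # Precision: share of extracted terms that are correct
    precision = hits / len(extracted) if extracted else 0.0
    # Recall: share of ground-truth terms that were found
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Illustrative term sets, not taken from the evaluation data
extracted = {"pain", "data collection", "health", "research"}
truth = {"pain", "data collection", "humans", "knee", "aged"}
print(precision_recall(extracted, truth))
```

A method that outputs many candidate terms tends to raise recall at the cost of precision, which is why both values are reported together.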
AMTEx vs MMTx
• AMTEx: faster than MMTx, with improved precision, while producing only about a fifth of MMTx's term output

| Data Set | Method | Number of Terms | Precision | Recall | Time (hours) |
|---|---|---|---|---|---|
| OHSUMED | AMTEx | 8 | 0.125 | 0.101 | 7.383 |
| OHSUMED | MMTx | 40 | 0.089 | 0.336 | 14.516 |
| PMC | AMTEx | 25 | 0.034 | 0.062 | 1.387 |
| PMC | MMTx | 72 | 0.033 | 0.162 | 2.727 |

Categorization by User Profile
• How good is the method at retrieving answers for consumers and experts?
• We run retrievals for consumers and experts
  – 15 out of the 64 queries contain no expert terms and are suitable for consumers
  – The remaining queries are suitable for experts
  – Documents are represented by document vectors of MeSH, MMTx, or AMTEx terms
  – The retrieval method is the Vector Space Model (VSM)
  – Each document's VSM similarity score is multiplied by its respective VET or VCT score

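The re-ranking step can be sketched as follows: compute a vector-space similarity between query and document, then multiply it by the document's profile score (VET for expert queries, VCT for consumer queries). This sketch assumes raw term-frequency weights and cosine similarity, since the slides do not state the weighting scheme; the function names, document ids, and scores are hypothetical.

```python
import math
from collections import Counter


def cosine(query_terms, doc_terms):
    """Cosine similarity between raw term-frequency vectors."""
    q, d = Counter(query_terms), Counter(doc_terms)
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


def rank(query_terms, docs, profile_score):
    """Rank documents by VSM similarity times their profile score.

    docs maps doc id -> list of index terms; profile_score maps
    doc id -> VET (or VCT) score in [0, 1].
    """
    scored = [(doc_id, cosine(query_terms, terms) * profile_score[doc_id])
              for doc_id, terms in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: d2 matches the query better, but d1 has the higher VET score
docs = {"d1": ["pain", "knee", "health"], "d2": ["pain"]}
toy_vet_score = {"d1": 0.9, "d2": 0.3}
print(rank(["pain"], docs, toy_vet_score))
```

In the toy run the profile score reverses the plain similarity order, which illustrates how the multiplication steers results toward documents matching the user profile.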
Consumers Retrieval Task

Experts Retrieval Task

Results Analysis
• The results indicate
  – A tendency of human experts to assign simple terms to documents, and
  – A selective ability of AMTEx in extracting complex terms suitable for experts

Conclusions & Future Work
• We investigate methods for:
  – Automatic document indexing
  – Categorization by user profile
• AMTEx is well suited for both problems
• Future work: more elaborate document categorization methods (machine learning, fuzzy)
• More term and document categories
  – According to UMLS SN (pathology, treatment)
  – User categories (e.g., specialty)

Questions and answers

AMTEx Outline
• INPUT: Document Collection
• C/NC value: Multi-word Term Extraction & Term Ranking
• MeSH Term Validation (resource: MeSH Thesaurus)
• Single-word Term Extraction: non-MeSH multi-word terms are broken down & validated against MeSH
• Variant Generation
• Term Expansion (MeSH)
• OUTPUT: Term Lists

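The validation and single-word fallback named in the outline can be sketched as below: a candidate found in MeSH is kept as is, and a multi-word candidate that is not in MeSH is broken into single words, each checked against MeSH. The function name and the toy vocabulary are hypothetical, and real matching would likely also involve term variants, which this sketch leaves out.

```python
def validate_terms(candidates, mesh):
    """Validate candidates against MeSH, with a single-word fallback."""
    validated = []
    for term in candidates:
        if term in mesh:
            # The full candidate is a MeSH term
            validated.append(term)
        else:
            # Break the non-MeSH term down and keep the words found in MeSH
            validated.extend(w for w in term.split() if w in mesh)
    # Remove duplicates while preserving first-seen order
    return list(dict.fromkeys(validated))


# Toy vocabulary standing in for the MeSH thesaurus
toy_mesh = {"pain", "knee", "data collection"}
candidates = ["data collection", "chronic knee pain", "baseline survey"]
print(validate_terms(candidates, toy_mesh))
```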
MeSH: Medical Subject Headings
The NLM medical & biological terms thesaurus:
• Organized in IS-A hierarchies
  – more than 15 taxonomies & more than 22,000 terms
  – a term may appear in multiple taxonomies
• No PART-OF relationships
• Terms organized into synonym sets called entry terms, including stemmed term forms

Fragment of the MeSH IS-A Hierarchy
[Figure: tree fragment with the nodes Root, Nervous system diseases, Cranial nerve diseases, Neurologic manifestations, pain, headache, neuralgia, Facial neuralgia]
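An IS-A hierarchy in which a term may have more than one parent can be held as a simple parent map. The sketch below uses the node names from the fragment above, but the edges are only an assumed reading of the figure and have not been checked against MeSH; the function name is hypothetical.

```python
def ancestors(term, parents):
    """Collect every ancestor of `term` by following IS-A links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Assumed edges (child -> parents); not verified against the MeSH tree
toy_parents = {
    "facial neuralgia": ["cranial nerve diseases", "neuralgia"],
    "neuralgia": ["pain"],
    "headache": ["pain"],
    "pain": ["neurologic manifestations"],
    "neurologic manifestations": ["nervous system diseases"],
    "cranial nerve diseases": ["nervous system diseases"],
    "nervous system diseases": ["root"],
}
print(sorted(ancestors("facial neuralgia", toy_parents)))
```

Because “facial neuralgia” has two parents in this toy map, its ancestor set covers both branches, which is the multiple-taxonomy case the MeSH slide mentions.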