Integrating OMIM Mendelian database to SPOKE
Xiaoming (Sherman) Jia, MD MEng
Baranzini Lab
Genetics and SPOKE, 9/21/2021

Data sources
• OMIM: gene-disease relationships
• Disease Ontology: disease characteristics
• Processed OMIM: extract relationships with the highest level of evidence (phenotype mapping key = 3), inheritance pattern (Mendelian or other), and modifiers
• Processed Disease Ontology: extract OMIM ID to DOID mappings
• Integrate GENE-DOID mappings into SPOKE
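
The integration step can be sketched as a lookup of each OMIM phenotype ID in the OMIM-to-DOID mapping, keeping only the pairs that map. This is a minimal illustration, not the pipeline actually used: the function name `integrate` and the unmappable placeholder row (`GENE_X`, `000000`) are hypothetical, and the three real rows are copied from the formatted-data slide.

```python
def integrate(gene_omim_rows, omim_to_doid):
    """Attach a DOID to each (gene, omim_id) pair; drop pairs with no mapping."""
    mapped = []
    for gene, omim_id in gene_omim_rows:
        doid = omim_to_doid.get(omim_id)
        if doid is not None:
            mapped.append((gene, omim_id, doid))
    return mapped


# Sample rows from the formatted-data slide, plus one hypothetical unmappable row.
gene_omim_rows = [
    ("AGRN", "615120"),
    ("TMEM240", "607454"),
    ("SKI", "182212"),
    ("GENE_X", "000000"),  # hypothetical: no DOID mapping exists for this ID
]
omim_to_doid = {"615120": "110657", "607454": "50972", "182212": "2340"}

mapped = integrate(gene_omim_rows, omim_to_doid)
```

Only relationships whose OMIM ID has a DOID counterpart survive this step, which is why the final count is described as "mappable" relationships.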

OMIM raw data requires some text parsing

Gene    ENTREZ  ENSEMBL          Disease
CAMTA1  23261   ENSG00000171735  Cerebellar ataxia, nonprogressive, with mental retardation, 614756 (3), Autosomal dominant
PARK7   11315   ENSG00000116288  Parkinson disease 7, autosomal recessive early-onset, 606324 (3), Autosomal recessive
GABRD   2563    ENSG00000187730  {Epilepsy, generalized, with febrile seizures plus, type 5, susceptibility to}, 613060 (3), Autosomal dominant; {Epilepsy, idiopathic generalized, 10}, 613060 (3), Autosomal dominant; {Epilepsy, juvenile myoclonic, susceptibility to}, 613060 (3), Autosomal dominant
KIF1B   23095   ENSG00000054523  ?Charcot-Marie-Tooth disease, type 2A1, 118210 (3), Autosomal dominant; {Neuroblastoma, susceptibility to, 1}, 256700 (3), Autosomal dominant, Isolated cases; Pheochromocytoma, 171300 (3), Autosomal dominant
CTRC    11330   ENSG00000162438  {Pancreatitis, chronic, susceptibility to}, 167800 (3), Autosomal dominant
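
A rough sketch of the text parsing the Disease column needs, assuming each phenotype entry follows the pattern seen in the rows above: name, six-digit MIM number, mapping key in parentheses, then an optional comma-separated inheritance list, with entries separated by semicolons. The regular expression and the name `parse_phenotypes` are illustrative assumptions, not the parser used for the slides.

```python
import re

# Assumed entry layout: "<name>, <6-digit MIM> (<mapping key>)[, <inheritance>, ...]"
PHENOTYPE_RE = re.compile(
    r"^(?P<name>.+?),\s*(?P<mim>\d{6})\s*\((?P<key>[1-4])\)"
    r"(?:,\s*(?P<inheritance>.+))?$"
)


def parse_phenotypes(field):
    """Split a raw Disease field on ';' and parse each phenotype entry."""
    records = []
    for entry in field.split(";"):
        match = PHENOTYPE_RE.match(entry.strip())
        if match is None:
            continue  # entry does not follow the assumed layout
        inheritance = match.group("inheritance")
        records.append({
            "name": match.group("name"),
            "mim": match.group("mim"),
            "mapping_key": int(match.group("key")),
            "inheritance": (
                [part.strip() for part in inheritance.split(",")]
                if inheritance else []
            ),
        })
    return records


# The KIF1B row from the table above.
kif1b = (
    "?Charcot-Marie-Tooth disease, type 2A1, 118210 (3), Autosomal dominant; "
    "{Neuroblastoma, susceptibility to, 1}, 256700 (3), Autosomal dominant, "
    "Isolated cases; Pheochromocytoma, 171300 (3), Autosomal dominant"
)
records = parse_phenotypes(kif1b)

# Keep only the highest level of evidence (phenotype mapping key = 3).
key3 = [r for r in records if r["mapping_key"] == 3]
```

The braces and leading question mark are left in the name on purpose, so that a later step can still see them.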

Disease-gene relationships from OMIM: keep bolded

Mapping code  Count  Level of evidence
1             73     Disorder is placed on the map based on its association with a gene, but the underlying defect is not known.
2             355    Disorder has been placed on the map by linkage; no mutation has been found.
3             6233   The molecular basis for the disorder is known; a mutation has been found in the gene.
4             5      A contiguous gene deletion or duplication syndrome; multiple genes are deleted or duplicated, causing the phenotype.

Only mapping code 3 (highest level of evidence) is kept.

Inheritance          Count
Autosomal recessive  2828
Autosomal dominant   2390
Unknown              1229
X-linked recessive   208
X-linked dominant    90
Multifactorial       69
X-linked             68
Mitochondrial        49
Isolated cases       45
Somatic mutation     23
Digenic recessive    15
Somatic mosaicism    4
Y-linked             1

Edits to raw OMIM data
• Encode modifiers if disease name contains:
  • “susceptibility to” (299)
  • “modifier of” (27)
  • “protection against” (30)
  • “resistance to” (25)
  • “reduced risk of” (6)
• Add to inheritance patterns if disease name contains:
  • “somatic” or “somatic mosaic” (212)
  • “digenic” (19)
  • “autosomal recessive” (19)
  • “autosomal dominant” (15)
  • “X-linked” (9)
  • “Y-linked” (1)
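
The two rules above can be sketched as case-insensitive substring checks on the disease name. This is an illustration under assumptions: the modifier codes are the labels used on the filtering slide, the sketch matches "susceptibility to" because that is the wording in the raw OMIM names, mapping "digenic" to `DR` is a guess, and the name `annotate` is hypothetical.

```python
# Phrase in disease name -> modifier code (codes as used on the filtering slide).
MODIFIER_PHRASES = {
    "susceptibility to": "SUSCEPTIBILITY",
    "modifier of": "MODIFIES",
    "protection against": "PROTECTIVE",
    "resistance to": "RESISTANCE",
    "reduced risk of": "REDUCED",
}

# Phrase in disease name -> inheritance code to add if not already present.
INHERITANCE_PHRASES = {
    "somatic": "SOMATIC",        # also covers "somatic mosaic"
    "digenic": "DR",             # assumption: the slides do not name the code
    "autosomal recessive": "AR",
    "autosomal dominant": "AD",
    "x-linked": "XL",
    "y-linked": "YL",
}


def annotate(disease_name, inheritance=()):
    """Return (inheritance codes, modifier codes) derived from the disease name."""
    name = disease_name.lower()
    modifiers = [code for phrase, code in MODIFIER_PHRASES.items() if phrase in name]
    codes = list(inheritance)
    for phrase, code in INHERITANCE_PHRASES.items():
        if phrase in name and code not in codes:
            codes.append(code)
    return codes, modifiers


somatic_example = annotate("Leukemia, acute lymphoblastic, somatic")
modifier_example = annotate("Thromboembolism, susceptibility to", ["AD"])
```

Plain substring matching is deliberately simple; a real pass would likely need to review names where a phrase appears in an unrelated sense.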

Formatted OMIM data (ready for integration)

GENE     OMIM    DOID    INHERITANCE  MODIFIER        DISEASE
AGRN     615120  110657  AR           -               Myasthenic syndrome, congenital, 8, with pre- and postsynaptic defects
B3GALT6  615349  50802   AR           -               Ehlers-Danlos syndrome, spondylodysplastic type, 2
DVL1     616331  60765   AD           -               Robinow syndrome, autosomal dominant 2
TMEM240  607454  50972   AD           -               Spinocerebellar ataxia 21
GNB1     613065  9952    SOMATIC      -               Leukemia, acute lymphoblastic, somatic
GNB1     616973  70072   AD           -               Mental retardation, autosomal dominant 42
SKI      182212  2340    AD           -               Shprintzen-Goldberg syndrome
CEP104   616781  110994  AR           -               Joubert syndrome 25
NPHP4    606966  111115  AR           -               Nephronophthisis 4
MTHFR    188050  2452    AD           SUSCEPTIBILITY  Thromboembolism, susceptibility to
ALPL     146300  110913  AD, AR       -               Hypophosphatasia, adult

Total: 3,858 mappable gene-disease relationships

Recommended filtering after integration
• High-confidence Mendelian relationships (3,220):
  • Keep Mendelian inheritance: autosomal dominant (AD), autosomal recessive (AR), X-linked dominant (XLD), X-linked recessive (XLR), X-linked (XL), mitochondrial (MT), digenic recessive (DR), or Y-linked (YL). May include Mendelian AND SOMATIC (hereditary cancer syndromes).
  • Exclude relationships with modifiers (i.e. keep only modifier = “-”)
• Moderate-confidence Mendelian relationships (137):
  • Mendelian relationships that have modifiers: susceptibility to (SUSCEPTIBILITY), modifier of (MODIFIES), protection against (PROTECTIVE), resistance to (RESISTANCE), reduced risk of (REDUCED).
• Low-confidence relationships (335):
  • Relationships that don’t have a Mendelian inheritance (i.e. inheritance = “-”)
• Somatic (166):
  • Inheritance = “SOMATIC” only (i.e. not Mendelian and not unknown)
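
The four tiers above could be assigned with a small classifier over the INHERITANCE and MODIFIER columns. This is a sketch under assumptions: inheritance codes are taken to be comma-separated as in the formatted table, "-" is read as "no value", and the function name `confidence_tier` and the fallback label "other" are hypothetical additions for combinations the slides do not describe.

```python
MENDELIAN = {"AD", "AR", "XLD", "XLR", "XL", "MT", "DR", "YL"}


def confidence_tier(inheritance, modifier):
    """Classify one relationship from its INHERITANCE and MODIFIER fields."""
    codes = {
        code.strip()
        for code in inheritance.split(",")
        if code.strip() and code.strip() != "-"
    }
    has_modifier = modifier.strip() not in ("", "-")
    if codes & MENDELIAN:
        # Mendelian, possibly together with SOMATIC (hereditary cancer syndromes).
        return "moderate" if has_modifier else "high"
    if codes == {"SOMATIC"}:
        return "somatic"
    if not codes:
        return "low"
    return "other"  # assumption: not covered by the slide's four tiers


# Rows from the formatted table: (GENE, INHERITANCE, MODIFIER).
rows = [
    ("AGRN", "AR", "-"),
    ("GNB1", "SOMATIC", "-"),
    ("MTHFR", "AD", "SUSCEPTIBILITY"),
    ("ALPL", "AD, AR", "-"),
]
tiers = {gene: confidence_tier(inh, mod) for gene, inh, mod in rows}
```

Checking for any Mendelian code first means a relationship tagged both Mendelian and SOMATIC stays in the Mendelian tiers, matching the note about hereditary cancer syndromes.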