Semantics for eScience
Susie Stephens, Principal Research Scientist, Eli Lilly

Outline
• Introduction to the Semantic Web
• W3C’s Semantic Web for Health Care and Life Sciences Interest Group
• Semantic Web Solutions at Lilly

Introduction to the Semantic Web


Drivers for the Semantic Web
• Business models develop rapidly these days, so infrastructure that supports change is needed
• Organizations are increasingly forming and disbanding collaborations, so they need to be able to share data more easily
• Increasing need in pharma to be able to query across data silos
• Data is growing so quickly that individuals can no longer identify patterns in their heads
• Increasing recognition of the benefits of collective intelligence

Characterizing the Semantic Web
• Semantic Web is an interoperability technology
• An architecture for interconnected communities and vocabularies
• A set of interoperable standards for knowledge exchange

Creating a Web of Data
Figure labels: Applications, Graph representation, Data in various formats
Source: Ivan Herman
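The idea behind the figure, data in various formats mapped into one graph that applications can read, can be sketched in a few lines. This is only an illustration using the Python standard library: the `http://example.org/` namespace, the predicate names, and the sample records are all hypothetical and do not come from the slides.

```python
import csv
import io
import json

# Hypothetical namespace and sample records; none of these come from the slides.
EX = "http://example.org/"

CSV_DATA = "gene_id,symbol\nG1,GENE_A\nG2,GENE_B\n"
JSON_DATA = '[{"gene": "G1", "pathway": "P1"}, {"gene": "G2", "pathway": "P1"}]'


def triples_from_csv(text):
    """Turn each CSV row into a (subject, predicate, object) triple."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = EX + "gene/" + row["gene_id"]
        triples.add((subject, EX + "symbol", row["symbol"]))
    return triples


def triples_from_json(text):
    """Turn each JSON record into a triple linking a gene to a pathway."""
    triples = set()
    for record in json.loads(text):
        subject = EX + "gene/" + record["gene"]
        triples.add((subject, EX + "inPathway", EX + "pathway/" + record["pathway"]))
    return triples


# Both sources name genes with the same identifiers, so merging the two
# graphs is a plain set union.
graph = triples_from_csv(CSV_DATA) | triples_from_json(JSON_DATA)


def describe(graph, subject):
    """Collect every predicate/object pair known about one subject."""
    return {(p, o) for s, p, o in graph if s == subject}
```

Calling `describe(graph, EX + "gene/G1")` returns facts that originated in both the CSV and the JSON source, which is the point of the graph layer: an application does not need to know which format a fact came from.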

Mashing Data
Source: W3C

W3C’s Semantic Web for Health Care and Life Sciences Interest Group

Task Forces
• Terminology: Semantic Web representation of existing resources
  • Task lead - John Madden
• Scientific Discourse: building communities through networking
  • Task leads - Tim Clark, John Breslin
• Clinical Observations Interoperability: patient recruitment in trials
  • Task lead - Vipul Kashyap
• BioRDF: integrated neuroscience knowledge base
  • Task lead - Kei Cheung
• Linking Open Drug Data: aggregation of Web-based drug data
  • Task lead - Chris Bizer
• Other Projects: Clinical Decision Support, URI Workshop, Collaborations with CDISC & HL7

BioRDF: Integrating Heterogeneous Data
• Integration and analysis of heterogeneous data sets
• Hypothesis, Genome, Pathways, Molecular Properties, Disease, etc.
Figure labels (data sources): PDSPki, Gene Ontology, NeuronDB, Reactome, BAMS, Antibodies, NC Annotations, Entrez Gene, Allen Brain Atlas, BrainPharm, MESH, Mammalian Phenotype, SWAN, AlzGene, Homologene, Publications, PubChem

BioRDF: Looking for Targets for Alzheimer’s
• Signal transduction pathways are considered to be rich in “druggable” targets
• CA1 Pyramidal Neurons are known to be particularly damaged in Alzheimer’s disease
• Casting a wide net, can we find candidate genes known to be involved in signal transduction and active in Pyramidal Neurons?
Source: Alan Ruttenberg

BioRDF: SPARQL Query
Source: Alan Ruttenberg
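The query itself is not reproduced in this transcript. As a rough illustration of how a query of this kind works, the sketch below joins three triple patterns over a toy graph in plain Python, standing in for a real SPARQL engine. Every identifier and predicate name here (`gene:A`, `participatesIn`, `expressedIn`, and so on) is hypothetical; this is not the query or the data from the slide.

```python
# Toy triples with hypothetical identifiers; they illustrate the shape of the
# question only and are not data from the knowledge base.
TRIPLES = {
    ("gene:A", "participatesIn", "process:p1"),
    ("process:p1", "subClassOf", "process:signal_transduction"),
    ("gene:A", "expressedIn", "cell:pyramidal_neuron"),
    ("gene:B", "participatesIn", "process:p2"),
    ("process:p2", "subClassOf", "process:metabolism"),
    ("gene:B", "expressedIn", "cell:pyramidal_neuron"),
    ("gene:C", "participatesIn", "process:p1"),
    ("gene:C", "expressedIn", "cell:astrocyte"),
}

# Terms starting with "?" are variables, as in SPARQL.
PATTERNS = [
    ("?gene", "participatesIn", "?process"),
    ("?process", "subClassOf", "process:signal_transduction"),
    ("?gene", "expressedIn", "cell:pyramidal_neuron"),
]


def match(pattern, triple, binding):
    """Extend a variable binding so that pattern equals triple; None on failure."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # Bind the variable, or check it against its existing value.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def query(patterns, triples):
    """Join the patterns one at a time, keeping only consistent bindings."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


results = sorted({(b["?gene"], b["?process"]) for b in query(PATTERNS, TRIPLES)})
```

In this toy graph only `gene:A` satisfies all three patterns: `gene:B` is in the right cell type but the wrong process, and `gene:C` is in the right process but the wrong cell type.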

BioRDF: Results: Genes, Processes
Genes (symbol, identifier): DRD1, 1812; ADRB2, 154; DRD1IP, 50632; DRD1, 1812; DRD2, 1813; GRM7, 2917; GNG3, 2785; GNG12, 55970; DRD2, 1813; ADRB2, 154; CALM3, 808; HTR2A, 3356; DRD1, 1812; SSTR5, 6755; MTNR1A, 4543; CNR2, 1269; HTR6, 3362; GRIK2, 2898; GRIN1, 2902; GRIN2A, 2903; GRIN2B, 2904; ADAM10, 102; GRM7, 2917; LRP1, 4035; ADAM10, 102; ASCL1, 429; HTR2A, 3356; ADRB2, 154; PTPRG, 5793; EPHA4, 2043; NRTN, 4902; CTNND1, 1500
Processes:
• adenylate cyclase activation
• arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
• dopamine receptor, adenylate cyclase activating pathway
• dopamine receptor, adenylate cyclase inhibiting pathway
• G-protein coupled receptor protein signaling pathway
• G-protein coupled receptor protein signaling pathway
• G-protein signaling, coupled to cyclic nucleotide second messenger
• G-protein signaling, coupled to cyclic nucleotide second messenger
• glutamate signaling pathway
• integrin-mediated signaling pathway
• negative regulation of adenylate cyclase activity
• negative regulation of Wnt receptor signaling pathway
• Notch receptor processing
• Notch signaling pathway
• serotonin receptor signaling pathway
• transmembrane receptor protein tyrosine kinase activation (dimerization)
• transmembrane receptor protein tyrosine kinase signaling pathway
• transmembrane receptor protein tyrosine kinase signaling pathway
• Wnt receptor signaling pathway
Many of the genes are related to AD through gamma secretase (presenilin) activity
Source: Alan Ruttenberg

LODD: Introduction
Use Semantic Web technologies to
1. publish structured data on the Web
2. set links between data from one data source to data within other data sources
Figure labels: Linked Data Browsers, Linked Data Mashups, Search Engines; Things in data sources A to E connected by typed links
Source: Chris Bizer
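To make "typed links" concrete, here is a small sketch of how a Linked Data browser might follow them from one data source into others. The data set prefixes, resource names, and link types (`sameAs`, `treats`) are hypothetical examples chosen for illustration, not links taken from the LODD data sets.

```python
from collections import deque

# Hypothetical data sets and link types, used only for illustration.
# Each entry is (source resource, link type, target resource).
LINKS = [
    ("trials:drug/1", "sameAs", "drugdb:D001"),
    ("drugdb:D001", "treats", "diseasedb:X9"),
    ("diseasedb:X9", "sameAs", "encyclopedia:Condition_X"),
    ("drugdb:D002", "treats", "diseasedb:X9"),
]


def follow_links(start, links, link_types):
    """Breadth-first walk over outgoing links whose type is in link_types."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for source, link_type, target in links:
            if source == node and link_type in link_types and target not in seen:
                seen.add(target)
                queue.append(target)
    # Report only the resources reached, not the starting point.
    return seen - {start}


same_only = follow_links("trials:drug/1", LINKS, {"sameAs"})
same_and_treats = follow_links("trials:drug/1", LINKS, {"sameAs", "treats"})
```

Because each link carries a type, the caller decides which relationships to traverse: following only `sameAs` stays with equivalent records for the same drug, while also following `treats` reaches the disease record and its equivalents in a further source.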

LODD: Potential Links between Data Sets
Source: Chris Bizer

LODD: Potential questions to answer
• Physicians and Pharmacists
  • What are alternative drugs for a given indication (disease)?
  • What are equivalent drugs (generic version of a brand name, or the chemical name of an active ingredient)?
  • Are there ongoing clinical trials for a drug?
• Patients
  • What background information is available about a drug?
  • What are the contraindications of a drug?
  • Which alternative drugs are available?
  • What are the results of clinical trials for a drug?
• Pharmaceutical Companies
  • What are other companies with drugs in similar areas?
  • Which companies have a similar therapeutic focus?
Source: Chris Bizer

LODD: Linked Version of ClinicalTrials.gov
• Total number of triples: 6,998,851
• Number of Trials: 61,920
• RDF links to other data sources: 177,975
• Links to:
  • DBpedia and YAGO (from intervention and conditions)
  • GeoNames (from locations)
  • Bio2RDF.org's PubMed (from references)
Source: Chris Bizer

Semantic Web Solutions at Lilly


Implementations at Lilly
• Integration of Clinical and Pathways Data
• Competitive Intelligence
• Experimental Metadata
• Discovery Metadata

Discovery Metadata: Goals
• Integrate master data throughout the discovery process to enable information sharing/integration for the scientific community
• Model key relationships between master data classes
• Provide ability to integrate disparate data sets more quickly than the normal warehouse paradigm typically allows
• Create a re-usable and sustainable semantic implementation
• Allow for user-driven, manual curation of key data relationships
Source: Phil Brooks

Discovery Metadata: Ontology
Figure labels: SAP, Legacy, REFDB, GSM, Manual Curation, NCBI
Source: Phil Brooks

Discovery Metadata: Architecture
Figure labels, by tier:
• Apps: Application 1, Application 2, Application 3, …
• SOA: SOA Layer/Enterprise Service Bus (Web Services, Visualizers, Data Access Components)
• Data: SQL, SPARQL, Authentication, Source Model 1, Source Model 2, Source Model 3, Source Model 4, Top Level Ontology, Local Assertions, Provenance, ETL
• Sources: RDBMS, Spreadsheets, Other Tools, Other Sources, …
Source: Phil Brooks

External Collaborations
• RDF Access to Relational Databases - Chris Bizer, Eric Prud'hommeaux
  • Scalability testing of relational to RDF mapping approaches
• End User Semantic Web Authoring - David Karger
  • Enhancing the scalability and robustness of the Exhibit and Potluck tools
• Scientist-Driven Semantic Integration of Knowledge in Alzheimer's Disease - Tim Clark, June Kinoshita
  • Project to develop an integrated knowledge infrastructure for the neuromedical research community, pairing rich digital semantic context with the ever-growing digital scientific content on the web
• Provenance Collection and Management - Carole Goble, Beth Plale
  • Project to develop a metadata taxonomy for global data at Lilly which enables the rapid integration of data and mining/analysis algorithms into dataflows which support clinical and discovery decisions
• W3C’s Health Care and Life Sciences Interest Group

Conclusion
• Many Semantic Web solutions are being explored within the health care and life sciences community
• Lilly is seeing tangible benefits in multiple projects from Semantic Web
• Semantic Web provides a flexible framework for data integration
  • Incremental adoption of technology
  • Flexibility to integrate unanticipated data sets
  • Link existing silos together
• Lilly is setting up open collaborations in this space
• Try out LSG