Allotrope Framework Drives Innovation in Lab Informatics
Dr. Gerhard Noelken, Allotrope BoD member, Pfizer Allotrope Liaison
Chemical Information and Computer Applications, RSC, London, 20 October 2015
© 2015 Allotrope Foundation
• MOTIVATION • THEORY • REDUCING IT TO PRACTICE
Why is access to music so much easier than access to scientific data?
Think about music... Music is typically stored in a small number of standard, non-proprietary formats, with contextual metadata that are complete, consistent & correct (artist, album, song, genre, date, artwork), enabling the user to find, share and enjoy it years later, easily, from any device!
Think about scientific data... Scientific data is typically stored in a wide variety of non-standard, proprietary formats (.tbl, .dat, .HDF, .XML, .DAML, .LCD, .jdx, .csv, .drdd, .asc, .cdf, .frx, .irf, .pdid, .raw, ...), with contextual metadata (material, equipment, process, result, ...) that are hard to find and sometimes inconsistent, making it costly and sometimes difficult to find and get value from it.
What if scientific data were as easy to access as music?
If we store scientific and process data in a standard format with contextual metadata that is correct, complete, consistent and compliant, we could:
• Find data in seconds.
• Be confident that the data that underpins our decisions is accurate, complete, and compliant.
• Build data quality and data integrity into the system, eliminating the need for many SOPs and quality investigations.
• Simplify, automate and improve laboratory and manufacturing processes.
• Automatically create technical reports, audit trails, and substantial portions of regulatory submission documents.
• Answer complex questions, not just those accessible via simple queries, by linking data from diverse, disparate sources.
Allotrope Foundation Member Companies
• Member Companies (subject matter experts, project funding): AbbVie, Amgen, Baxter, Bayer, Biogen, Boehringer Ingelheim, Bristol-Myers Squibb, Eli Lilly, Genentech/Roche, GlaxoSmithKline, Merck & Co., Pfizer
• Secretariat: project management, legal & logistical support
• Professional Software Firm: framework development, technical leadership
• Partner Network (requirements & specifications; contributions, PoC applications): ACD/Labs, Agilent, Biovia, BSSN, Erasmus Univ. Med Center, IDBS, Mestrelab Research, Mettler Toledo, Persistent, Riffyn, Sartorius, Shimadzu, Thermo Scientific, University of Southampton, Waters
THEORY
What is Allotrope Creating?
The Allotrope Foundation Framework: a toolkit that enables use of the standards & metadata in software development, shared by instruments, new instruments and applications. It comprises:
• Standard File Format (.adf): one file format for any technique or instrument, in place of the many existing formats (.dat, .DAML, .HDF, .LCD, .csv, .raw, .tbl, .mzML, .asc, .cdf, .pdid, .frx, .drdd, .irf, .XML, .jdx, etc.)
• Open Metadata Repository: a standard vocabulary & structure for metadata
• Reusable Software Components
Example: without a metadata repository, the same information is recorded inconsistently, e.g. project IDs "AE 0012764", "AF 12989", "AF-0034558"; tests "Bulk & Tapped Density" vs "Tapped & Bulk Density", "NMR Characterization" vs "Caractérisation RMN", "IR" vs "IR Fingerprinting"; instruments "QC Lab #33 B 380 FT-IR", "ASTM Standard Seive #6", "Sieve XXX", "Nouvelle DRX 600", "iS 10 FT-IR". With the Metadata Repository:
Project | Test | Instrument
AF 0012354 | IR Fingerprinting | 380 FTIR/-SN/145453
AF 0012764 | Bulk and Tapped Density | ASTM Sieve-SN/3452
AF 0012989 | NMR Characterization | AM 500 -SN/0034578
AF 0013142 | Bulk and Tapped Density | ASTM Sieve-SN/09783
AF 0045674 | NMR Characterization | DRX 600 -SN/10234567
AF 0034558 | IR Fingerprinting | iS 10 FTIR/-SN/341980
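To make the metadata-repository example concrete, here is a minimal Python sketch, assuming the repository's role can be reduced to mapping free-text labels onto one controlled term. The `CONTROLLED_TESTS` table and `normalize_test` are hypothetical illustrations built from the test names on this slide; they are not part of the Allotrope Framework.

```python
# Hypothetical controlled vocabulary built from the slide's example:
# variant labels (word order, abbreviation, another language) -> one term.
CONTROLLED_TESTS = {
    "ir fingerprinting": "IR Fingerprinting",
    "ir": "IR Fingerprinting",
    "bulk & tapped density": "Bulk and Tapped Density",
    "tapped & bulk density": "Bulk and Tapped Density",
    "nmr characterization": "NMR Characterization",
    "caractérisation rmn": "NMR Characterization",
}


def normalize_test(label: str) -> str:
    """Map a free-text test name to its controlled term, or reject it."""
    key = " ".join(label.lower().split())  # case- and whitespace-insensitive
    if key not in CONTROLLED_TESTS:
        raise ValueError(f"{label!r} is not in the controlled vocabulary")
    return CONTROLLED_TESTS[key]


entered = ["IR Fingerprinting", "Bulk & Tapped Density", "NMR Characterization",
           "Tapped & Bulk Density", "Caractérisation RMN", "IR"]
print([normalize_test(label) for label in entered])
```

Rejecting unknown labels instead of passing them through is what keeps the vocabulary controlled; a real system would more likely offer the controlled term at data entry than clean up labels afterwards.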
High Variability of Result Data
Examples: pH, mass spectrometry, NMR, thermogravimetry, chromatography, HPLC-MS-MS, cell counter, …
Landscape of Existing Standards
Standards bodies: NISO, LC, OAI, W3C, ISO, OASIS, OMG, IETF, CDISC
Key Requirements
• Technical capabilities: large data volume, small file size, fast; arbitrary techniques, extensible; platform independent
• Comprehensive metadata: who, what, when, where, why and how (scientist, sample, time stamp/audit trail, instrument, purpose, method)
• Long-term data access: documented file format; vendor-neutral format; adaptable and extensible
Allotrope Data Format (ADF)
A universal data container built on HDF5, a platform-independent file format specifically designed to store and organize large amounts of numerical data. It holds:
• Data Description (RDF model): method, instrument, sample, process, result, etc.; data cube metadata; data package metadata; …
• Data Cubes: analytical data represented by one- or multidimensional arrays.
• Data Package (virtual file system)*: analytical data represented by arbitrary formats, incl. native instrument formats, images, PDF, video, etc.
* Use is optional.
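The three parts of the container can be pictured with a minimal in-memory Python sketch. It assumes nothing about the real ADF API: `AdfContainer`, `DataCube`, their fields and the `about`/`shape` predicates are hypothetical names, triples are plain string tuples, and nested lists stand in for the HDF5 arrays the actual format uses.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class DataCube:
    """Numeric analytical data as a (possibly multidimensional) array."""
    name: str
    columns: tuple[str, ...]  # labels for the innermost axis
    values: list              # nested lists, one nesting level per dimension

    def shape(self) -> tuple[int, ...]:
        dims, level = [], self.values
        while isinstance(level, list):
            dims.append(len(level))
            level = level[0] if level else None
        return tuple(dims)


@dataclass
class AdfContainer:
    description: set[Triple] = field(default_factory=set)     # Data Description (RDF)
    cubes: dict[str, DataCube] = field(default_factory=dict)  # Data Cubes
    package: dict[str, bytes] = field(default_factory=dict)   # optional Data Package: path -> file bytes

    def add_cube(self, cube: DataCube, about: str) -> None:
        """Store the array and record its metadata as triples in the description."""
        self.cubes[cube.name] = cube
        self.description.add((cube.name, "type", "DataCube"))
        self.description.add((cube.name, "about", about))
        self.description.add((cube.name, "shape", str(cube.shape())))


adf = AdfContainer()
adf.description.add(("Sample 1", "type", "Sample"))
adf.add_cube(
    DataCube("chromatogram-1", ("retention time", "absorbance"),
             [[0.0, 0.01], [0.5, 0.42], [1.0, 0.05]]),
    about="Sample 1",
)
adf.package["/native/run1.raw"] = b"\x00\x01"  # placeholder bytes for a native instrument file
print(adf.cubes["chromatogram-1"].shape())  # (3, 2)
```

Keeping the cube metadata as triples in the description, next to the arrays, mirrors the slide's split between an RDF data description and numerical data cubes.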
Allotrope Taxonomies: An Extensible Metadata Model
• A library of extensible taxonomies
– Uses W3C standards
– Easy for SMEs and vendors to understand and maintain
• Start by harvesting existing available concepts
– PSI-MS; IUPAC; RSC Chemical Methods Ontology; Dictionary of weighing terms; AnIML, etc.
• Reproducible & efficient collaboration model
– Leverages knowledge engineers & member company scientists
– 2-3 weeks to develop the initial version of a new taxonomy
• Initial versions for 12 analytical techniques already implemented: gas chromatography, Karl Fischer, liquid chromatography, mass spectrometry, nuclear magnetic resonance spectroscopy, thermogravimetric analysis, ultraviolet spectrometry, cell counter, cell culture analyzer, blood gas analysis, balance, pH
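An extensible taxonomy can be pictured as a set of "is-a" links between concepts. The Python sketch below assumes a simple child-to-parent dictionary; the concept names come from the technique list above, but the hierarchy, the added `hplc` concept and the helper functions are hypothetical and do not reproduce the actual Allotrope taxonomies (which, per the slide, use W3C standards).

```python
# Hypothetical "is-a" links (child -> parent); names are illustrative only.
SUBCLASS_OF = {
    "gas chromatography": "chromatography",
    "liquid chromatography": "chromatography",
    "chromatography": "analytical technique",
    "mass spectrometry": "analytical technique",
    "thermogravimetric analysis": "analytical technique",
}


def extend(taxonomy, child, parent):
    """Add a new concept below an existing one, leaving all other links intact."""
    known = set(taxonomy) | set(taxonomy.values())
    if parent not in known:
        raise KeyError(f"unknown parent concept: {parent}")
    taxonomy[child] = parent


def ancestors(taxonomy, term):
    """Follow is-a links upwards (transitive closure), nearest first."""
    chain = []
    while term in taxonomy:
        term = taxonomy[term]
        chain.append(term)
    return chain


def narrower(taxonomy, term):
    """All concepts directly or indirectly below `term`, sorted."""
    return sorted(t for t in taxonomy if term in ancestors(taxonomy, t))


extend(SUBCLASS_OF, "hplc", "liquid chromatography")
print(ancestors(SUBCLASS_OF, "hplc"))
# ['liquid chromatography', 'chromatography', 'analytical technique']
print(narrower(SUBCLASS_OF, "chromatography"))
# ['gas chromatography', 'hplc', 'liquid chromatography']
```

Adding a concept adds exactly one link, so existing terms and queries keep working; that is the sense in which such a taxonomy is extensible.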
Allotrope Foundation Taxonomies
The Big Picture
Figure: ADF Data Package, ADF Data Cube
Figure labels: Result, Equipment, Process, Common, Material
ADF Class Library
• Data Package API
• Data Cube API
• Data Description API (Apache Jena)
• Triple Store API
• Taxonomies
• Analytical Data API
• Platform-independent file format (HDF5)
RDF Data Model
• Subject-Predicate-Object (triple)
• Example (subject, predicate, object):
<Sample 1> type <Sample>
<Sample 1> createdOn '2015-03-13'
<Sample 1> createdBy <person X>
<Sample 1> hasBarcode '1234567890'
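The four example triples can be held and queried with plain Python tuples. This is a simplified sketch: resources and literals are both plain strings (real RDF distinguishes IRIs from typed literals), and `match` is a hypothetical helper playing the role of a triple-pattern query; the ADF class library itself uses Apache Jena for the data description.

```python
# Example triples from the slide as (subject, predicate, object) tuples.
graph = {
    ("Sample 1", "type", "Sample"),
    ("Sample 1", "createdOn", "2015-03-13"),
    ("Sample 1", "createdBy", "person X"),
    ("Sample 1", "hasBarcode", "1234567890"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(
        (ts, tp, to) for ts, tp, to in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    )


# Which resources are of type Sample?
print([subj for subj, _, _ in match(graph, p="type", o="Sample")])  # ['Sample 1']

# Everything recorded about Sample 1.
for _, pred, obj in match(graph, s="Sample 1"):
    print(pred, obj)
```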
Data Shapes Constrain How We Use Taxonomies in the Real World
– Taxonomies provide an unconstrained vocabulary that we can use to describe things (instances) in our open world and give them a meaning (= what it is).
– We need a mechanism to define data structures (schemas, templates) that describe how to use the taxonomies for a given purpose in a standardized (= reproducible, predictable, verifiable) way.
– The Shapes Constraint Language (SHACL, expressed as RDF triples) is an emerging standard for doing this: http://www.w3.org/2014/data-shapes/charter
Using Data Shapes: Equipment
Shape hierarchies define additional constraints:
• A system has at least 1 component.
• An HPLC system has at least 1 component and has at least 1 column, exactly 1 autosampler and at least 1 detector.
• An HPLC-UV system has at least 1 component and has at least 1 column, exactly 1 autosampler, at least 1 detector and at least 1 UV detector.
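The equipment shapes can be checked with a short Python sketch of cardinality constraints with inheritance. It is not SHACL: in SHACL the same bounds would be written as RDF using `sh:minCount`/`sh:maxCount` on property shapes. The `SHAPES` table, the component kinds and `validate` are hypothetical, and each component is modeled as the set of kinds it belongs to, so a UV detector also counts as a detector.

```python
import math

# Hypothetical shapes from the slide: (min, max) counts per component kind.
# A child shape inherits every constraint of its parent shape.
SHAPES = {
    "system": {"parent": None,
               "counts": {"component": (1, math.inf)}},
    "hplc system": {"parent": "system",
                    "counts": {"column": (1, math.inf),
                               "autosampler": (1, 1),
                               "detector": (1, math.inf)}},
    "hplc-uv system": {"parent": "hplc system",
                       "counts": {"uv detector": (1, math.inf)}},
}


def constraints(shape):
    """Merge the constraints of a shape and all of its ancestors."""
    merged = {}
    while shape is not None:
        for kind, bounds in SHAPES[shape]["counts"].items():
            merged.setdefault(kind, bounds)  # the more specific shape wins
        shape = SHAPES[shape]["parent"]
    return merged


def validate(shape, components):
    """Return violation messages; an empty list means the system conforms.

    Each component is a set of kinds; every component counts as a 'component'.
    """
    violations = []
    for kind, (low, high) in sorted(constraints(shape).items()):
        found = sum(1 for c in components if kind == "component" or kind in c)
        if not low <= found <= high:
            violations.append(f"{shape}: expected {low}..{high} {kind!r}, found {found}")
    return violations


hplc_uv = [{"column"}, {"autosampler"}, {"detector", "uv detector"}]
print(validate("hplc-uv system", hplc_uv))  # []
print(validate("hplc-uv system", [{"column"}, {"detector"}]))
```

Because `hplc-uv system` lists only what it adds, a change to the `hplc system` constraints propagates to it automatically, which is the point of the shape hierarchy on the slide.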
The ADF enables self-contained documentation of the data & metadata
Process steps: Request → Plan Analysis → Prepare Samples → Submit Samples → Control Instrument → Acquire Data → Process Data → Analyze Data → Report Results → Store, Archive Data
The ADF enables self-contained documentation of the data & metadata
Process steps: Request → Plan Analysis → Prepare Samples → Submit Samples → Control Instrument → Acquire Data → Process Data → Analyze Data → Report Results → Store, Archive Data
Data produced along the way: analytical method, sample prep data, instrument instruction, instrument data, processed data, analyzed data, reported results, stored data.
• A standard data file format for data & metadata: the output from one system becomes the input to the next.
• The APIs enable the use of one vendor-agnostic file format.
• Benefits: data & metadata interoperability (plug & play); more automated reporting (report & share); powerful searching (search & reuse data).
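The idea that the output from one system becomes the input to the next can be sketched in Python as a chain of steps that all read and extend one self-contained document. The step names come from the slide; the document layout, the `step` helper, the `audit_trail` key and the sample values are hypothetical, and a plain dictionary stands in for an ADF file.

```python
# Hypothetical: each step reads the shared document, adds its output and an
# audit-trail entry, and hands the document to the next step.
def step(name, output_key, make_output):
    def run(doc):
        doc = dict(doc)  # shallow copy: earlier versions stay unchanged
        doc[output_key] = make_output(doc)
        doc["audit_trail"] = doc.get("audit_trail", []) + [name]
        return doc
    return run


pipeline = [
    step("Plan Analysis", "analytical_method", lambda d: {"technique": "HPLC-UV"}),
    step("Acquire Data", "instrument_data", lambda d: [0.01, 0.42, 0.05]),
    step("Process Data", "processed_data",
         lambda d: {"peak_max": max(d["instrument_data"])}),
    step("Report Results", "reported_results",
         lambda d: f"max absorbance {d['processed_data']['peak_max']}"),
]

doc = {"request": "assay of Sample 1"}
for run in pipeline:
    doc = run(doc)

print(doc["audit_trail"])       # ['Plan Analysis', 'Acquire Data', 'Process Data', 'Report Results']
print(doc["reported_results"])  # max absorbance 0.42
```

Since every step only adds to the document, the final file carries the request, the intermediate data and the trail of steps together, which is what makes it self-contained.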
Allotrope Data Format (example)
• Data Description: request, sample prep, method, instrument instructions, data acquisition
• Data Cubes: chromatograms stored as HDF arrays (three 2D, one 3D)
• Data Package(s) (optional)
REDUCING IT TO PRACTICE
Reducing it to practice
Integration projects:
• Early adoption and acceleration of AF Framework development through delivery of Framework components.
• Accelerate vendor adoption via collaboration on delivering specific capabilities through the integration projects.
• Ensure the validity of the Framework through real-world requirements and implementation.
• Benefit the sponsoring companies in terms of shared development of new functionality & shared experience.
• Drive integration of the Framework into the IT/IS roadmap.
Current Project Portfolio
Figure: integration projects by member company (Amgen, Baxter, Bayer, Boehringer Ingelheim, BMS, Genentech, GSK, Merck & Co), spanning Research, Development and Commercial (legend: member, instrument, non-GMP). Scopes shown include:
• Techniques/instruments: pH, weighing/balance, GC, Karl Fischer, TGA, NMR, cell density/viability, blood gas analyzer, cell culture analyzer, capillary electrophoresis, bioanalyzer, HPLC-UV, HPLC-MS, HPLC-UV/MS, ICP-MS, multiple types, …
• Applications: elemental impurities, fermentation process control, structure ID, purification, lead profiling, assay, purity, small and large molecule CMC, drug substance release & stability, method screening
APN (Allotrope Partner Network) Member PoCs: Amgen Project (GMP, Development)
• Instruments: pH, weighing, HPLC-UV/MS, GC, Karl Fischer, TGA, NMR, cell density/viability, blood gas analyzer, cell culture analyzer, capillary electrophoresis, …
• LES
• ADF for multiple techniques (results only), using the AF taxonomies
• APN member PoCs: "ELN" and "SDMS"
• Data lake (Hadoop)
APN (Allotrope Partner Network) Member PoCs: GSK Project (GMP, Development)
• Scope: stability testing, release testing
• Components: Allotrope taxonomies; methods, SOPs; ELN; HPLC; balance; ADF
• APN member PoCs: "ELN" and "CDS"
The Framework will provide business benefit
Benefit areas: less manual document preparation; facilitated regulatory compliance; lower data management costs; seamless data exchange with partners & CROs; improved data integrity; extracting knowledge & value from data.
• Find data quickly/logically
• Eliminate copy/paste
• No more transcription/conversion
• Source agnostic
• Eliminate errors due to manual text entry or transcription
• Complete, consistent, accurate metadata
• One data file format
• One consistent vocabulary
• Reduced cost & complexity for CROs, CMOs, partnerships
• Interoperability
• Future-proof against data migrations
• Reduced technical debt: no more maintenance of legacy systems
• Improved archiving
• Improved instrument & software validation tracking
• Reduced complexity in system documentation
• Simpler to support questions/investigations
• Greatly enhanced speed to answer/decision
• Remove data silos
• Create an ecosystem for innovation
• Facilitate data mining & analytics
What are your priorities?
Thank you!