A Corporate Data Repository Supporting Scientific Research in the UK
Eddy Grąbczewski (e.grabczewski@rl.ac.uk)
Database Design Engineer, Rutherford Appleton Laboratory
6 Dec 2005

6 Dec 2005 KM Workshop, SLE Brian Matthews

Contents
• CCLRC
• Corporate Data Muddle
• Corporate Data Repository (CDR) Requirements
• Corporate Data Model (CDM) Design
• CDM Architecture
• Temporal CDM

CCLRC

Who are the UK Research Councils?
• AHRC – arts & humanities
• BBSRC – biotechnology & biological sciences
• CCLRC – science
• EPSRC – engineering & physical sciences
• ESRC – economic & social
• MRC – medical
• NERC – natural environment
• PPARC – particle physics & astronomy
They provide research funding and services.

Council for the Central Laboratory of the Research Councils (1995)

Central Laboratory

Chilbolton Observatory
• This site in Hampshire opened in 1967
• Chilbolton Dish – 25 metre radar antenna:
  • CAMRa Rain Radar
  • ACROBAT Clear-air Radar
  • Galileo Cloud Radar
  • Copernicus Doppler Cloud Radar
• Raman Lidar – measures atmospheric properties
• Various meteorological sensors and several other services

Daresbury Laboratory
• This site in Cheshire opened in 1962
• Synchrotron Radiation Source (SRS):
  • Creates intense X-ray light
  • Service closes in 2008
• HPCx service of 1280 processors:
  • 40 IBM pSeries 690 Regatta nodes
  • 32 POWER4 processors per node
• e-Science and several other services

Rutherford Appleton Laboratory
• This site in Oxfordshire opened in 1957
• Atlas Computing Laboratory (since 1975)
• Central Laser Facility (CLF) – high power lasers
• ISIS – pulsed neutron and muon sources
• RAL Ground Station Antenna – 12 metre radar antenna
• Space test facility – imitates the conditions of space
• Starlink – processes and analyses astronomical data
• e-Science and several other services
Diamond Light Source (DLS):
• Site opened in 2005
• Synchrotron service starts in 2007
• Creates intense X-ray & UV light

Corporate Data Muddle

The Corporate Data Muddle
Symptoms of the data muddle:
• Cannot use a single security pass across sites
• Laptops need to be configured for each site
• Cannot book meeting rooms across sites
• Cannot book taxis across sites
• Different library cards for each site
• In short, you feel like a persona non grata
Dangers of the data muddle:
• Sharing data across sites can result in multiple records for the same person

The Corporate Data Muddle
Consequences of the data muddle:
• Different database vendors
• Different data models
• Different application technologies
• Different services at each site
• Incomplete data held for each site
• Similar applications developed separately for each site
• Entrenched development philosophies at each site
Causes of the data muddle:
• Site history
• Corporate re-organisations

The Corporate Data Muddle?

The Corporate Data Muddle
Some architectural solutions:
• A single corporate database for all applications
  • Very complex and expensive to develop.
• A limited corporate database feeding all departmental databases
  • Probably the best compromise.
• Corporate applications that work transparently across sites
  • Web-based applications using open standards.

Corporate Data Repository (CDR) Requirements

The Corporate Data Repository?

The Corporate Data Repository

The Corporate Data Repository
• Holds common corporate data
• Includes information about:
  • Resources: persons, infrastructure
  • Organisation Units: organisations, departments
  • Roles
  • Authorisations
• May include information about:
  • Projects
  • Publications
  • Bookings
• But not data on:
  • Company finances
  • Private personal information

The Corporate Data Repository
• A repository is sometimes defined as a metadata layer above the database layer.
• We define a repository to mean a global database that supports local databases.
  • This is analogous to global and local variables in a computer program.
• The CDR hosts the Corporate Data Model (CDM).
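
The global/local analogy can be illustrated with Python's collections.ChainMap, which searches a local mapping before falling back to a global one. This is only a sketch of the lookup rule under assumed data: the record fields and values are hypothetical, and a real CDR would feed separate departmental databases rather than chain dictionaries in memory.

```python
from collections import ChainMap

# Hypothetical corporate (global) record for one person, shared by all sites.
corporate = {"surname": "Smith", "email": "j.smith@example.org", "title": "Dr"}

# Hypothetical departmental (local) record: adds a local-only field and
# overrides one corporate field for local use.
department = {"room": "R1-2.34", "title": "Dr (Group Leader)"}

# Lookups search the local mapping first and fall back to the global one,
# just as a local variable shadows a global variable of the same name.
person = ChainMap(department, corporate)

print(person["surname"])  # Smith: found only in the corporate record
print(person["title"])    # Dr (Group Leader): the local value shadows the global one

# Writes through a ChainMap go to its first mapping only,
# so the department never modifies the corporate record.
person["phone"] = "x1234"
print("phone" in corporate)  # False
```

The design point mirrors the slide: local data may add to or shadow global data, but the global copy stays authoritative and unchanged.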

Corporate Data Model (CDM) Design

Corporate Data Model (CDM)
• Currently the CDM models:
  • People related to CCLRC and DLS:
    • roles
    • authorisations
  • Organisation Units in CCLRC and DLS
  • Projects
• Consolidates data from the existing site databases at RAL and DL
• Based on the CERIF 2004 standard:
  • http://www.eurocris.org/en/taskgroups/cerif/

CERIF 2004
Common European Research Information Format (CERIF)

CDR 2005
Corporate Data Repository (CDR)

CDM Design Decisions
• Based on the CERIF 2004 standard
• Based on the ANSI/SPARC architecture
• Supports temporal data at the row level
• Supports data distribution
• Supports globalisation: Unicode and time zones
• Supports inheritance for entities and relationships
• Supports complex relationships
• Supports recursive relationships
• Accessible through open-standard interfaces: JDBC, ODBC
• Implementable on more than one database: Oracle, Ingres, Microsoft SQL Server
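
Recursive relationships, such as an organisation unit that belongs to a parent organisation unit, can be queried with a recursive common table expression. The sketch below assumes a hypothetical org_unit table with a self-referencing parent_id column and illustrative rows; it uses SQLite through Python's standard sqlite3 module rather than any of the databases named above, and the CDM's actual tables may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: each unit may point to its parent unit,
    -- a recursive (self-referencing) relationship.
    CREATE TABLE org_unit (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER REFERENCES org_unit(id)
    );
    -- Illustrative rows only.
    INSERT INTO org_unit VALUES
        (1, 'CCLRC', NULL),
        (2, 'RAL', 1),
        (3, 'ISIS', 2),
        (4, 'Daresbury Laboratory', 1);
""")

def ancestors(unit_id):
    """Return unit names from unit_id up to the root of the hierarchy."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, name, parent_id, depth) AS (
            SELECT id, name, parent_id, 0 FROM org_unit WHERE id = ?
            UNION ALL
            SELECT o.id, o.name, o.parent_id, c.depth + 1
            FROM org_unit AS o JOIN chain AS c ON o.id = c.parent_id
        )
        SELECT name FROM chain ORDER BY depth
    """, (unit_id,)).fetchall()
    return [name for (name,) in rows]

print(ancestors(3))  # ['ISIS', 'RAL', 'CCLRC']
```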

CDM Entities and Objects
• The Interim Report (1975) of the ANSI/X3/SPARC Study Group on DBMSs stated the following: “An object in the real world is called an entity”
• A paper by Hall, Owlett and Todd (1976) discusses the concept of a surrogate: “… every entity of the outside world is associated with a surrogate which stands for that object in the model”
• A surrogate is an object in the model that is in one-to-one correspondence with an entity in the world

CDM Entity and Object Identifiers
Entity Identifier
• an identifier used to identify the unique existence of an entity in the real world.
• a surrogate.
Object Identifier
• an identifier used to identify the unique existence of an object in the database.
• a global primary key.
NOTE An Entity Identifier may have several Object Identifiers.
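
The note that one Entity Identifier may have several Object Identifiers can be pictured as a one-to-many index. In this sketch the ObjectRecord type, its source field and all identifier values are hypothetical: two site databases each hold a row about the same person, so both rows carry the same entity surrogate but different global primary keys.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ObjectRecord:
    object_id: int  # Object Identifier: global primary key of this row
    entity_id: int  # Entity Identifier: surrogate for the real-world entity
    source: str     # hypothetical: the site database the row came from
    surname: str

# Two site databases describe the same person (entity 1001), so that one
# Entity Identifier is associated with two Object Identifiers.
records = [
    ObjectRecord(object_id=1, entity_id=1001, source="RAL", surname="Smith"),
    ObjectRecord(object_id=2, entity_id=1001, source="DL", surname="Smith"),
    ObjectRecord(object_id=3, entity_id=1002, source="RAL", surname="Jones"),
]

def objects_by_entity(rows):
    """Group Object Identifiers under the Entity Identifier they represent."""
    index = defaultdict(list)
    for row in rows:
        index[row.entity_id].append(row.object_id)
    return dict(index)

print(objects_by_entity(records))  # {1001: [1, 2], 1002: [3]}
```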

CDM Entity and Object Domains
Each Identifier belongs to a corresponding Domain:
• Entity Domain: the entity domain to which an Entity Identifier belongs.
• Object Domain: the object domain to which an Object Identifier belongs.

Central Laboratory Domains
Ent. Dom. Id    Obj. Dom. Id
6000000         6000000
9000000         9000000
12000000        12000000
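
The slide pairs each Entity Domain Id with an equal Object Domain Id. One possible reading, assumed here and not stated on the slide, is that each domain id is the lower bound of a block of identifiers, with blocks 3,000,000 apart. Under that assumption, finding an identifier's domain and allocating new identifiers within a domain might look like this; BLOCK_SIZE, domain_of and allocate are hypothetical names.

```python
import bisect

# Domain ids from the slide. Treating each one as the lower bound of a block
# of identifiers is an assumption of this sketch, not stated on the slide.
DOMAIN_IDS = [6_000_000, 9_000_000, 12_000_000]
BLOCK_SIZE = 3_000_000  # assumed: the gap between consecutive domain ids

def domain_of(identifier):
    """Return the domain id whose assumed block contains `identifier`."""
    pos = bisect.bisect_right(DOMAIN_IDS, identifier) - 1
    if pos < 0 or identifier >= DOMAIN_IDS[pos] + BLOCK_SIZE:
        raise ValueError(f"{identifier} lies outside every known domain")
    return DOMAIN_IDS[pos]

def allocate(domain_id, counters):
    """Hand out the next unused identifier in a domain's assumed block.

    `counters` maps a domain id to the number of identifiers already issued.
    """
    if domain_id not in DOMAIN_IDS:
        raise ValueError(f"unknown domain {domain_id}")
    n = counters.get(domain_id, 0) + 1
    if n >= BLOCK_SIZE:
        raise OverflowError(f"domain {domain_id} is full")
    counters[domain_id] = n
    return domain_id + n

counters = {}
new_id = allocate(9_000_000, counters)
print(new_id, domain_of(new_id))  # 9000001 9000000
```

Because the slide's entity and object domain ids coincide, the same lookup would serve both kinds of identifier under this reading.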

CDM Architecture

ANSI/X3/SPARC Architecture
The Interim Report (1975) of the ANSI/X3/SPARC Study Group on DBMSs also proposed three “realms of interest” in a database architecture:
• External realm: a simplified model of the real world as seen by one or more applications.
• Conceptual realm: a limited model of the real world for all applications.
• Internal realm: data in computer storage representing the conceptual realm.
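
The three realms can be sketched as three layers of functions over one store. Everything here is hypothetical: _storage stands in for the internal realm, conceptual_people presents the single shared model, and phonebook_view is the simplified external view that one application might need.

```python
# Internal realm: how the data happens to be stored (hypothetical layout:
# upper-case strings in fixed positional fields).
_storage = [
    ("1001", "SMITH", "JOHN", "RAL"),
    ("1002", "JONES", "MARY", "DL"),
]

def conceptual_people():
    """Conceptual realm: one limited model of people for all applications."""
    for pid, surname, forename, site in _storage:
        yield {
            "person_id": int(pid),
            "surname": surname.title(),
            "forename": forename.title(),
            "site": site,
        }

def phonebook_view(site):
    """External realm: the simplified model one application needs.

    This hypothetical phone-book application sees only names at one site
    and never touches `_storage` directly.
    """
    return [f"{p['forename']} {p['surname']}"
            for p in conceptual_people() if p["site"] == site]

print(phonebook_view("RAL"))  # ['John Smith']
```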

ANSI/X3/SPARC Architecture

CDM Architecture

CDM Conceptual Model Sub-levels

CDM Conceptual Model

CDM Conceptual Model

CDM Conceptual Logical Model

CDM Conceptual Logical Model

CDM ANSI/SPARC Architecture
The ANSI/SPARC layers provide a degree of data model independence:
• The External layer is relatively independent of the Conceptual layer.
• The Conceptual layer is relatively independent of the Conceptual Logical layer.
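
This independence can be demonstrated with SQL views, here in SQLite via Python's sqlite3 module. In this hypothetical sketch, storage is restructured from one name column to two, and only the conceptual view person is redefined; the external view staff_list, and the application query that reads it, stay unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Version 1 of storage: names kept in a single column (hypothetical).
conn.executescript("""
    CREATE TABLE person_v1 (id INTEGER PRIMARY KEY, full_name TEXT);
    INSERT INTO person_v1 VALUES (1, 'Mary Jones');
    -- Conceptual layer: the shared model every application relies on.
    CREATE VIEW person AS SELECT id, full_name AS name FROM person_v1;
    -- External layer: what one application sees.
    CREATE VIEW staff_list AS SELECT name FROM person ORDER BY name;
""")

def app_query():
    """The application only ever reads the external view."""
    return [row[0] for row in conn.execute("SELECT name FROM staff_list")]

before = app_query()

# Restructure storage: split the name into two columns, then redefine only
# the conceptual view. The external view and app_query are untouched.
conn.executescript("""
    CREATE TABLE person_v2 (id INTEGER PRIMARY KEY, forename TEXT, surname TEXT);
    INSERT INTO person_v2 VALUES (1, 'Mary', 'Jones');
    DROP VIEW person;
    CREATE VIEW person AS
        SELECT id, forename || ' ' || surname AS name FROM person_v2;
    DROP TABLE person_v1;
""")

after = app_query()
print(before, after)  # ['Mary Jones'] ['Mary Jones']
```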

CDM Implementation
CREATE TRIGGER … INSTEAD OF
Supported by:
• Oracle
• Microsoft SQL Server
• IBM DB2
• IBM Informix
• CA Ingres (CREATE RULE … DO INSTEAD)
• PostgreSQL (CREATE RULE … DO INSTEAD)
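
An INSTEAD OF trigger lets an application write to a view as if it were a table, with the trigger translating each write into changes on the underlying tables. SQLite also supports INSTEAD OF triggers on views, so the idea can be sketched with Python's sqlite3 module; the person, person_email and contact names are hypothetical and not taken from the CDM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base tables (Conceptual Logical layer).
    CREATE TABLE person (id INTEGER PRIMARY KEY, surname TEXT NOT NULL);
    CREATE TABLE person_email (person_id INTEGER REFERENCES person(id),
                               email TEXT NOT NULL);

    -- A view presenting one flattened "contact" object (Conceptual layer).
    CREATE VIEW contact AS
        SELECT p.id, p.surname, e.email
        FROM person AS p LEFT JOIN person_email AS e ON e.person_id = p.id;

    -- A view over a join is not directly writable, so this trigger rewrites
    -- an insert on the view into inserts on the two base tables.
    CREATE TRIGGER contact_insert INSTEAD OF INSERT ON contact
    BEGIN
        INSERT INTO person (id, surname) VALUES (NEW.id, NEW.surname);
        INSERT INTO person_email (person_id, email) VALUES (NEW.id, NEW.email);
    END;
""")

# The application inserts into the view; the trigger does the real work.
conn.execute("INSERT INTO contact (id, surname, email) VALUES (?, ?, ?)",
             (1, "Jones", "m.jones@example.org"))
print(conn.execute("SELECT * FROM contact").fetchall())
# [(1, 'Jones', 'm.jones@example.org')]
```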

Temporal CDM

Temporal CDM
Many databases store only current facts. What if we want to store and query historical facts? For example:
• What was the organisation hierarchy on 1 April 1995?
• What was an employee’s surname before her most recent marriage?
• Which manager had departmental signing powers on 24 November 2004?
• What position did this senior researcher hold at CCLRC when he wrote his 1995 journal paper?
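
Storing a valid-time interval on each row is one way to answer questions like the surname example. This sketch assumes half-open intervals [valid_from, valid_to) with None meaning "still valid"; the SurnameRow type, the dates and the surnames are all hypothetical, and it simply treats the surname that preceded the current one as the answer to the marriage question.

```python
from datetime import date
from typing import NamedTuple, Optional

class SurnameRow(NamedTuple):
    person_id: int
    surname: str
    valid_from: date          # first day the fact holds
    valid_to: Optional[date]  # first day it no longer holds; None = still valid

# Hypothetical history for one person: each change adds a row instead of
# overwriting the surname, so past values stay queryable.
history = [
    SurnameRow(7, "Brown", date(1970, 5, 1), date(1998, 6, 20)),
    SurnameRow(7, "Green", date(1998, 6, 20), date(2003, 9, 13)),
    SurnameRow(7, "White", date(2003, 9, 13), None),
]

def surname_on(rows, person_id, when):
    """Return the surname valid for `person_id` on date `when`, or None."""
    for r in rows:
        if (r.person_id == person_id and r.valid_from <= when
                and (r.valid_to is None or when < r.valid_to)):
            return r.surname
    return None

def previous_surname(rows, person_id):
    """Return the surname that the current one replaced, or None."""
    mine = sorted((r for r in rows if r.person_id == person_id),
                  key=lambda r: r.valid_from)
    return mine[-2].surname if len(mine) >= 2 else None

print(surname_on(history, 7, date(1995, 4, 1)))  # Brown
print(previous_surname(history, 7))              # Green
```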

CDM Entity Time
Valid Time – the period over which a fact was believed to be true in the real world.
Our beliefs may change, and so may valid time.
Example: King Canute lived from 995 to 1035.

CDM Object Time
Transaction Time – the period over which a fact was recorded to exist in the database.
Our records do not change, and neither does transaction time.
Example: The fact “King Canute lived from 995 to 1035” was recorded to exist in the database from 7 Mar 1995 until now.
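
Valid time and transaction time can be combined on one row. In this hypothetical sketch, lived_from/lived_to carry the valid time from the King Canute example and recorded_from/recorded_to carry the transaction time, with None standing for "until now". A revised belief never overwrites a row: the current row's open-ended transaction time is closed and a new row is appended, so the database can still report what it held on any earlier date. The revised birth year 990 and the correction date in the usage lines are invented for illustration.

```python
from datetime import date
from typing import NamedTuple, Optional

class Fact(NamedTuple):
    subject: str
    lived_from: int               # valid time: start year in the real world
    lived_to: int                 # valid time: end year in the real world
    recorded_from: date           # transaction time: row entered the database
    recorded_to: Optional[date]   # transaction time end; None means "until now"

facts = [Fact("King Canute", 995, 1035, date(1995, 3, 7), None)]

def correct(rows, subject, lived_from, lived_to, today):
    """Record a revised belief without losing the old one.

    The only change to existing rows is closing the open-ended "until now"
    of the current row; the revision is appended as a new row.
    """
    updated = []
    for f in rows:
        if f.subject == subject and f.recorded_to is None:
            f = f._replace(recorded_to=today)  # returns a new tuple
        updated.append(f)
    updated.append(Fact(subject, lived_from, lived_to, today, None))
    return updated

def as_recorded_on(rows, subject, when):
    """Return the (lived_from, lived_to) the database held on date `when`."""
    for f in rows:
        if (f.subject == subject and f.recorded_from <= when
                and (f.recorded_to is None or when < f.recorded_to)):
            return (f.lived_from, f.lived_to)
    return None

# Hypothetical revision of the birth year, recorded on 6 Dec 2005.
revised = correct(facts, "King Canute", 990, 1035, date(2005, 12, 6))
print(as_recorded_on(revised, "King Canute", date(2000, 1, 1)))  # (995, 1035)
print(as_recorded_on(revised, "King Canute", date(2006, 1, 1)))  # (990, 1035)
```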

Conclusion
• CCLRC
• Corporate Data Muddle
• Corporate Data Repository (CDR) Requirements
• Corporate Data Model (CDM) Design
• CDM Architecture
• Temporal CDM