Sim DB and Sim TAP Dealing with a
Sim. DB and Sim. TAP Dealing with a complex data model Gerard Lemson, Nara, 2010 -12 -10
Sim. DB and Sim. DAL Protocols to support • describing simulations – Simulation Data Model: Model for N-body 3+1 D any simulations http: //volute. googlecode. com/svn/trunk/projects/theory/snapdm/specification/uml/Sim. DB_DM. png • publishing simulations – Simulation Database (Sim. DB): protocol for accessing a database built according to Sim. DM. • finding simulations – Sim. DB/TAP – query. Data in Sim. DAL – Sim. TAP • retrieving simulation data, whole, in parts, manipulated – Sim. DAL get. Data services (not in this talk) • Btw: “simulation” can be – – simulation run simulation result simulation data post-processing of simulation results
Sim. DB/REST • “simple” access to Sim. DB • Uses XML representation of model – XML schema http: //code. google. com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/xsd • Examples http: //code. google. com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/examples – PDR http: //code. google. com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/examples/external/PDR – Gadget 2 http: //volute. googlecode. com/svn-history/r 1382/trunk/projects/theory/snapdm/specification/examples/external/Gadget 2. xml – TODO more (SVO) • VO-URP – validator http: //www. g-vo. org/Sim. DB-browser/Validate. do – upload – download http: //www. g-vo. org/Sim. DB-browser
Sim. DB/TAP • Model complex – Too(? ) complex for trivial (parameter based) query language – Need special navigation tools (vo-urp@gavo) – Need powerful query language • Impement TAP on database built according to Sim. DM • Map UML to RDB model – TAP_SCHEMA for Sim. DM (vo-urp@gavo old) http: //code. google. com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/tap – create table + inserts – VOData. Service • VO-URP SQL query http: //www. g-vo. org/Sim. DB-browser/Query. do • Not always easy!
Model complex • Normalised (see image) • General Abstract – e. g. parameters must be fully defined, no assumptions • Hard to deal with quantities with a priori unknown units – Parameter. Setting table has value AND unit attributes (Quantity datatype)
Example queries • Find synthetic spectra of white dwarf stars • Find cosmological simulations with Ω=0. 9, ΩΛ= 0. 7 and Ωb=0. 02 • Find all SPH simulations containing a galaxy cluster with mass around 1014 Msun
select from , , , where and and and e. * experiment e target. Object t result r product p t. label=‘white_dwarf’ t. containerid=e. id r. target. Id=t. id p. containerid=r. id p. product. Type=‘spectrum’
Example queries • Find synthetic spectra of white dwarf stars • Find (cosmological) simulations with Ω=0. 9, ΩΛ= 0. 7 and Ωb=0. 02 • Find all SPH simulations containing a galaxy cluster with mass around 1014 Msun
select from , , , where and and and e. * Experiment e Input. Parameter ip 1 Parameter. Setting ps 1 Input. Parameter ip 2 Parameter. Setting ps 2 Input. Parameter ip 3 Parameter. Setting ps 3 ps 1. container. Id = e. id ps 1. parameter. Id = ip 1. id ip 1. label = ‘omega_lambda’ ps 1. numerical. Value_value=0. 7 ps 2. container. Id = e. id ip 2. label = ‘omega_baryon’ ps 2. parameter. Id = ip 1. id ps 2. numerical. Value_value=0. 02 ps 3. container. Id = e. id ip 3. label = ‘omega’ ps 3. numerical. Value_value=0. 9
Example queries • Find synthetic spectra of white dwarf stars • Find (cosmological) simulations with Ω=0. 9, ΩΛ= 0. 7 and Ωb=0. 02 • Find all SPH simulations containing a galaxy cluster with mass around 1014 Msun
select e. * from Experiment e , Experiment. Representation. Object ero , Representation. Object. Type rot , Target. Object to , Property p , Statistical. Summary s where ero. container. Id = e. id and ero. type. Id= rot. id and rot. label=‘sph. particle’ and to. container. Id = e. id and to. label = ‘galaxy. cluster’ and p. container. Id = to. id and p. label=‘mass’ and s. property. Id = p. id and s. statistic = ‘value’ and s. numerical. Value_value=1 e 14 and s. numerical. Value_unit=‘M_sun’
An example from Paris. Find typical values of mass, x, y, z properties in a given simulation result SELECT , , , FROM , , , , , WHERE AND and and and and r. id as id r. publisherdid as publisherdid s 0. numeric. Value_value as mass s 1. numeric. Value_value as x s 2. numeric. Value_value as y s 3. numeric. Value_value as z result r product o statisticalsummary s 0 property p 0 statisticalsummary s 1 property p 1 statisticalsummary s 2 property p 2 statisticalsummary s 3 property p 3 r. containerid = 6 o. containerid = r. id s 0. containerid = o. id s 1. containerid = o. id s 2. containerid = o. id s 3. containerid = o. id p 0. publisherdid = 'mass' s 0. proprtyid=s 3. id s 0. statistic = ‘nominal’ p 1. publisherdid = 'x' s 1. proprtyid=s 3. id s 1. statistic = ‘nominal’ p 2. publisherdid = 'y' s 2. proprtyid=s 3. id s 2. statistic = ‘nominal’ p 3. publisherdid = 'z' s 3. proprtyid=s 3. id s 3. statistic = ‘nominal’
SELECT r. id as id , r. publisherdid , max(case when p. publisherdid s. statistic=‘nominal’ then s. numeric. Value_value FROM result r , product o , statisticalsummary s , property p WHERE r. containerid = 6 AND o. containerid = r. id and s. containerid = o. id and p. id = s. propertyid group by r. id, r. publisherid, o. id = ‘mass’ and else null end) as mass = ‘x’ and else null end) as x = ‘y’ and else null end) as y = ‘z’ and else null end) as z
Conclusions • Some queries can be phrased nicely • Others using standard SQL, but due to level of normalisation and abstraction MANY joins required • Can we simplify this a bit?
zoom
Parameter. Setting container. Id value unit parameter. Id . . . 123 0. 02 456 123 0. 7 457 123 0. 9 458 345 . 04 456 345 . 7 457 345 1 458 . . . + Input. Parameter id name label datatype description 456 omega_b omega. baryon real . . . 457 omega_l omega. lambda real . . . 458 omega real . . . . simtap. Experiment id omega_b omega_l omega 123 0. 02 0. 7 0. 9 345 0. 04 0. 7 1 . . .
Sim. TAP • When Protocol is fixed, tap schema can be simplified – parameters columns in simtap. Experiment table – property characterisation columns in product specific characterisation table(s) –. . .
Instead of this select from , , , where and and and e. * Experiment e Input. Parameter ip 1 Parameter. Setting ps 1 Input. Parameter ip 2 Parameter. Setting ps 2 Input. Parameter ip 3 Parameter. Setting ps 3 ps 1. container. Id = e. id ps 1. parameter. Id = ip 1. id ip 1. label = ‘omega_lambda’ ps 1. numerical. Value_value=0. 7 ps 2. container. Id = e. id ip 2. label = ‘omega_baryon’ ps 2. parameter. Id = ip 1. id ps 2. numerical. Value_value=0. 02 ps 3. container. Id = e. id ip 3. label = ‘omega’ ps 3. numerical. Value_value=0. 9
this select from where and e. * simtap. Experiment omega. Lambda=0. 7 omega. Baryon=0. 02 omega=0. 9
Table definitions can be derived • From a Protocol definition – input parameters – for each Representation object type • a table with statistical summaries of properties – target object type • ala Sim. DM (units in ADQL required) • pivoted per project? – input data sets (urls) • Pivoting queries can be generated
Proposal • Sim. DAL services MAY include a Sim. TAP service • 1 Sim. TAP schema per Protocol • Each such schema contains – 1 Experiment table with columns for parameters – >=1 Product tables with characterisation of properties – Possibly other tables from Sim. DB/TAP
- Slides: 21