The ARCS Data Analysis Software Michael Aivazis California Institute of Technology
Fractals in software
• “Drip programming”
  – may generate aesthetically interesting flow charts
  – but it is not a desirable practice
  [Image: Pollock’s “Autumn Rhythm”]
• Advanced technology may actually complicate matters
  – complex data structures
  – objects
  – user interfaces
  – multiple platforms
  – distributed computing
  – high performance computing
  – security
  – … the Grid
  – … or Michael’s framework?
Software Roadmap
Data reductions
• Account for incident flux
• Remove background
• Convert from time to energy
• Correct for detector efficiency
• Bin into rings of constant scattering angle
• Convert from angle to momentum
• Subtract multi-phonon and multiple scattering
• Correct for absorption
[Diagram legend: Python, C++]
From TOF to energy
[Data flow diagram. Modules: Read HDF file, Subtract background, Rebin, Sq. rt errs, Write HDF file. Inputs and parameters: filename, e_min, e_max, e_i, num_e, t_min, t_max. Data on the wires: raw counts, times, Spect. Info, data, errors², counts in energy, energies, errors.]
Data flow for TOF to Energy conversion
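The time-to-energy step of this data flow can be sketched in Python. This is a minimal illustration, not the ARCS code: the names `tof_to_energy` and `convert_bins`, the single flight path of known length, and the use of the non-relativistic relation E = ½ m v² with v = L / t are all assumptions made for the example.

```python
# Hypothetical sketch: convert neutron time of flight to kinetic energy.
# Assumes one flight path of known length and E = (1/2) m v^2.

NEUTRON_MASS_KG = 1.674927e-27       # neutron rest mass
JOULES_PER_MILLI_EV = 1.602177e-22   # 1 meV expressed in joules


def tof_to_energy(flight_path_m, tof_s):
    """Return the neutron kinetic energy in meV for one time of flight."""
    if tof_s <= 0:
        raise ValueError("time of flight must be positive")
    velocity = flight_path_m / tof_s                      # m/s
    energy_joules = 0.5 * NEUTRON_MASS_KG * velocity ** 2
    return energy_joules / JOULES_PER_MILLI_EV


def convert_bins(flight_path_m, tof_bins_s):
    """Convert a list of time bins to energies (later arrival, lower energy)."""
    return [tof_to_energy(flight_path_m, t) for t in tof_bins_s]
```

Because energy scales as 1 / t², evenly spaced time bins map to unevenly spaced energy bins, which is why a rebinning step follows the conversion in the diagram.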
Design directions
• Integrate analysis modules using scripting
  – Python
• Data flow paradigm
  – Well understood
  – Easy to implement and document
• Meta-data in XML
  – fully reproducible description of the data analysis pipeline
  – tag and archive data
  – record the version number of each module used in the analysis
• Enable distributed computing
  – XMLRPC, SOAP, …
• File formats: NeXus + XML meta-data
  – Reuse, reuse
  – Augment, contribute
  – HDF5!
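One way to record the pipeline and module versions as XML meta-data is sketched below. The element and attribute names (`pipeline`, `module`, `parameter`, `version`) and the version strings are hypothetical, chosen only for illustration; they are not a schema taken from the ARCS software.

```python
import xml.etree.ElementTree as ET


def describe_pipeline(modules):
    """Build an XML record of the modules (name, version, parameters) in run order."""
    root = ET.Element("pipeline")
    for name, version, params in modules:
        node = ET.SubElement(root, "module", name=name, version=version)
        for key, value in sorted(params.items()):
            ET.SubElement(node, "parameter", name=key, value=str(value))
    return ET.tostring(root, encoding="unicode")


# Hypothetical module names, versions, and parameter values.
record = describe_pipeline([
    ("subtract_background", "1.2", {"t_min": 100, "t_max": 200}),
    ("rebin", "0.9", {"e_min": -50, "e_max": 50, "num_e": 100}),
])
```

Archiving a record like this next to the output file is what would let an analysis be replayed later with the same modules and parameters.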
Flexibility through the use of scripting
• Scripting enables us to
  – Organize the large number of parameters
  – Allow the analysis environment to discover new capabilities without the need for recompilation or relinking
• The python interpreter
  – The interpreter
    • modern object oriented language
    • robust, portable, mature, well supported, well documented
    • easily extensible
    • rapid application development
  – Support for parallel programming
    • trivial embedding of the interpreter in an MPI compliant manner
    • a python interpreter on each compute node
    • MPI is fully integrated: bindings + OO layer
  – No measurable impact on either performance or scalability
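Discovering a capability at run time, without recompiling or relinking, can be illustrated with Python's standard `importlib`. This is only a sketch of the idea: `load_component` is a hypothetical helper, and the standard-library `math` module stands in for an analysis module.

```python
import importlib


def load_component(module_name, attribute):
    """Import a module by name at run time and return one of its attributes."""
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# The names would normally come from a script or configuration file;
# "math" is a stand-in for an analysis module here.
sqrt = load_component("math", "sqrt")
errors = [sqrt(variance) for variance in (4.0, 9.0)]
```

Since the module name is just a string, a new module can be made available by installing it and naming it in the script, with no change to the host application.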
Writing python bindings
• Given a “low level” routine, such as

    double arcs::add(double a, double b);

• and a wrapper

    PyObject * arcs_add(PyObject *, PyObject * args)
    {
        double a, b;
        int ok = PyArg_ParseTuple(args, "dd", &a, &b);
        if (!ok) {
            return 0;
        }
        double result = arcs::add(a, b);
        return Py_BuildValue("d", result);
    }

• one can place the result of the routine in a python variable

    c = arcs.add(2, 2)

• The general case is not much more complicated than this
Pyre Architecture
• The integration framework is a set of co-operating abstract services
[Architecture diagram. Labels: python package, engine, component, bindings, framework, service, infrastructure, abstract class, library, specialization, FORTRAN/C/C++]
Pyre services
• journal
  – flexible control over the generation and delivery of simulation diagnostics from the compute nodes to the workstation
• monitor
  – a distributed service for low bandwidth, on the fly visualizations
  – currently used mostly for status monitoring and debugging
• timer
• weaver
  – a general source code generation facility
  – support for many languages
    • FORTRAN, C, C++, python, HTML, XML
    • from makefiles to optimized C++ sources
  – automatic web page creation for cgi scripts
  – supports user authentication
    • passwords, soon user SSL certificates
• blade
  – a toolkit independent UI generator
Distributed services
[Diagram. Columns: Workstation, Services, Compute nodes. Labels: analysis, monitor, journal, component 1, component 2]
IRIS Explorer
Visual Programming Environment
• Data flow paradigm appears natural
  – usability problems are focused on knowledge of what is possible
  – used by many commercial and open source tools
• Improvements
  – decouple UI from diagram logic
  – interface
    • use OpenGL!
    • collaborative
    • interesting and relevant research
  – diagram logic
    • thin, reusable component
    • scripting
    • multi-layered control
  – development can use existing solutions as a guide of what not to do
  – many modules already available in pyre
  – enable distributed programming
• Target for prototype: early 2004
XMLRPC: Enabling distributed computing
• An open standard for remote procedure calls
• Allows us to perform the computation
  – where the data lives
  – independently of the local computing capacity
• Security is an issue
[Diagram labels: Client, Remote Server, Database Server, Beowulf Cluster]
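A remote procedure call over XMLRPC can be sketched with Python's standard `xmlrpc` modules. This is an illustration only: the routine `rebin_sum` is hypothetical, and both ends run in one process on the loopback interface so the example is self-contained. In the setup described here the server would sit next to the data.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def rebin_sum(counts, factor):
    """Hypothetical remote routine: sum adjacent groups of `factor` bins."""
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


# Server side: would run where the data lives. Port 0 asks the OS for a free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(rebin_sum, "rebin_sum")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: calls the remote routine as if it were local.
host, port = server.server_address
client = ServerProxy(f"http://{host}:{port}/")
result = client.rebin_sum([1, 2, 3, 4, 5, 6], 2)

server.shutdown()
server.server_close()
```

The sketch has no authentication or encryption and binds to localhost only, which is the gap the "Security is an issue" bullet refers to.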
Prototype User Interface
• Application capabilities
  – depend on the remote server
  – exported to the client
• Boxes represent
  – data sources
  – computational modules
• Wires represent
  – data flows
  – control
• Boxes have input and output ports where wires can be attached
Data Analysis Execution
• User hits “Run”
• Applet interprets wiring diagram as XMLRPC commands
• Server receives commands, arranges Python script, and data processing commences.
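How a wiring diagram of boxes and wires might be turned into an execution order can be sketched as follows. This is an assumption-laden illustration, not the prototype's logic: `run_diagram` and the dictionary layout are hypothetical, everything runs in one process rather than over XMLRPC, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def run_diagram(boxes, wires, source_values):
    """Run each box once all boxes wired into its input ports have run.

    boxes: box name -> function taking the upstream results as arguments
    wires: box name -> list of upstream box names, in input-port order
    source_values: box name -> value, for data-source boxes with no inputs
    """
    results = dict(source_values)
    order = TopologicalSorter(wires).static_order()
    for name in order:
        if name in results:
            continue                      # data source, already has a value
        inputs = [results[upstream] for upstream in wires.get(name, [])]
        results[name] = boxes[name](*inputs)
    return results


# Hypothetical three-box diagram: two data sources feeding a subtraction.
boxes = {"subtract": lambda counts, bg: [c - b for c, b in zip(counts, bg)]}
wires = {"subtract": ["raw_counts", "background"]}
sources = {"raw_counts": [10, 12, 9], "background": [1, 1, 1]}
output = run_diagram(boxes, wires, sources)["subtract"]
```

A topological sort guarantees that every box runs only after the boxes feeding its input ports, and it raises an error if the wires form a cycle.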
User interface prototypes - I
User interface prototypes - II
User interface prototypes - III
MATLAB
• If you must…
• Fully accessible from Python
• Support involves converting result of data analysis into MATLAB native arrays
Software engineering practices
• Version control
  – Provides a record of the evolution of the software
  – CVS: well supported, open source
• Configuration management
  – Uniform, portable build procedure
  – Automatic, regular builds of the entire software base
  – config: a system based on make
  – merlin: a python-based replacement under development
• Regression testing
  – Test cases that
    • Exercise expected behavior
    • Exercise fixes for known bugs
• Bug tracking
  – Organize the “to do” list, the feature requests … and the known defects
  – Gnats: well supported, open source
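The two kinds of regression test named above can be sketched with Python's standard `unittest`. The routine `rebin` and the "known bug" are hypothetical, invented only to show the shape of such tests.

```python
import unittest


def rebin(counts, factor):
    """Hypothetical routine under test: sum adjacent groups of `factor` bins."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


class RebinRegression(unittest.TestCase):
    def test_expected_behavior(self):
        # Exercises the documented behavior on a small input.
        self.assertEqual(rebin([1, 2, 3, 4], 2), [3, 7])

    def test_fix_for_known_bug(self):
        # Guards a (hypothetical) earlier defect where a zero factor was accepted.
        with self.assertRaises(ValueError):
            rebin([1, 2, 3], 0)
```

Run as part of the automatic, regular builds, tests like these flag a change that reintroduces an old defect.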
Design directions
• Integrate analysis modules using scripting
  – Python
• Data flow paradigm
  – Well understood
  – Easy to implement and document
• Meta-data in XML
  – fully reproducible description of the data analysis pipeline
  – tag and archive data
  – record the version number of each module used in the analysis
• Enable distributed computing
  – XMLRPC, SOAP, …
• File formats: NeXus + XML meta-data
  – Reuse, reuse
  – Augment, contribute
  – HDF5!