Distributed Grid Computing at ISIS using the Grid MP System
Tom Griffin, ISIS Facility & University of Manchester / UMIST
What do I mean by ‘Distributed Grid’?
• A way of speeding up large, compute-intensive tasks
• Break large jobs into smaller chunks
• Send these chunks out to (distributed) machines
• Distributed machines do the work
• Collate and merge the results
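The split, work and merge cycle above can be sketched in a few lines of C++. This is a minimal illustration only: the function names and the "sum of squares" workload are assumptions made for the example, and the chunks are processed locally rather than being sent to remote machines.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Split `items` into at most `chunk_count` contiguous chunks of near-equal size.
std::vector<std::vector<int>> split_into_chunks(const std::vector<int>& items,
                                                std::size_t chunk_count) {
    std::vector<std::vector<int>> chunks;
    if (items.empty() || chunk_count == 0) return chunks;
    chunk_count = std::min(chunk_count, items.size());
    const std::size_t base = items.size() / chunk_count;
    const std::size_t extra = items.size() % chunk_count;  // first `extra` chunks get one more item
    std::size_t start = 0;
    for (std::size_t i = 0; i < chunk_count; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        chunks.emplace_back(items.begin() + start, items.begin() + start + len);
        start += len;
    }
    return chunks;
}

// Hypothetical stand-in for the work one distributed machine does on its chunk.
long long process_chunk(const std::vector<int>& chunk) {
    return std::accumulate(chunk.begin(), chunk.end(), 0LL,
        [](long long acc, int x) { return acc + static_cast<long long>(x) * x; });
}

// Collate the partial results into the final answer.
long long merge_results(const std::vector<long long>& partials) {
    return std::accumulate(partials.begin(), partials.end(), 0LL);
}
```

The pattern only works when the merged answer does not depend on how the input was divided, which is why the chunks here are processed without reference to each other.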
Spare Cycles Concept
• Typical PC usage is about 10%
• Most PCs not used at all after 5 pm
• Even with ‘heavily used’ (Outlook, Word, IE) PCs, the CPU is still grossly underutilised
• Everyone wants a fast PC!
• Can we use (“steal”?) their unused CPU cycles?
• SETI@home, World Community Grid (www.worldcommunitygrid.org)
Possible Software Implementations
• Toolkit, e.g. COSM
  – Low-level toolkit: source-code-level integration
  – So time-consuming work for each application
• Entropia DC Grid
  – Trial run at ISIS two years ago, with some success
  – Company bought out and in limbo (?)
• United Devices Grid MP
  – What we’re currently using
  – Quite expensive
• Condor
  – Free (academic research project)
  – In our experience two years ago, not reliable with Windows
The United Devices System
• Server hardware
  – We use two dual-Xeon servers + 280 client licenses
  – Could (will) easily cope with more clients
• Software
  – Servers run Red Hat Linux Advanced Server / DB2
  – Clients available for Windows, Linux, SPARCs and Macs
• Programming
  – MGSI: Web Services interface (XML, SOAP)
  – Accessed with C++ and Java classes etc.
• Management Console
  – Web browser based
  – Can manage services, jobs, devices etc.
Visual Introduction to the Grid
Installing and Deploying the System
• Servers
  – Complete setup in under 3 hours
  – Virtually self-maintaining
• Clients
  – Windows only so far
  – MSI Installer
  – approx. 20 seconds
  – SMS
  – MP Agent User
  – Install to other OSs looks straightforward
Suitable / Unsuitable Applications
• CPU intensive
• Low to moderate memory use
• Not too much file output
• Coarse grained
• Command line / batch driven
• Licensing issues?
Objects within the Grid
• Program
• Jobstep
• Data Set
• Data
• Workunit
• Client
How to write Grid Programs
• Fairly easy to write
• Interface to grid via Web Services
• So far used: C++, Java, Perl, C# (any .NET language)
1) Think about how to split your data and merge results
2) Wrap and upload your executable
3) Write the application service
  – Pre- and post-processing
4) Use the Grid
Wrapping Your Executable
• Executable + any DLLs etc.
• Standard data files
• Compression
• Encryption
• Capture screen output
• Set environment variables
• Command line
Application Service
• Pre-processing
1) Partition data
2) Package data partitions
3) Log in to the Grid server
4) Create a Job and Job Step
5) Create a Data Set
6) Create Data objects and upload data packages
7) Create Workunits
8) Set the Job running
• Post-processing
1) Retrieve results
2) Merge results
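How the objects created in pre-processing steps 4 to 8 relate to one another can be sketched with a small in-memory model. Everything here is hypothetical: the struct layouts, field names and `build_job` are illustrative assumptions, not the United Devices API, and no server is contacted. In particular, pairing each Workunit with one Job Step and one Data item is an assumption about how the objects fit together.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-ins for the Grid objects (not the real MGSI types).
struct Data     { std::string package_name; };                  // one uploaded data package
struct DataSet  { std::string name; std::vector<Data> items; }; // container for Data objects
struct JobStep  { std::string program_name; };                  // which wrapped program to run
struct Workunit { std::size_t job_step_index; std::size_t data_index; };
struct Job {
    std::string name;
    std::vector<JobStep> steps;
    std::vector<Workunit> workunits;
    bool running = false;
};

// Mirrors pre-processing steps 4 to 8, purely in memory.
Job build_job(const std::string& job_name, const std::string& program_name,
              const std::vector<std::string>& package_names, DataSet& data_set) {
    Job job;
    job.name = job_name;
    job.steps.push_back(JobStep{program_name});           // step 4: Job and Job Step
    data_set.name = job_name + "-data";                   // step 5: Data Set
    for (const std::string& package : package_names) {
        data_set.items.push_back(Data{package});          // step 6: one Data per package
        job.workunits.push_back(
            Workunit{0, data_set.items.size() - 1});      // step 7: one Workunit per Data
    }
    job.running = true;                                   // step 8: set the Job running
    return job;
}
```

With three data packages this yields one Job Step, three Data objects and three Workunits, which is the one-workunit-per-partition shape the slide describes.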
Example Application: HMC
Hybrid Monte Carlo method of global optimisation to solve molecular crystal structures from powder diffraction data
Parametric problem
• e.g. vary parameters such as acceptance ratio, to scan a 3D grid
• each run completely independent of any other
• Send one run to each machine on the grid
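A parametric scan of this kind amounts to enumerating every point of the 3D grid and treating each point as one independent run. The sketch below assumes three parameter axes; only the acceptance ratio is named in the slides, so `step_size` and `temperature` are hypothetical placeholders, as are the type and function names.

```cpp
#include <cstddef>
#include <vector>

// One independent run of the program; one of these would go to each machine.
struct RunParameters {
    double acceptance_ratio;  // named in the slides
    double step_size;         // hypothetical second axis
    double temperature;       // hypothetical third axis
};

// Enumerate every point of the 3D parameter grid (Cartesian product of the axes).
std::vector<RunParameters> build_parameter_grid(const std::vector<double>& acceptance_ratios,
                                                const std::vector<double>& step_sizes,
                                                const std::vector<double>& temperatures) {
    std::vector<RunParameters> runs;
    runs.reserve(acceptance_ratios.size() * step_sizes.size() * temperatures.size());
    for (double a : acceptance_ratios)
        for (double s : step_sizes)
            for (double t : temperatures)
                runs.push_back(RunParameters{a, s, t});
    return runs;
}
```

Because no run reads another run's output, the resulting list can be handed out in any order and to any number of machines.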
Running HMC on the Grid
• Unchanged exe
• User edits or creates an appropriate settings file
• User runs “my” HMC submit program
  – Splits the batch file into one line per machine
  – Uploads chunks to the Grid server
• Grid server distributes Workunits to clients
• User monitors the job with their web browser
• Clients return results to the Grid server
• User runs HMC retrieve program
  – Downloads results
More on HMC Submit…
• Split the batch file into lines
• Create a dataset (to hold our data)
• Package data (command line and Z-matrix files etc.)
• Associate data with dataset
• Upload data packages to Grid server
• Create Workunits from the dataset
• Create a Job to hold the Workunits
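The first step of the submit program, splitting the batch file into one command line per workunit, might look like the following. This is a sketch under assumptions: the function name is hypothetical, the batch text is passed in as a string rather than read from disk, and blank lines are simply skipped.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Return the non-blank lines of a batch file's text, one per future Workunit.
std::vector<std::string> split_batch_into_commands(const std::string& batch_text) {
    std::vector<std::string> commands;
    std::istringstream stream(batch_text);
    std::string line;
    while (std::getline(stream, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();         // tolerate Windows line endings
        if (line.find_first_not_of(" \t") == std::string::npos) continue;  // skip blank lines
        commands.push_back(line);
    }
    return commands;
}
```

Each returned string would then be packaged with the input files it needs before being uploaded as one data package.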
Yet more…
• Program written in C++
• Uses C++ classes to ‘hide’ SOAP calls

    dsHMC.data_set_gid = mgsi->createDataSet(dsHMC);

    ud::uuid MgsiClient::createDataSet(const DataSet &data_set) throw(MgsiException)
    {
        SOAPMethod request("createDataSet", "urn://ud.com/mgsi");
        request.AddParameter("authkey") << authkey;
        request.AddParameter("data_set") << data_set;
        const SOAPResponse &response = call(request,
            const_cast<SOAPParameter *>(&request.GetParameter((size_t)0)));
        ud::uuid retval;
        response.GetReturnValue() >> retval;
        return retval;
    }

• Auto-generated by ‘Axis C++’ from WSDL file
• Also a C++ HTTPS file transfer program
Performance
• Linear: 50 devices ≈ 50 times faster
• Affected by size of Workunit
  – Overhead for distribution is ≈ 1 minute
  – Risk of device being switched off
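Why workunit size matters can be seen with a rough model of the speed-up. The sketch below is an assumption, not a measurement from the talk: it supposes equal-length workunits, devices that are always available, and a fixed distribution overhead added to every workunit (the slide quotes about one minute).

```cpp
#include <cmath>

// Estimated speed-up over a single machine when `workunits` equal-length
// workunits are run on `devices` machines, each paying a fixed overhead.
double estimated_speedup(int workunits, int devices,
                         double minutes_per_workunit, double overhead_minutes) {
    if (workunits <= 0 || devices <= 0 || minutes_per_workunit <= 0.0) return 0.0;
    const double serial_minutes = workunits * minutes_per_workunit;
    // Workunits are processed in rounds of at most `devices` at a time.
    const double rounds = std::ceil(static_cast<double>(workunits) / devices);
    const double grid_minutes = rounds * (minutes_per_workunit + overhead_minutes);
    return serial_minutes / grid_minutes;
}
```

Under this model, 50 ten-hour workunits on 50 devices with a one-minute overhead give a speed-up just under 50, while 50 one-minute workunits give only 25, since half of each device's time goes on overhead.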
Example 2: MD Manager
• Molecular Dynamics simulation(s)
• Program written in C#
• C# classes generated from WSDL (and then modified) to hide SOAP
• Wrote generic C# HTTP file transfer classes
• ‘Interactive’ program
• Typical runtime ~10 hours per single simulation
• Need to investigate ‘grids’ of simulations
[Figure: simulations A to I laid out on a grid with Temperature and Pressure axes]
• But in 3 dimensions
• and with ‘ordering restrictions’
• plus a post-processing stage
Who Else Does This?
• Johnson & Johnson
• Novartis
• GSK
• National Physical Laboratory
• Accelrys
• IBM
• World Community Grid
  – http://www.worldcommunitygrid.org/
  – Currently the Human Proteome Folding project
Problems Encountered & Support
• Technical problems
  – Mercifully few!
  – Main issue has been RAM thresholding (now resolved)
  – Encryption of certain files causes a problem
• Support
  – So far has been very good
  – Responses to queries always next day (time difference) and always insightful
• Ease of setup / maintenance
  – Installed and fully running in ~3 hours
  – Next to no maintenance required, other than backup
‘Social’ Issues
• Easiest thing to blame
• Too abstract for some users (no big box)
• Stealing my cycles
• Expansion leads to political problems
Future Developments - Expansion
• Expansion $50k
• Proposal accepted for an additional 400 licenses
• Giving us a total of 480
• Change in licensing model
• Bottom Line: Costs
  Completed $45k
  Funded
• Setup, server licenses, 80 client licenses + support
  – $18k
  – CMSD
  Seeking funding
• Total ≈ $250k $83k
Summary
• Grid is here and running smoothly
• Easy to use
• Excellent performance
• Vast amount of compute power available
• Future looks good