Interactive Data Analysis on the Grid with JAS

Interactive Data Analysis on the Grid with JAS and Globus
David Alexander, Brian Miller, & John Exby
Tech-X Corporation (www.techxhome.com), Boulder, Colorado
Tony Johnson, Massimiliano Turri, & Booker Bense
Stanford Linear Accelerator Center, Menlo Park, California
Supported by U.S. Department of Energy Small Business Innovative Research Grant DE-FG03-02ER83556 and Stanford Linear Accelerator Center
TechXHome.com

Project Overview
• Started with Java Analysis Studio (JAS)
  – Has distributed analysis system based on RMI
• Set up test grids on Linux clusters
  – Used Globus Toolkit 2.0
  – Each node had GRAM & GridFTP servers and Java Runtime Environment
• Wrote a JAS grid plug-in
  – Used Java CoG Kit 0.9
• Demonstrated at SC2002
  – Hit remote and on-site cluster

Java Analysis Studio (JAS) jas.freehep.org
• Open source application
  – Built for interactive data analysis, but flexible & modularized
• Publication-quality plotting facilities
• User writes Java code to analyze data

Java Analysis Studio (JAS) jas.freehep.org
• Abstracted data source interface
  – Modules are written to work with a variety of file formats (PAW, HIPPO, AIDA, ROOT, ODBC, flat files, SIO, HEP)
• Distributed system available
• Versatile & well used in high energy physics
  – Pure Java (portable, Web Start installation & upgrade)
  – Flexible topology (stand-alone, client/server, cluster)
  – Integration w/ BaBar, Geant4, Wired

Design Ideas & Added Features
• Goal: clustered deployment, launch, & federation
• Special JAS Job use
• Minimal prerequisites:
  – Bare grid: Globus, Java, nothing else
  – Heterogeneous cluster
  – Off-grid (or not) client, data, codebase
  – Clients don’t need to be superusers
• Optional background deployment
• Single sign-on

About Resource Discovery
• Resource discovery
  – Software needs location of data files
  – Software needs location of Java-enabled hosts
  – Pluggable LDIF source (MDS, URL of text file)
• Community Authorization Service
  – Fine-grained access control
  – Is, in a way, a form of resource discovery itself
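The "pluggable LDIF source" idea above can be sketched as a small parser that reads LDIF text (whether it came from MDS or from a text file at a URL) and picks out Java-enabled hosts. This is only an illustration: the attribute names `hostName` and `javaEnabled` are hypothetical and are not taken from the MDS schema or the JAS plug-in, and LDIF continuation lines and base64 values are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: attribute names are hypothetical, not the MDS schema. */
class LdifHostFinder {

    /** Splits LDIF text into blank-line-separated entries of attribute -> value. */
    static List<Map<String, String>> parse(String ldif) {
        List<Map<String, String>> entries = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        for (String line : ldif.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                // A blank line closes the current entry.
                if (!current.isEmpty()) {
                    entries.add(current);
                    current = new LinkedHashMap<>();
                }
                continue;
            }
            if (line.startsWith("#")) {
                continue; // LDIF comment line
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not an "attribute: value" line
            }
            // A repeated attribute keeps only its last value in this sketch.
            current.put(line.substring(0, colon).trim(),
                        line.substring(colon + 1).trim());
        }
        if (!current.isEmpty()) {
            entries.add(current);
        }
        return entries;
    }

    /** Returns the hostName of every entry flagged javaEnabled: true. */
    static List<String> javaEnabledHosts(String ldif) {
        List<String> hosts = new ArrayList<>();
        for (Map<String, String> entry : parse(ldif)) {
            if ("true".equalsIgnoreCase(entry.get("javaEnabled"))
                    && entry.containsKey("hostName")) {
                hosts.add(entry.get("hostName"));
            }
        }
        return hosts;
    }
}
```

Because the parser takes a plain string, the same code would serve any source that can deliver LDIF text, which is one way to read "pluggable" here.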

Move code to data with GridFTP
• Location transparency
  – User sees data sets
  – Could also have user choice
• Automatic deployment of JAS
  – Multi-threaded task set
  – Verification of code version; GridFTP codebase to node if new
  – GridFTP/link data to user sandbox
  – Deploy control and catalog servers only on cluster head node
  – Worker nodes wait for catalog server to run
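The "multi-threaded task set" with version verification can be sketched as one task per node that compares the installed version with the local one and uploads only on a mismatch. The `NodeTransport` interface is a hypothetical stand-in for the GridFTP operations; it is not the Java CoG Kit API, and the pool-size cap of 8 is an arbitrary choice for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for GridFTP operations; not the Java CoG Kit API. */
interface NodeTransport {
    /** Version string installed on the host, or null if nothing is installed. */
    String remoteVersion(String host) throws Exception;

    /** Copies the codebase with the given version to the host. */
    void upload(String host, String version) throws Exception;
}

class Deployer {

    /** Deploys in parallel; returns the hosts that needed a fresh copy. */
    static List<String> deploy(List<String> hosts, String localVersion,
                               NodeTransport transport) throws Exception {
        int threads = Math.max(1, Math.min(hosts.size(), 8));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per node: verify the version, upload only if it differs.
            List<Future<Boolean>> results = new ArrayList<>();
            for (String host : hosts) {
                results.add(pool.submit(() -> {
                    if (localVersion.equals(transport.remoteVersion(host))) {
                        return false; // already up to date
                    }
                    transport.upload(host, localVersion);
                    return true;
                }));
            }
            // Collect results in the original host order.
            List<String> updated = new ArrayList<>();
            for (int i = 0; i < hosts.size(); i++) {
                if (results.get(i).get()) {
                    updated.add(hosts.get(i));
                }
            }
            return updated;
        } finally {
            pool.shutdown();
        }
    }
}
```

A failure on any node surfaces as an `ExecutionException` from `get()`; a real deployment would likely want per-node error reporting instead of aborting on the first failure.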

Launch Application with GlobusRun
• Automatic launch of Java servers
  – Java Data Servers are run on specified JRE-enabled nodes
• Special Grid Job is now started (exit the Wizard)
• Code loaded into client or written in editor
  – compiled
  – automatically distributed to Java Data Servers
  – results (std out, std err, & histograms) sent back
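Launching a Java server through GRAM in Globus Toolkit 2 involves describing the job in RSL (Resource Specification Language). The sketch below only assembles such a string; it does not submit anything and does not use the Java CoG Kit. The Java path, working directory, jar name, and class name in the usage note are hypothetical, and the quoting rule (doubling an embedded double quote) reflects our understanding of RSL rather than anything stated in the slides.

```java
import java.util.List;

/** Builds a GT2-style RSL job description string; submission is out of scope. */
class RslBuilder {

    /** Wraps a value in double quotes, doubling any embedded double quote. */
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Describes a job that runs the given Java executable with arguments. */
    static String javaServerJob(String javaPath, String workDir, List<String> args) {
        StringBuilder rsl = new StringBuilder("&");
        rsl.append("(executable=").append(quote(javaPath)).append(")");
        rsl.append("(directory=").append(quote(workDir)).append(")");
        if (!args.isEmpty()) {
            rsl.append("(arguments=");
            for (int i = 0; i < args.size(); i++) {
                if (i > 0) {
                    rsl.append(' ');
                }
                rsl.append(quote(args.get(i)));
            }
            rsl.append(")");
        }
        return rsl.toString();
    }
}
```

For example, `RslBuilder.javaServerJob("/usr/bin/java", "/tmp/jas", Arrays.asList("-cp", "jas.jar", "DataServer"))` yields `&(executable="/usr/bin/java")(directory="/tmp/jas")(arguments="-cp" "jas.jar" "DataServer")`.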

A Few More Impressive Features
• User can stop analysis, change code, & restart.
• Distributed debugging can catch individual node failures.
• Histogram re-bin slider surprisingly responsive
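One plausible reason a re-bin control can stay responsive is that coarser bins can be computed by summing adjacent fine bins, with no need to re-read the data. The sketch below shows that operation in isolation; it is an assumption about the general technique and not a description of how JAS implements its slider.

```java
/** Merges fine histogram bins into coarser ones without touching raw data. */
class Rebin {

    /**
     * Sums every `factor` adjacent bins into one coarse bin.
     * If the bin count is not a multiple of `factor`, the last coarse bin
     * holds the sum of the remaining fine bins.
     */
    static double[] merge(double[] fine, int factor) {
        if (factor < 1) {
            throw new IllegalArgumentException("factor must be >= 1");
        }
        int coarseCount = (fine.length + factor - 1) / factor; // ceiling division
        double[] coarse = new double[coarseCount];
        for (int i = 0; i < fine.length; i++) {
            coarse[i / factor] += fine[i];
        }
        return coarse;
    }
}
```

The cost is linear in the number of fine bins, which is typically far smaller than the number of events that filled them.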

Headaches and Issues
• Versions of Globus vs. Java CoG Kit
• CoG properties configuration
• Client & server clocks disagree
• MS-Windows text line breaks
• Abandoned jobs
• Firewalls

Future Ideas
• Upgrade to Globus Toolkit 3
• Pre-install code on cluster head or portal machine and deploy from there
• Use more grid services (Condor, Replica)
• Implement interfaces or service descriptions from PPDG CS-11 group.

Further Information on JAS
For the latest on JAS, see the 3 pm Category 9 paper "JAS 3 - A general purpose data analysis framework for HENP and beyond."
CONTACTS
David Alexander, alexanda@txcorp.com
Brian Miller, bmiller@txcorp.com
Tony Johnson, tony_johnson@SLAC.stanford.edu
Massimiliano Turri, turri@SLAC.stanford.edu
Java Analysis Studio, http://jas.freehep.org