What's New in Condor? Condor Week 2006, Todd Tannenbaum
- Slides: 58
What’s new in Condor?
Condor Week 2006
Todd Tannenbaum
Computer Sciences Department
University of Wisconsin-Madison
condor-admin@cs.wisc.edu
http://www.cs.wisc.edu/condor
So Todd… where is v6.8? Well, v6.7 has been a challenge… 2
3
4
Around since the 80’s 5
Around since the 80’s Mullet Boy 6
100 people surveyed! Favorite “ility” ? 7
100 people surveyed! Favorite “ility” ? Deployability! 8
Existing Ports
• Digital UNIX 4.0 on Alpha
• AIX 5.2 (clipped) on PowerPC
• Tru64 5.1 (clipped) on Alpha
• HP UNIX 10.20 on PA RISC
• HP UNIX 11.00 (clipped using hpux 10.20 32 bit) on PA RISC
• Irix 6.5 (clipped) on SGI
• Linux 2.4.x (glibc 2.2) - Red Hat 7.1, 7.2, 7.3 (clipped) on Alpha
• Linux 2.4.x (glibc 2.2) - Red Hat 7.1, 7.2, 7.3 on Intel x86
• Linux 2.4.x (glibc 2.2) - Red Hat 8 on Intel x86
• Linux 2.4.x (glibc 2.3) - Red Hat 9 on Intel x86
• Enterprise Server 8.1 on Intel Itanium
• Solaris 8 on Sparc
• Solaris 9 on Sparc
• Microsoft Windows 2000 or XP (clipped) on Intel x86
9
New Ports
› Introduced in v6.6.x
  - MacOSX (“clipped”) on PowerPC
  - Debian Linux 3.1 on Intel x86
  - Fedora Core 1 on Intel x86
  - Red Hat Enterprise Linux 3 on Intel x86
  - SuSE Linux Enterprise Server 8.1 on Intel Itanium
› Introduced in v6.7.x
  - AIX 5.1 (“clipped”) on PowerPC
  - Fedora Core 2 on x86
  - Fedora Core 3 on x86
  - SuSE 8.0 (“clipped”) on AMD64
  - Solaris 10 (“clipped”) on Sparc
  - Scientific Linux (Release 303) on x86
› Still to be introduced in v6.7.x (before v6.8.0)
  - HPUX 11i 64-bit pa-risc
  - RHEL 4 on x86
  - “native” 64 bit AMD Linux
Sigh…
“Psilord” – The Condor porting doctor. Talk to him in person tomorrow.
10
Porting Table
› See http://www.cs.wisc.edu/condor/porting/port_table.html
› Highlights
  - Almost every 32-bit Linux flavor as “full”
  - Every other Unix, MacOS and Windows available as “clipped”
  - Solaris 10 and HP-UX 11.x now “clipped”
  - FreeBSD 4 contribution from Yahoo!, added 5 and 6
  - x86_64 Linux: “full” running in the lab
11
Backfill Jobs
› Execute machines will run a locally staged executable when otherwise idle.
› Currently designed for BOINC.
    # Turn on backfill functionality, and use BOINC
    ENABLE_BACKFILL = TRUE
    BACKFILL_SYSTEM = BOINC
    # Spawn a backfill job if we've been Unclaimed for more than 5 minutes
    START_BACKFILL = $(StateTimer) > (5 * $(MINUTE))
    # Evict a backfill job if the machine is busy (based on keyboard
    # activity or cpu load)
    EVICT_BACKFILL = $(MachineBusy)
12
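The two policy expressions on the backfill slide can be read as plain boolean tests. Here is a minimal Python sketch of that logic, assuming StateTimer is the number of seconds spent in the current state and MINUTE is 60. The function names are hypothetical and this is not Condor code.

```python
MINUTE = 60  # seconds; assumed to mirror the $(MINUTE) config macro


def start_backfill(state_timer_s: int) -> bool:
    """Model of START_BACKFILL: true once Unclaimed for more than 5 minutes."""
    return state_timer_s > 5 * MINUTE


def evict_backfill(keyboard_busy: bool, cpu_busy: bool) -> bool:
    """Model of EVICT_BACKFILL: true when the machine is busy
    (keyboard activity or CPU load, as the slide's comment describes)."""
    return keyboard_busy or cpu_busy


if __name__ == "__main__":
    print(start_backfill(301))           # just past the 5 minute threshold
    print(evict_backfill(False, False))  # idle machine keeps its backfill job
```

The strict `>` matches the slide: a machine that has been Unclaimed for exactly five minutes does not yet start a backfill job.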
Joining Condor’s Einstein@Home Compute Team
› If you’re running BOINC backfill jobs in Condor and want to use your cycles to help another UW project, please join the Einstein@Home computation
› Join the “Condor Backfill” team:
  - http://einstein.phys.uwm.edu/team_display.php?teamid=5994
  - http://einstein.phys.uwm.edu/create_account_form.php?teamid=5994
13
More “deployability”
› “Personal” Condor Support on Win32
  - LocalSystem not required
› MSI installer on Win32 (thanks Micron!)
› New tools
  - condor_cold_start and
  - condor_cold_stop
Safe, dynamic Condor service deployment. More info @ Research BOF 9 am Rm 219
14
100 people surveyed! Favorite “ility” ? 15
100 people surveyed! Favorite “ility” ? Availability! 16
Condor with Firewalls and NATs: GCB in v6.8.0!
[Diagram labels: Client app, Server app, GCB layer, TCP/IP, Relay point; calls shown: connect, listen, accept, translate]
17
Job Progress continues if connection is interrupted
› Now for Vanilla, Java, and Grid universe jobs, Condor supports reestablishment of the connection between the submitting and executing machines.
  - If there is a network outage between execute and submit machine
  - If the submit machine restarts
  - Grid Universe was tricky…
› To take advantage of this feature, put the following line into your job’s submit description file:
    JobLeaseDuration = <N seconds>
  For example:
    job_lease_duration = 1200
18
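The job lease can be thought of as a timer: as long as the submit side gets back in touch before the lease runs out, the job keeps its progress. The Python sketch below is a simplified model under that assumption. The function name and the notion of a single "last contact" timestamp are hypothetical and do not describe Condor's actual renewal logic.

```python
def can_reconnect(last_contact: float, now: float, job_lease_duration: int) -> bool:
    """Return True while the lease is still valid.

    last_contact and now are timestamps in seconds; job_lease_duration is
    the value from the submit file (1200 in the slide's example).
    """
    return (now - last_contact) <= job_lease_duration


if __name__ == "__main__":
    lease = 1200
    print(can_reconnect(0, 600, lease))   # outage of 10 minutes
    print(can_reconnect(0, 1500, lease))  # outage of 25 minutes
```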
Job Progress continues if submit machine fails
› Condor can now support a submit machine “hot spare” (schedd failover)
  - If your submit machine A is down for longer than N minutes, a second machine B can take over
  - Requires shared filesystem between machines A and B
19
Central Manager Failover
› Condor Central Manager has two services
› condor_collector
  - Now a list of collectors is supported
› condor_negotiator (matchmaker)
  - If it fails, an election process picks another to take over
  - Accounting state is periodically replicated
  - Contributed technology from Technion
20
Reliability, cont.
› Time shifts
› Quill
› Closing windows of vulnerability
21
100 people surveyed! Favorite “ility” ? 22
100 people surveyed! Favorite “ility” ? Lightweight? 23
100 people surveyed! Favorite “ility” ? Lightweight? 24
100 people surveyed! Favorite “ility” ? 25
100 people surveyed! Favorite “ility” ? Functionality! 26
Security
› Common Authentication Methods between Condor on Unix and Win32
  - Kerberos 1.4
    • Additional hopeful benefit: Authentication against MS Active Directory!
  - SSL
  - Password (shared secret)
› Starter only runs known executables
› More powerful, unified map file(s)
› GSI credentials delegated
27
With Condor on Win32, it would be nice if …
› My jobs could access my files just like the condor_shadow can
› I didn’t have to tie my execute machines to a single account
› I didn’t have to run condor_store_cred from every machine where my credential is needed
(thank you Optena)
28
The Windows CredD
› A centralized repository for user passwords
    C:\>condor_store_cred add
    Account: gquinn@CROW
    Enter password:
    Operation succeeded.
[Diagram labels: “store password”, <password>, credd password store (myp4sswd, y0urs)]
29
The Windows CredD
[Diagram labels: schedd, “fetch password”, <password>, shadow, credd password store (myp4sswd, y0urs)]
Submit machines can use the CredD to impersonate the user in the shadow
30
The Windows CredD
[Diagram labels: starter, “fetch password”, <password>, condor_exec.exe, credd password store (myp4sswd, y0urs)]
Execute machines can use the CredD to run jobs as the submitting user!
31
Running Jobs as Submitting User
› In submit file:
    Run_job_as_owner = true
› In config file on submit and execute nodes:
    CREDD_HOST = vault.cs.wisc.edu
    STARTER_ALLOW_RUNAS_OWNER = True
    CREDD_CACHE_LOCALLY = True
32
Some Condor APIs
› Command Line tools
  - condor_submit, condor_q, etc
  - -format, -constraint, -xml
› Condor Perl Module
› Chirp
› Checkpoint Library API
› MW --- improved!
› DRMAA (Works w/ Win32, on SourceForge)
› Condor Grid ASCII Protocol (GAHP)
› Web Service Interface
33
DRMAA
› Distributed Resource Management Application API (DRMAA)
  - GGF Working Group
  - An API specification for the submission and control of jobs to one or more Distributed Resource Management (DRM) systems
› An API with C and Java bindings
  - not a protocol
› Scope
  - Does: job submission, monitoring, control, final status
  - Does not: file staging, reservations, security, …
34
Condor GAHP
› The Condor GAHP is a relatively low-level protocol based on simple ASCII messages through stdin and stdout
› Supports a rich feature set including two-phase commits, transactions, and optional asynchronous notification of events
35
GAHP, cont
Example:
    R: $GahpVersion: 1.0.0 Nov 26 2001 NCSA CoG Gahpd $
    S: GRAM_PING 100 vulture.cs.wisc.edu/fork
    R: E
    S: RESULTS
    R: E
    S: COMMANDS
    R: S COMMANDS GRAM_JOB_CANCEL GRAM_JOB_REQUEST GRAM_JOB_SIGNAL GRAM_JOB_STATUS GRAM_PING INITIALIZE_FROM_FILE QUIT RESULTS VERSION
    S: VERSION
    R: S $GahpVersion: 1.0.0 Nov 26 2001 NCSA CoG Gahpd $
    S: INITIALIZE_FROM_FILE /tmp/grid_proxy_554523.txt
    R: S
    S: GRAM_PING 100 vulture.cs.wisc.edu/fork
    R: S
    S: RESULTS
    R: S 0
    S: RESULTS
    R: S 1
    R: 100 0
    S: QUIT
    R: S
36
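To make the transcript easier to follow, here is a small Python sketch that parses the reply lines (the text after "R: ") the way the example suggests: a leading S or E status letter, then space-separated fields, and for RESULTS a count followed by that many result lines of the form "<request id> <fields>". The function names are hypothetical, and the sketch ignores any escaping of embedded spaces, so treat it as an illustration rather than a GAHP client.

```python
def parse_reply(line: str):
    """Split one reply line into (status, fields); 'S' is success, 'E' is error."""
    fields = line.split(" ")
    return fields[0], fields[1:]


def read_results(lines):
    """Parse the reply to RESULTS.

    lines[0] is the status line, e.g. "S 1"; the next <count> lines each
    carry a request id followed by its result fields, e.g. "100 0".
    """
    status, fields = parse_reply(lines[0])
    if status != "S":
        raise ValueError(f"RESULTS failed: {lines[0]!r}")
    count = int(fields[0])
    results = []
    for raw in lines[1:1 + count]:
        parts = raw.split(" ")
        results.append((int(parts[0]), parts[1:]))
    return results


if __name__ == "__main__":
    # The last RESULTS exchange in the transcript: one result for request 100.
    print(read_results(["S 1", "100 0"]))
```

In the transcript, the first GRAM_PING is rejected with E because no proxy has been loaded yet; after INITIALIZE_FROM_FILE the same command is accepted, and its outcome arrives later through RESULTS under request id 100.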
Web Service Interfaces
› SOAP over http or https to the Condor daemons
› Use any language or platform (where you can find a decent SOAP library)
› Functionality Exposed in current release
  - Submit jobs
  - Retrieve job output
  - Remove/hold/release jobs
  - Query machine status (fetch ads from collector)
  - Query job status (fetch ads from the schedd)
37
Getting machine status via SOAP (in Java with Axis)
    locator = new CondorCollectorLocator();
    collector = locator.getcondorCollector(new URL("http://machine:port"));
    ads = collector.queryStartdAds("Memory>512");
Because we give you WSDL information you don’t have to write any of these functions.
38
More Functionality changes…
› FINALLY, clean/consistent cross-platform quoting rules for arguments and environment variables (see condor_submit man page)
› Schedd can run HawkEye modules, just like the Startd
  - Enables monitoring on the submit machine
› condor_history: now faster than a snail, and cleans up droppings.
› DeferralTime, DeferralWindow
  - Coordinated starts
› BIND_ALL_INTERFACES in config file
› WANT_REMOTE_IO in job ClassAd
39
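For the DeferralTime and DeferralWindow bullet, a coordinated start can be pictured as a simple time check. The Python sketch below assumes DeferralTime is a Unix timestamp and DeferralWindow is a number of seconds after it during which a late start is still acceptable. Neither assumption is stated on the slide, and the function name is hypothetical.

```python
def deferral_decision(now: int, deferral_time: int, deferral_window: int = 0) -> str:
    """Classify a job as 'wait', 'run', or 'missed' relative to its start time."""
    if now < deferral_time:
        return "wait"    # too early: hold the job until deferral_time
    if now <= deferral_time + deferral_window:
        return "run"     # on time, or late but still inside the window
    return "missed"      # the window has passed


if __name__ == "__main__":
    print(deferral_decision(now=90, deferral_time=100, deferral_window=30))
    print(deferral_decision(now=120, deferral_time=100, deferral_window=30))
```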
ClassAd Functions in Condor!
› Conditionals
  - IfThenElse(condition, then, else)
› String functions
  - strcat(), strcmp(), toUpper(), etc.
› StringList functions
  - Example of a “string list” (CSV style)
    • Mylist = “Joe, Jon, Jeff, Jim, Jake”
  - StrListContains(), StrListAppend(), StrListRemove(), etc.
› Others
  - Regular expressions, arithmetic, etc…
40
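To illustrate what the string list functions do, here are Python stand-ins that operate on the same CSV-style string as the slide's Mylist example. These are hypothetical helpers written for illustration, not the ClassAd implementation, and details such as whitespace trimming and case sensitivity are assumptions.

```python
def _items(string_list: str):
    """Split a CSV-style string list into trimmed, non-empty items."""
    return [s.strip() for s in string_list.split(",") if s.strip()]


def str_list_contains(string_list: str, item: str) -> bool:
    return item in _items(string_list)


def str_list_append(string_list: str, item: str) -> str:
    return ", ".join(_items(string_list) + [item])


def str_list_remove(string_list: str, item: str) -> str:
    return ", ".join(s for s in _items(string_list) if s != item)


def if_then_else(condition: bool, then, otherwise):
    """Stand-in for the conditional; note Python evaluates both branches eagerly."""
    return then if condition else otherwise


if __name__ == "__main__":
    mylist = "Joe, Jon, Jeff, Jim, Jake"
    print(str_list_contains(mylist, "Jeff"))
    print(str_list_remove(mylist, "Jon"))
    print(if_then_else(str_list_contains(mylist, "Todd"), "member", "stranger"))
```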
Accounting Groups and Group Quota Support
› Account Group (w/ CORE Feature Animation)
› Account Group Quota (inspiration CDF @ Fermi)
  - Sample Problem: Cluster w/ 500 nodes, Chemistry Dept purchased 100 of them, Chemistry users must always be able to use them
  - Could use Machine Rank…
    • but this ties to specific machines
  - Or could use new group support
    • Each group can be given a quota in config file
    • Job ads can specify group membership
    • Group quotas are satisfied first
    • Accounting by user and by group
41
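The "group quotas are satisfied first" rule can be sketched with the slide's own numbers: a 500-node cluster where Chemistry is guaranteed 100 nodes. The Python below is a simplified two-pass model, not the negotiator's algorithm. The group names are hypothetical, and the second pass hands out leftovers in dictionary order, whereas the real system would use its user-priority accounting.

```python
def allocate(total_nodes: int, quotas: dict, demand: dict) -> dict:
    """Return {group: nodes granted}.

    Pass 1 grants each group up to its quota; pass 2 shares out whatever
    is still free among groups that want more.
    """
    alloc = {}
    free = total_nodes
    for group, want in demand.items():
        grant = min(want, quotas.get(group, 0), free)
        alloc[group] = grant
        free -= grant
    for group, want in demand.items():
        extra = min(want - alloc[group], free)
        alloc[group] += extra
        free -= extra
    return alloc


if __name__ == "__main__":
    quotas = {"group_chemistry": 100}
    demand = {"group_physics": 600, "group_chemistry": 150}
    # Physics asks for more than the whole cluster, yet Chemistry keeps its 100.
    print(allocate(500, quotas, demand))
```

Unlike Machine Rank, nothing here names specific machines: the guarantee is a count of nodes, which is the point the slide makes about group support.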
100 people surveyed! Favorite “ility” ? 42
100 people surveyed! Favorite “ility” ? Universability! 43
Grid Universe
› With new Grid Universe, always specify a ‘gridtype’.
› So the old “globus” Universe is now declared as:
    universe = grid
    gridtype = gt2
› Other gridtypes?
  - GT2 (Globus Toolkit 2)
  - GT3 (Globus Toolkit 3.2)
  - GT4 (Globus Toolkit 3.9.5+) ‘Condor-G’
  - UNICORE
  - Nordugrid
  - PBS (OpenPBS, PBSPro – technology from INFN)
  - LSF (Platform LSF – technology from INFN)
  - CONDOR (thanks gLite!) ‘Condor-C’
44
Other Grid Universe improvements
› Condor-G has support for credential refresh via the MyProxy Online Credential Management in NMI http://grid.ncsa.uiuc.edu/myproxy (both GT2 and GT4)
› GT4: we start a GridFTP server behind the scenes
  - GridFTP server bundled w/ Condor nowadays
› Some functionality present in Condor-G added to Condor-C
  - Forwarding of refreshed credentials (EGEE)
  - GSI authentication support
  - Cleaner ClassAd representation (URL)
45
Parallel Universe
› Replaces the “MPI” universe
› Allows running arbitrary programs that need to gang-schedule multiple machines
  - MPICH, LAM, …
  - FT-MPICH (Seoul National Univ)
  - Great for testing environments
46
Hey Jobs! We’re watching you!
› Local Universe
  - Just like Scheduler Universe, but there is a condor_starter
  - All advantages of the starter
[Diagram labels: Submit: schedd, starter, job (“Hey, job, behave or else!”); Execute: startd, starter, job]
47
100 people surveyed! Favorite “ility” ? 48
100 people surveyed! Favorite “ility” ? Scalability! 49
Faster Negotiation
› SIGNIFICANT_ATTRIBUTES determined automatically
  - Job attributes AutoClusterId and AutoClusterAttributes
  - Rounding of Attributes
› Schedd uses non-blocking TCP connects to the startd
› Negotiator caching
› Collector Forks for queries
› More coming…
50
Scalability, cont.
› Knobs
  - GRIDMANAGER_MAX_JOBMANAGERS_PER_RESOURCE,
  - GRIDMANAGER_MAX_PENDING_SUBMIT_PER_RESOURCE,
  - GRIDMANAGER_MAX_SUBMITTED_JOBS_PER_RESOURCE
› One instance of gridmanager handles multiple jobs (all from a given user)
› One instance of condor_dagman can run multiple dags
  - Is the Shadow next?
› Buffered I/O read on schedd restart (thanks Yahoo!)
51
Quill
› Job ClassAds information mirrored into an RDBMS
› Both active jobs and historical jobs
› Benefits BOTH scalability and accessibility
[Diagram labels: Master, Startd, …, Schedd, Job Queue log, Quill, RDBMS, Queue + History Tables]
52
Version 6.9.x 53
What’s brewing for after v6.8.0?
› More data, data
  - Stork distributed now in v6.7.x, incl DAGMan support – next it is NeST’s turn.
  - NeST to manage Condor spool files, ckpt servers
    • GridFTP used to move the bits
  - Quill++ and CondorDB goodness
› Virtual Machines (and the future of Standard Universe)
  - Research BOF w/ Jaeyoung Moon, rm 219 9 am
54
SOAP API
› First focus will be to finish interfaces used by all command-line tools
  - condor_userprio, condor_cod, …
› Explore message-based security
  - Ian Alderman’s work w/ signed ClassAd attributes
55
Privilege Separation
› No more root in the Condor daemons!
› Instead, a small component will be responsible for privileged operations
› Initial exploratory work w/ GNU userv (Cambridge)
› Now focusing on integration w/ glexec (gLite / nikhef)
56
“The Year of the Schedd”
› Schedd is juggling too many tasks
  - Break it down into smaller pieces, more modular
› Scalability
  - All non-blocking I/O
  - Hierarchy of schedds
› Schedd-on-the-side
  - “Scheduler booster”
  - Transform & delegate job classads to different grids
  - A “job router” for a grid
57
Thank you! 58