What's New in Condor? What's Coming Up?
What’s new in Condor? What’s coming up?
Condor Week 2008
Todd Tannenbaum
Computer Sciences Department, University of Wisconsin-Madison
condor-admin@cs.wisc.edu
http://www.cs.wisc.edu/condor
Release Situation
› Stable Series
  • Current: Condor v7.0.1 (Feb 27th 2008)
  • Last Year: Condor v6.8.4 (Feb 5th 2007)
› Development Series
  • Current: Condor v7.1.0 (April 1st 2008)
  • Last Year: Condor v6.9.2 (April 10th 2007)
› v6.9 Series: ~14 months
Special Condor Week Edition
How many cores in one new UW Condor cluster rack?
New Ports
› RHEL 5 x86 & x86_64 with stduniv and glibc 2.5
› Playstation 3
› HPUX 11i Itanium (almost done)
› Cross testing on x86-like platforms
› Debian clipped port
› Out with the old.
  • Red Hat Linux 7.x systems on the x86 processor.
  • Digital Unix systems on the Alpha processor.
  • Yellow Dog Linux 3.0 systems on the PPC processor.
  • MacOS 10.3 systems on the PPC processor.
Big v7.0 Goodies
› Scalability Improvements
› GCB Improvements
› Privilege Separation
› New Quill
› Virtual Machine Universe
Scalability
Condor’s Privilege Separation
› Apply principle of least privilege to Condor
› No more root / superuser privilege required
› Currently completed on execute side
› Use glexec or Condor’s own “sudo”
› Can still run the “old way” if you want
Quill Take Two in v7.x
› Shared databases
› More than just the JobAd, e.g.
  • Startd: Machine ClassAds
  • Negotiator: matches
  • Run: Job User Log information
› More than just PostgreSQL DBMS
› All the details: http://www.cs.wisc.edu/condor/quill_overview_07-18-2007.pdf
[Diagram labels: StartD, SchedD, Negotiator, sql.log, Disk, QuillD, DBMS]
Virtual Machine Universe
› Submit a “Job” that consists of a virtual machine image
› Condor schedules, manages, and monitors VM job
› Works w/ VMware Server and Xen
› Matchmaking
› Checkpoint/Restart/Migration
› Data Movement
Plug: BoF Session 1:30 pm tomorrow
What else? GCB Improvements!
› Improved Scalability: Only use the broker if required!
  • Local Host Optimizations
    • Bypass GCB if two daemons are talking on the same host
  • Local Network Optimizations
    • Two hosts on the same private net bypass the broker
    • Every network is assigned a unique network name
    • Daemons advertise (a) publicly accessible IP; (b) real IP; (c) network name.
    • Names match ? use real IP : use public IP.
› Improved Robustness
  • Broker dies -> master finds another broker and restarts.
  • When master starts up, it pings a list of brokers and randomly chooses from those that respond.
  • Bug fixes
› Improved Logging: the logs are now helpful and sane.
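The "names match ? use real IP : use public IP" rule can be sketched in a few lines of Python. This is only an illustration of the decision described on the slide; the `DaemonAddress` class, its field names, and `choose_ip` are hypothetical and are not Condor or GCB APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DaemonAddress:
    """What a daemon advertises (hypothetical field names)."""
    public_ip: str      # (a) publicly accessible IP, reached via the broker
    real_ip: str        # (b) the daemon's real IP on its own network
    network_name: str   # (c) unique name assigned to that network


def choose_ip(local: DaemonAddress, peer: DaemonAddress) -> str:
    """Pick the address the local daemon should use to reach the peer."""
    # Same network name: both daemons sit on the same private net,
    # so they can talk directly and bypass the broker.
    if local.network_name == peer.network_name:
        return peer.real_ip
    # Different networks: fall back to the publicly accessible address.
    return peer.public_ip


schedd = DaemonAddress("128.105.0.10", "192.168.1.5", "lab-net")
startd_same_net = DaemonAddress("128.105.0.10", "192.168.1.9", "lab-net")
startd_other_net = DaemonAddress("128.105.0.20", "10.0.0.7", "cluster-net")

print(choose_ip(schedd, startd_same_net))   # direct, real IP
print(choose_ip(schedd, startd_other_net))  # via public IP
```

The addresses and network names above are made up for the example.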
Process Tracking
› Guarantee iron-clad tracking of process groups
  • Even if running as the job submitter
  • Uses supplementary group ids
  • Linux only
  • Also as a standalone daemon for OSG
USE_GID_PROCESS_TRACKING = True
MIN_TRACKING_GID = 750
MAX_TRACKING_GID = 757
Better Collector Authorization
› New authorization levels to allow different rules for submission -vs- execution
  • ADVERTISE_STARTD, ADVERTISE_SCHEDD
› New config setting COLLECTOR_REQUIREMENTS expression must evaluate to true for Collector to accept the ad.
# Well-known ports for the trusted daemons
# Use the below ports if launching the condor_master
# as root; else, pick 3 ports above 1024.
MASTER_PORT = 890
SCHEDD_PORT = 891
STARTD_PORT = 892
MASTER_ARGS = -p $(MASTER_PORT)
SCHEDD_ARGS = -p $(SCHEDD_PORT)
STARTD_ARGS = -p $(STARTD_PORT)
COLLECTOR_REQUIREMENTS = \
  ( MyType =?= "Machine" && regexp( "<[0-9.]*:$(STARTD_PORT)>", MyAddress ) ) || \
  ( MyType =?= "Scheduler" && regexp( "<[0-9.]*:$(SCHEDD_PORT)>", MyAddress ) ) || \
  ( MyType =?= "DaemonMaster" && regexp( "<[0-9.]*:$(MASTER_PORT)>", MyAddress ) ) || \
  ( MyType =!= "Machine" && MyType =!= "Scheduler" && MyType =!= "DaemonMaster" )
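To see what the COLLECTOR_REQUIREMENTS expression above accepts and rejects, here is a rough Python emulation of its logic. It is a sketch only: ads are modeled as plain dictionaries, `collector_accepts` is a hypothetical helper, and ClassAd evaluation details are approximated with ordinary Python semantics.

```python
import re

# Ports from the config example above, keyed by the ad's MyType.
TRUSTED_PORTS = {"Machine": 892, "Scheduler": 891, "DaemonMaster": 890}


def collector_accepts(ad: dict) -> bool:
    """Approximate the COLLECTOR_REQUIREMENTS expression for one ad."""
    port = TRUSTED_PORTS.get(ad.get("MyType"))
    if port is None:
        # Last clause: any other ad type is not restricted.
        return True
    # First three clauses: the ad must come from the well-known port.
    address = ad.get("MyAddress", "")
    return re.search(rf"<[0-9.]*:{port}>", address) is not None


good_startd = {"MyType": "Machine", "MyAddress": "<192.168.1.5:892>"}
rogue_startd = {"MyType": "Machine", "MyAddress": "<192.168.1.66:40123>"}
other_ad = {"MyType": "Negotiator", "MyAddress": "<192.168.1.1:9614>"}

for ad in (good_startd, rogue_startd, other_ad):
    print(ad["MyType"], ad["MyAddress"], collector_accepts(ad))
```

The point of the policy is that only a process started as root can bind a port below 1024, so a machine or scheduler ad arriving from a high port is treated as untrusted.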
Handy New Attributes
› In your machine ad
  • TotalTimeBackfillBusy, TotalTimeBackfillIdle, TotalTimeBackfillKilling
  • TotalTimeClaimedBusy, TotalTimeClaimedIdle
  • TotalTimeClaimedRetiring, TotalTimeClaimedSuspended
  • TotalTimeMatchedIdle, TotalTimeOwnerIdle
  • TotalTimePreemptingKilling, TotalTimePreemptingVacating, TotalTimeUnclaimedBenchmarking, TotalTimeUnclaimedIdle
› In your job ad
  • NumJobStarts
  • NumJobReconnects
  • NumShadowExceptions
  • NumShadowStarts
And last but not least…
› Leases added to COD.
› Simple best-fit algorithm added to dedicated scheduler.
› Can reference resource usage and quota information in preemption policy.
› condor_config_val -dump [-v]
› Chirp improvements
  • Jobs can write messages into the user log
  • Can use proc 0 ClassAd as a “scratch pad”
› Condor shutdown via expressions
  • External Awareness
… and finally …
› File Transfer I/O Throttling
  • MAX_CONCURRENT_DOWNLOADS and MAX_CONCURRENT_UPLOADS
› More types of jobs can survive across a shutdown/crash of submit machine
  • Such as jobs that stream stdout/err.
› User’s job log changes.
  • Can have a centralized job log file.
  • Get values of any job ad attribute in log.
› “Cron” like job scheduling (Crondor?)
› Job Router shipped (Dan’s talk)
› License Change
› Source code publicly released on web
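The idea behind a cap such as MAX_CONCURRENT_DOWNLOADS can be illustrated with a semaphore. The sketch below is not how Condor implements throttling; `TransferThrottle` and `fake_transfer` are hypothetical names, and the "transfers" are just short sleeps.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class TransferThrottle:
    """Allow at most max_concurrent transfers to run at the same time."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of simultaneous transfers observed

    def run(self, transfer):
        with self._slots:  # blocks while the limit is reached
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return transfer()
            finally:
                with self._lock:
                    self._active -= 1


def fake_transfer():
    time.sleep(0.01)  # stand-in for moving a job's files
    return "done"


throttle = TransferThrottle(max_concurrent=2)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: throttle.run(fake_transfer), range(8)))

print(results.count("done"), "transfers finished; peak concurrency:", throttle.peak)
```

Eight transfers are requested at once, but no more than two are ever in flight; the rest wait their turn instead of saturating the submit machine's disk and network.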
… and finally … … and before shipping the new stable release …
We squashed LOTS of bugs!
Shiny new “bug free” Condor v7.0.x stable series!
Enough already, Todd. Tell me about what is cooking with v7.1.x and beyond.
Terms of License
Any and all dates in these slides are relative from a date hereby unspecified in the event of a likely situation involving a frequent condition. Viewing, use, reproduction, display, modification and redistribution of these slides, with or without modification, in source and binary forms, is permitted only after a deposit by said user into PayPal accounts registered to Todd Tannenbaum ….
Generalizing the Startd/Starter Architecture
› Making the startd more generic with the underlying system.
› How about: running without a starter, running w/o a schedd+shadow, pulling jobs, running starter-less jobs that it does not fork/exec, …
› Lightweight Jobs
› Examples
  • “Work Fetch” (see Derek’s talk)
  • Blue Heron Project (see Tom, Amanda, and Greg’s talk)
Some Love for Windows
› Jobs can write to the registry
  • Condor allocates HKEY_CURRENT_USER.
› Problems w/ the Batch Login sessions approach on Windows Server 2003 fixed (by not using them)
› Interoperability with Samba (as a PDC) has been improved
› Arch ClassAd attribute now reflects the wide range of architectures available to the Windows world; it no longer simply returns INTEL
Green Computing
› The startd has the ability to place a machine into a low power state. (Standby, Hibernate, Soft-Off, etc.)
  • HIBERNATE, HIBERNATE_CHECK_INTERVAL
  • If all slots return non-zero, then the machine is powered down; otherwise, it continues running.
› Machine ClassAd contains all information required for a client to wake it up
  • Condor can wake it up; there is also a standalone tool.
  • This was NOT as easy as it should be.
› Machines in “Offline State”
  • Lots of other uses
› Wake-up on Matchmaking Pressure
› Future Work?
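The "all slots return non-zero" rule can be sketched as follows. This is an illustration under stated assumptions, not Condor code: the `Slot` class, the idle-time policy inside `hibernate_expr`, and the two-hour threshold are all invented for the example; only the all-slots-must-agree rule comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    state: str          # e.g. "Unclaimed" or "Claimed" (illustrative values)
    idle_seconds: int   # how long the slot has been idle


def hibernate_expr(slot: Slot, idle_threshold: int = 7200) -> int:
    """Hypothetical per-slot policy: non-zero means 'OK to power down'."""
    if slot.state == "Unclaimed" and slot.idle_seconds >= idle_threshold:
        return 1
    return 0


def should_power_down(slots: list) -> bool:
    """Power down only if every slot's expression returns non-zero."""
    results = [hibernate_expr(slot) for slot in slots]
    return bool(results) and all(r != 0 for r in results)


idle_machine = [Slot("Unclaimed", 9000), Slot("Unclaimed", 8000)]
busy_machine = [Slot("Unclaimed", 9000), Slot("Claimed", 0)]

print(should_power_down(idle_machine))  # every slot agrees
print(should_power_down(busy_machine))  # one slot is still in use
```

A single busy slot is enough to keep the whole machine running, which matches the rule stated above.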
Plugins
› Think “Firefox”…
› Callouts from Condor daemons on appropriate events
› Plugin could re-implement or modify action (different than a client API)
› Will only build “as needed” as refactoring happens to add features
  • Miron: “I don’t want your plugs, I want new features!”
› Examples: Collector, Accountant, File Transfers, Scheduling Algorithms, …
Scheduling in Condor Today
[Diagram labels: CM, schedd, startd]
› Distributed Ownership
› Settings reflect 3 separate viewpoints:
  • Pool manager, Resource Owner, Job Submitter
But some sites want to use Condor like this:
[Diagram labels: schedd, startd, startd]
› Just one submission point (schedd)
› All resources owned by one entity
› We can do better for these sites.
  • Policy configurations are complicated.
  • Some useful policies are not present because they are hard to do in a wide-area distributed system.
  • Today the dedicated “scheduler” only supports FIFO and a naive Best Fit algorithm.
So what to do?
[Diagram labels: schedd, startd, startd]
› Give the schedd more scheduling options.
  • Examples: why can’t the schedd do priority preemption without the matchmaker’s help? Or move jobs from slow to fast claimed resources?
› Pluggable scheduler routines.
DAGMan Improvements
› Automatic running of rescue DAGs (useful for nested DAGs)
› Significantly improved speed of DAG recovery mode
› Assignment of “node categories” and category throttles
› Added generic node priorities & Depth First Traversal algorithm
DAGMan Depth First Example
Category Example
[Diagram: Setup node, then Big job nodes (Run <= 2) and Small job nodes (Run <= 5), then Cleanup node]
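A category throttle like the one in this example (at most 2 big jobs and 5 small jobs running at once) can be sketched as a simple filter over the ready nodes. This is not DAGMan's implementation; `pick_nodes_to_submit` and the node names are hypothetical.

```python
from collections import Counter


def pick_nodes_to_submit(ready, running, limits):
    """Choose which ready nodes may start without exceeding category limits.

    ready:   list of (node_name, category) pairs, in submission order
    running: list of categories of nodes that are already running
    limits:  dict mapping category -> maximum simultaneous nodes
    """
    in_flight = Counter(running)
    chosen = []
    for node, category in ready:
        limit = limits.get(category)  # None means the category is unthrottled
        if limit is not None and in_flight[category] >= limit:
            continue  # category is full; node stays ready for a later pass
        in_flight[category] += 1
        chosen.append(node)
    return chosen


# After Setup finishes, three big jobs and six small jobs become ready.
ready = [(f"big{i}", "big") for i in range(1, 4)]
ready += [(f"small{i}", "small") for i in range(1, 7)]
limits = {"big": 2, "small": 5}

first_pass = pick_nodes_to_submit(ready, running=[], limits=limits)
print(first_pass)
```

The remaining nodes (here one big job and one small job) are picked up on a later pass, once running nodes in their category finish.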
DAGMan Future Work
› DAG Splicing
› Allowing custom attributes in node ClassAds
› Fixing condor_hold semantics
› Configurable job start rate
› Node iteration
DAGMan Future Work
› Scalability
  • Current potential about 1 million nodes
  • Future up to 10 million nodes
› Submit files which generate more than one cluster
EC2 / VM Universe Next Steps: Impregnate Condor into the Image
› When? On Demand. How?
  • Job Router, GlideIn Factory, …
› File Transfer To/From S3 (Plugin!)
› Options to handle Amazon’s looming threat: NAT only
  • Overlay Network?
    • GCB
    • OpenVPN
  • Communicate by way of S3?
Negotiation Performance
› v6.8 -> automatic “significant attributes”, Match caching
› v7.1.0 -> “resource request” ads
  • Simple explanation: Resource request ad == a count plus all significant attributes.
  • Inserted into a schedd submitter ad.
  • “Give me 400 resources like this, and 200 resources like that, etc”.
› Matchmaking algorithm remains the same; only how it “learns” about jobs changes.
› Disabled by default.
› Possibilities, possibilities…
  • More robust against unresponsive schedds
  • No startd Rank preemption?
  • Others?
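The "count plus all significant attributes" idea can be sketched by grouping jobs on their significant attributes. This is only an illustration: `build_resource_requests` is a hypothetical helper, jobs are plain dictionaries, and the attribute names and values are chosen for the example rather than taken from a real pool.

```python
from collections import Counter


def build_resource_requests(jobs, significant_attrs):
    """Collapse a job queue into (count + significant attributes) requests."""
    # Jobs that agree on every significant attribute are interchangeable
    # for matchmaking, so they share one request with a count.
    counts = Counter(
        tuple((attr, job.get(attr)) for attr in significant_attrs)
        for job in jobs
    )
    return [{"count": n, **dict(key)} for key, n in counts.items()]


# 600 jobs that differ in ClusterId, which is not a significant attribute.
jobs = [{"Arch": "X86_64", "ImageSize": 100, "ClusterId": i} for i in range(400)]
jobs += [{"Arch": "INTEL", "ImageSize": 50, "ClusterId": i} for i in range(200)]

requests = build_resource_requests(jobs, ["Arch", "ImageSize"])
for request in requests:
    print(request)
```

Instead of describing 600 jobs one at a time, the schedd would hand over two requests: 400 resources like this, and 200 resources like that.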
And…
› The End ™ of the NFS Locking issue
› Avoid redundant copies of the same executable in the Condor spool
  • Maybe more?
› The “Stamping of a Passport”
› End-to-End Security (see Ian’s talk)
› A web site design from this decade.
Thank you for being such an awesome audience and an awesome user community!!!
[Photo caption: Jason Stowe, enjoying free bacon at a local pub. Only in Wisconsin.]