Accounting, Group Quotas, and User Priorities
Todd Tannenbaum
Computer Sciences Department
University of Wisconsin-Madison
tannenba@cs.wisc.edu
http://www.cs.wisc.edu/condor
Why are you here?
› To learn about
    › How Condor chooses the next job to run
    › How you can change which job will run
    › How you can prioritize jobs by project instead of by user
    › How you can group users into different projects
    › How you can assign usage minimums to groups of users
What job runs next?
› A Condor "queue" is not FIFO!
› Determined by balancing the wants & needs of three entities
    › The user (schedd)
    › The pool administrator (negotiator)
    › The machine owner (startd)
› It all comes together in the negotiation cycle
[Diagram: negotiator, collector, schedd, startds]
1. Startds send machine ads
2. Schedds send submitter ads
Machine Ads
condor_status -l romano.cs.wisc.edu
    MyType = "Machine"
    TargetType = "Job"
    Name = "vm6@romano.cs.wisc.edu"
    Machine = "romano.cs.wisc.edu"
    Requirements = LoadAvg < 0.5
    Rank = 0.0
    Disk = 2019048
    LoadAvg = 0.000000
    KeyboardIdle = 1018497
    Memory = 512
    Cpus = 1
    Mips = 124122
Submitter Ads
condor_status -sub matthew@wisc.edu -l
    MyType = "Submitter"
    TargetType = ""
    Machine = "rosalind.cs.wisc.edu"
    ScheddIpAddr = "<128.105.166.39:42190>"
    Name = "matthew@cs.wisc.edu"
    RunningJobs = 1
    IdleJobs = 1
    HeldJobs = 0
    MaxJobsRunning = 500
    StartSchedulerUniverse = TRUE
    MonitorSelfImageSize = 9164.000000
    MonitorSelfResidentSetSize = 2432
Let's look inside the negotiator during a negotiation cycle…
Inside the Negotiator…
[Diagram: negotiator, accountant]
Negotiation Cycle
1. Get all startd and submitter ads
2. Get user priorities for all submitters, or accounting principals (via the Name attribute in the submitter ad)
3. Sort submitter ads
4. Talk to schedds in accounting principal order
5. Schedds send requests one at a time, sorted by job priority
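Steps 3 to 5 can be sketched in a few lines of Python. This is a simplified illustration, not the negotiator's actual code: the `Submitter` class and `request_order` function are hypothetical names, and the sketch assumes that a numerically lower user priority value is better while a numerically higher job priority is better.

```python
from dataclasses import dataclass, field


@dataclass
class Submitter:
    # Accounting principal, taken from the Name attribute of the submitter ad.
    name: str
    # (job_id, job_priority) pairs queued by this submitter on one schedd.
    jobs: list = field(default_factory=list)


def request_order(submitters, user_priority):
    """Yield (submitter name, job id) in the order requests are considered.

    Assumptions (labeled, not taken from the slides): a LOWER user priority
    value is better; a HIGHER job priority value is better.
    """
    # Steps 3-4: visit submitters in user-priority order.
    for sub in sorted(submitters, key=lambda s: user_priority[s.name]):
        # Step 5: within one submitter, requests arrive sorted by job priority.
        for job_id, _prio in sorted(sub.jobs, key=lambda j: j[1], reverse=True):
            yield sub.name, job_id


if __name__ == "__main__":
    subs = [
        Submitter("alice", [("a1", 0), ("a2", 10)]),
        Submitter("bob", [("b1", 20)]),
    ]
    prio = {"alice": 0.5, "bob": 3.0}
    for name, job in request_order(subs, prio):
        print(name, job)
```

In this sketch bob's job priority of 20 never moves his job ahead of alice's jobs, because job priority is only compared within one submitter's requests.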
For Each Job:
6. Find all machine ads that match
7. Sort machine ads that match by:
    NEGOTIATOR_PRE_JOB_RANK
    Job Ad's RANK
    NEGOTIATOR_POST_JOB_RANK
8. Is the machine ad candidate already running a job? Priority preemption if PREEMPTION_REQUIREMENTS evaluates to True.
9. Give the schedd the match, or tell it no match found. Schedd responds w/ next request (maybe skipping to the current AutoCluster).
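A rough Python sketch of steps 6 to 9 for a single job follows. It is a simplification under stated assumptions: ads are plain dictionaries, the rank and requirements expressions are Python callables, a higher rank value is better, and the three ranks are compared in the order listed above. The function name `negotiate_one_job` and the dictionary layout are hypothetical.

```python
def negotiate_one_job(job, machines, pre_rank, post_rank, preemption_requirements):
    """Return the Name of the matched machine, or None if no match is found."""
    # Step 6: keep only machines where both sides' requirements are satisfied.
    matches = [
        m for m in machines
        if job["requirements"](m) and m["requirements"](job)
    ]
    # Step 7: sort by pre-job rank, then the job's own rank, then post-job rank.
    # Assumption: higher values are better, so sort in descending order.
    matches.sort(
        key=lambda m: (pre_rank(m), job["rank"](m), post_rank(m)),
        reverse=True,
    )
    # Step 8: an idle machine can be handed out directly; a busy one only
    # if the preemption requirements evaluate to True.
    for m in matches:
        if m.get("RemoteUser") is None or preemption_requirements(m, job):
            return m["Name"]  # Step 9: give the schedd the match.
    return None  # Step 9: tell the schedd no match was found.


if __name__ == "__main__":
    accept_all = lambda _job: True
    machines = [
        {"Name": "a", "Memory": 512, "RemoteUser": None, "requirements": accept_all},
        {"Name": "b", "Memory": 2048, "RemoteUser": "alice", "requirements": accept_all},
        {"Name": "c", "Memory": 1024, "RemoteUser": None, "requirements": accept_all},
    ]
    job = {
        "requirements": lambda m: m["Memory"] >= 256,
        "rank": lambda m: m["Memory"],
    }
    no_rank = lambda m: 0
    print(negotiate_one_job(job, machines, no_rank, no_rank, lambda m, j: False))
```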
Some observations
› Job priority (condor_prio) will not allow one user to run ahead of another user.
› Job priority is specific per user per schedd.
Examples
Job says:
    Rank = Memory
Config file does not define:
    NEGOTIATOR_PRE_JOB_RANK
User will then ALWAYS get the highest memory machine, even if it is already being used by a lower priority user.
Examples
Job says:
    Rank = Memory
Config file says:
    NEGOTIATOR_PRE_JOB_RANK = RemoteUser =?= UNDEFINED
User will then get the highest memory IDLE machine, and will only preempt a user if no idle machines match.
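The difference between the two examples can be illustrated with a small Python sketch. It assumes machine ads are dictionaries, that an idle machine has no RemoteUser, and that the pre-job rank is compared before the job's Rank; `best_machine` and `idle_first` are hypothetical names.

```python
def best_machine(machines, pre_job_rank=None):
    """Pick the top candidate for a job whose Rank = Memory."""
    def key(m):
        # With no NEGOTIATOR_PRE_JOB_RANK defined, every machine ties at 0.
        pre = pre_job_rank(m) if pre_job_rank else 0
        return (pre, m["Memory"])
    return max(machines, key=key)["Name"]


# Stand-in for: NEGOTIATOR_PRE_JOB_RANK = RemoteUser =?= UNDEFINED
# True (idle) sorts above False (claimed).
def idle_first(m):
    return m.get("RemoteUser") is None


if __name__ == "__main__":
    pool = [
        {"Name": "big", "Memory": 4096, "RemoteUser": "alice"},
        {"Name": "mid", "Memory": 2048, "RemoteUser": None},
        {"Name": "small", "Memory": 1024, "RemoteUser": None},
    ]
    print(best_machine(pool))              # pre-job rank undefined
    print(best_machine(pool, idle_first))  # idle machines preferred
```

Without the pre-job rank, the claimed high-memory machine wins; with it, the largest idle machine wins, and a claimed machine is chosen only when nothing idle matches.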
Accounting Groups
I don't care about WHO submitted the job. How do I change the accounting principal?
Accounting Groups, cont.
› In job submit file
    executable = foo
    universe = vanilla
    +AccountingGroup = "Project44"
    queue
A given group should have priority on 50 nodes of my 500 machine cluster. How?
Answer A: Startd Rank
Answer B: Group Quotas
Group Quotas - Config Params
GROUP_NAMES - list the recognized group names.
    Example:
    GROUP_NAMES = group-cms, group-infn
GROUP_QUOTA_<groupname> - the number of machines 'owned' by this group.
    Example:
    GROUP_QUOTA_group-cms = 10
    GROUP_QUOTA_group-infn = 5
GROUP_AUTOREGROUP - set this to either True or False. Defaults to False. If True, then users who submitted to a specific group will also negotiate a second time with the "none" group, allowing group jobs to be matched w/ idle machines even if the group is over quota.
Negotiation w/ Group Quotas
› Matchmaker first negotiates for groups, sorted by how far they are under quota.
    › Negotiation within a group follows the exact same algorithm as before.
› THEN, negotiate for all users that are not in a group as before.
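The group ordering can be sketched as follows. This is only an illustration: it assumes "how far under quota" means quota minus current usage in machines, with the group furthest under quota going first, and the function name `group_negotiation_plan` and the "<none>" label for the final pass are hypothetical.

```python
def group_negotiation_plan(quotas, usage):
    """Return (group, machines still available under quota) in negotiation order.

    The final ("<none>", None) entry stands for the pass over users that
    are not in any group, which has no quota in this sketch.
    """
    # Assumption: distance under quota = quota - machines currently in use.
    headroom = {g: quotas[g] - usage.get(g, 0) for g in quotas}
    # Groups furthest under quota negotiate first.
    ordered = sorted(headroom, key=lambda g: headroom[g], reverse=True)
    # A group at or over quota gets no further machines in its group pass.
    plan = [(g, max(0, headroom[g])) for g in ordered]
    plan.append(("<none>", None))
    return plan


if __name__ == "__main__":
    quotas = {"group-cms": 10, "group-infn": 5}
    usage = {"group-cms": 9, "group-infn": 1}
    for group, available in group_negotiation_plan(quotas, usage):
        print(group, available)
```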
Lots of "tools" to get what you want
› USER: Job Requirements, Job Rank
› ADMIN: user priorities, accounting groups, accounting group quotas, PREEMPTION_REQUIREMENTS, NEGOTIATOR_PRE|POST_JOB_RANK
› OWNER: Machine Requirements, Machine Rank
Challenge Question
I want LOW, MED, and HIGH strict priority job types. But at each priority level, I want fair share. HOW??
Questions?