What’s new in HTCondor? What’s coming? HTCondor Week 2016

  • Slides: 47
What’s new in HTCondor? What’s coming?
HTCondor Week 2016
Madison, WI -- May 18, 2016
Todd Tannenbaum
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison

Release Timeline
› Stable Series
  • HTCondor v8.4.x - introduced Aug 2015 (currently at v8.4.6)
› Development Series
  • HTCondor v8.5.5 frozen, in beta test, release to web later this month
› HTCondor v8.6.0 expected summer 2016
Source: https://www.openhub.net/p/condorproject

Some enhancements in HTCondor v8.4
› Scalability and stability
  • Goal: 200k slots in one pool, 10 schedds managing 400k jobs
  • Resolved developer tickets: 240 bug fix issues (v8.2.x tickets), 234 enhancement issues (v8.3 tickets)
› Docker Job Universe
› Tool improvements, esp. condor_submit
› IPv6 mixed mode
› Encrypted Job Execute Directory
› Periodic application-layer checkpoint support in Vanilla Universe
› Submit requirements
› New packaging

Scalability Enhancement Examples

Condor_shadow resources
  • Reduce memory footprint of Shadow
  • Eliminate need for authentication step to schedd, startd (on execute host)
  • v7.8.7: 860 KB / 1860 KB
  • v8.4.0: 386 KB

Authentication Speedups
› FS (file system) and GSI authentication are now performed asynchronously
  • So now a Condor daemon can perform many authentications in parallel
  • CMS pool went from 200 execute nodes (glideins) per collector to 2000
› Can cache mapping of GSI certificate name to user name
  • Mapping can be heavyweight, esp. if HTCondor has to contact an external service (LCMAPS…)
  • Knob name is GSS_ASSIST_GRIDMAP_CACHE_EXPIRATION
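
The caching idea on this slide can be illustrated with a minimal Python sketch. It is not HTCondor code: the class name, the lookup function, and the 60-second lifetime are assumptions made for illustration, and the slide only names the knob GSS_ASSIST_GRIDMAP_CACHE_EXPIRATION. The point is that a heavyweight mapping is performed once and then reused until the entry expires.

```python
import time
from typing import Callable, Dict, Tuple


class ExpiringMappingCache:
    """Cache certificate-name -> user-name lookups for a fixed lifetime."""

    def __init__(self, lookup: Callable[[str], str], expiration_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup                  # the expensive mapping call
        self._expiration = expiration_seconds
        self._clock = clock                    # injectable so the demo needs no sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}

    def map_name(self, cert_name: str) -> str:
        now = self._clock()
        cached = self._entries.get(cert_name)
        if cached is not None and now < cached[1]:
            return cached[0]                   # still fresh: skip the expensive lookup
        user = self._lookup(cert_name)
        self._entries[cert_name] = (user, now + self._expiration)
        return user


calls = []


def slow_lookup(cert_name: str) -> str:
    """Hypothetical stand-in for an external mapping service."""
    calls.append(cert_name)
    return "alice"


fake_now = [0.0]
cache = ExpiringMappingCache(slow_lookup, expiration_seconds=60,
                             clock=lambda: fake_now[0])
cache.map_name("/DC=org/CN=Alice")   # first request performs the lookup
cache.map_name("/DC=org/CN=Alice")   # second request is served from the cache
fake_now[0] = 61.0
cache.map_name("/DC=org/CN=Alice")   # entry has expired, so the lookup runs again
print(len(calls))
```

A longer lifetime saves more lookups but delays the effect of a changed mapping, which is the trade-off an expiration knob exposes.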

Faster assignment of resources from central manager to schedd
› Negotiator can ask the schedd for more than one resource request per network round trip
    NEGOTIATOR_RESOURCE_REQUEST_LIST_SIZE = 20
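
As a rough illustration of why batching helps, the following Python sketch counts network round trips under a deliberately simplified model: it assumes each round trip carries at most `list_size` resource requests and ignores everything else the negotiator does. The function name and the figure of 1000 requests are illustrative assumptions, not taken from HTCondor.

```python
import math


def round_trips(num_requests: int, list_size: int) -> int:
    """Round trips needed if each one carries at most `list_size` requests."""
    if num_requests < 0 or list_size < 1:
        raise ValueError("num_requests must be >= 0 and list_size must be >= 1")
    return math.ceil(num_requests / list_size)


# Compare one request per round trip with batches of 20 and 100.
for size in (1, 20, 100):
    print(size, round_trips(1000, size))
```

Under this model the saving matters most on high-latency links, where each round trip is expensive.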

Impact of multiple resource requests
[Chart: negotiation times for a 1000 slot pool versus # of job autoclusters. Series: 8.2.8 LAN, 8.2.8 WAN, 8.3.5 LAN 20 reqs, 8.3.5 LAN 100 reqs, 8.3.5 WAN 20 reqs, 8.3.5 WAN 100 reqs]

Eliminate CCB service pauses

Query Responsiveness
› Improvement: Collector will not fork for queries to small tables
  • Load Collector with 100k machine ads
  • Before change: ~4.5 queries/second
  • After change: ~24.4 queries/second
› Improvement: Schedd condor_q quantum adjusted (to 100 ms)
  • Load schedd with 100k job ads, 40 Hz job throughput
  • Before change: ~135 seconds per condor_q
  • After change: ~22 seconds per condor_q

Container Support (Black Box Applications)
› HTCondor cgroup support now manages swap space in addition to CPU, Memory
› New job universe to support Docker Containers
Please talk to us if you have interest in using Docker with HTCondor!

Docker Universe Job Is still a job
› Docker containers have the job-nature
  • condor_submit
  • condor_rm
  • condor_hold
  • Write entries to the job event log(s)
  • condor_dagman works with them
  • Policy expressions work
  • Matchmaking works
  • User prio / job prio / group quotas all work
  • Stdin, stdout, stderr work
  • Etc. etc. *

Many condor_submit improvements
You submit your jobs with that script??!? You’re braver than I thought!

More ways to Queue 'foreach'
    Queue <N> <var> in (<item-list>)
    Queue <N> <var> matching (<glob-list>)
    Queue <N> <vars> from <filename>
    Queue <N> <vars> from <script> |
› Iterate <items>, creating <N> jobs for each item
› In/from/matching keywords control how we get <items>
› There's more. See the manual for details.

Example: Queue matching files
    Executable = foo.exe
    Arguments = -inputdata $(Item)
    Queue 1 Item matching (*.dat, m*)
› Produces a job for each file that matches *.dat or m* (or both)
› $(Item) holds each filename in turn
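
To make the expansion concrete, here is a small Python sketch that mimics what the slide describes: one job per file that matches any of the globs, with a file that matches both patterns counted once. It works on a given list of names rather than the file system, and the function name, the sample file names, and the job ordering are illustrative assumptions; condor_submit itself is not involved.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def expand_queue_matching(filenames: Iterable[str], globs: Iterable[str],
                          n: int = 1) -> List[List[str]]:
    """Return the argument list of each job: n jobs per matching file."""
    patterns = list(globs)
    jobs: List[List[str]] = []
    for name in filenames:
        # A file is queued once even if it matches several patterns.
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            for _ in range(n):
                jobs.append(["-inputdata", name])   # $(Item) is replaced by the file name
    return jobs


files = ["a.dat", "m1.txt", "m2.dat", "notes.txt"]
for args in expand_queue_matching(files, ["*.dat", "m*"]):
    print("foo.exe", " ".join(args))
```

Here `m2.dat` matches both patterns but yields a single job, and `notes.txt` yields none.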

Condor_q new arguments
› -dag <dagman-job-id>
  • Show all jobs in the dag
› -limit <num>
  • Show at most <num> records
› -totals
  • Show only totals
› -autocluster -long
  • Group and count jobs that have same requirements
  • …perfect for provisioning systems
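
The grouping behind "group and count jobs that have same requirements" can be sketched in a few lines of Python. This is only an illustration of the idea: the function name, the choice of attributes to group on, and the sample job ads are assumptions, and the real tool decides for itself which attributes are significant.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Sequence

JobAd = Dict[str, Any]


def count_by_requirements(jobs: Iterable[JobAd],
                          significant_attrs: Sequence[str]) -> Counter:
    """Count jobs that share the same values for the chosen attributes."""
    counts: Counter = Counter()
    for job in jobs:
        key = tuple(job.get(attr) for attr in significant_attrs)
        counts[key] += 1
    return counts


jobs = [
    {"Owner": "adam", "RequestCpus": 1, "RequestMemory": 1024},
    {"Owner": "beth", "RequestCpus": 1, "RequestMemory": 1024},
    {"Owner": "adam", "RequestCpus": 8, "RequestMemory": 16384},
]
groups = count_by_requirements(jobs, ["RequestCpus", "RequestMemory"])
for key, count in groups.items():
    print(key, count)
```

A provisioning system can read such a summary as "how many machines of each shape are wanted" without walking every job.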

IPv6 Support
› New in 8.4 is support for “mixed mode,” using IPv4 and IPv6 simultaneously.
› A mixed-mode pool’s central manager and submit nodes must each be reachable on both IPv4 and IPv6.
› Execute nodes and (other) tool-hosting machines may be IPv4, IPv6, or both.
    ENABLE_IPV4 = TRUE
    ENABLE_IPV6 = TRUE

Encrypted Execute Directory
› Jobs can request (or admins can require) that their scratch directory be encrypted in realtime
  • /tmp and /var/tmp output also encrypted
  • Put encrypt_execute_directory=True in job submit file (or condor_config)
› Only the condor_starter and job processes can see the cleartext
  • Even a root ssh login / cron job will not see the cleartext
  • Batch, interactive, and condor_ssh_to_job all work

Periodic Application-Level Checkpointing in the Vanilla Universe
› Experimental feature!
› If requested, HTCondor periodically sends the job its checkpoint signal and waits for the application to exit.
› If it exits with code 0, HTCondor considers the checkpoint successful and does file transfer, and re-executes the application.
› Otherwise, the job is requeued.
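
The exit-code rule on this slide can be written down as a tiny decision function, followed by a toy simulation of a job that checkpoints cleanly every time. This is a sketch under stated assumptions: the function names, the step labels, and the notion of "work units" are invented for illustration, and the simulation assumes the application finishes on its own whenever less than one interval of work remains.

```python
from typing import Dict, List


def steps_after_checkpoint_signal(exit_code: int) -> List[str]:
    """What happens once the application exits after the checkpoint signal."""
    if exit_code == 0:
        return ["transfer_files", "re_execute"]   # checkpoint considered successful
    return ["requeue"]


def simulate(total_work: int, interval: int) -> Dict[str, int]:
    """Count executions of an application that always checkpoints cleanly."""
    if total_work < 0 or interval < 1:
        raise ValueError("total_work must be >= 0 and interval must be >= 1")
    saved = 0          # progress stored in the checkpoint
    executions = 0
    while True:
        executions += 1
        if total_work - saved <= interval:
            # The remaining work fits before the next signal: the job completes.
            return {"executions": executions, "checkpoints": executions - 1}
        saved += interval   # application saves its state, then exits
        exit_code = 0       # it reports a clean checkpoint
        if "re_execute" not in steps_after_checkpoint_signal(exit_code):
            raise RuntimeError("job would be requeued")


print(simulate(total_work=10, interval=3))
```

The application is responsible for resuming from its own saved state on each re-execution; the sketch models that with the `saved` counter.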

Submit Requirements
› Allow administrator to decide which jobs enter the queue via a SUBMIT_REQUIREMENTS constraint
› Rejection (error) message may be customized
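
A minimal Python sketch of the idea follows: each requirement pairs a test on the job ad with the message shown when the test fails. This models the concept only and is not HTCondor configuration syntax; the requirement names, the 4096 MB limit, and the messages are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

JobAd = Dict[str, Any]
Requirement = Tuple[str, Callable[[JobAd], bool], str]

# Hypothetical site policy: (name, test on the job ad, custom rejection message).
REQUIREMENTS: List[Requirement] = [
    ("HasMemoryRequest",
     lambda ad: "RequestMemory" in ad,
     "Please set request_memory."),
    ("ModestMemory",
     lambda ad: ad.get("RequestMemory", 0) <= 4096,
     "Jobs may request at most 4096 MB."),
]


def check_submission(ad: JobAd,
                     requirements: List[Requirement] = REQUIREMENTS) -> Optional[str]:
    """Return None if the job may enter the queue, else the first rejection message."""
    for name, test, reason in requirements:
        if not test(ad):
            return f"{name}: {reason}"
    return None


print(check_submission({"RequestMemory": 1024}))
print(check_submission({"RequestMemory": 8192}))
```

Evaluating the requirements in order and stopping at the first failure gives the user one specific message to act on.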

HTCondor RPM Packaging
› More Standard Packaging
  • Matches OSG and Fedora package layout
  • Built with rpmbuild
  • Source RPM is released
    • Can rebuild directly from the source RPM
    • Build requirements are enforced by rpmbuild
  • Partitioned into several binary RPMs
    • Pick and choose what you need

HTCondor Binary RPM Packages
RPM                     Description
condor                  Base package
condor-all              Includes all the packages in a typical installation
condor-bosco            BOSCO – Manage jobs on remote clusters via ssh
condor-classads         HTCondor classified advertisement library
condor-classads-devel   Development support for classads
condor-debuginfo        Symbols for libraries and binaries
condor-externals        External programs and scripts
condor-externals-libs   External libraries
condor-kbdd             HTCondor Keyboard Daemon
condor-procd            HTCondor Process Tracking Daemon
condor-python           Python Bindings for HTCondor
condor-static-shadow    Static Shadow (Use 32-bit shadow on 64-bit system)
condor-std-universe     Standard Universe Support
condor-vm-gahp          VM Universe Support

HTCondor Debian Packaging
› More Standard Packaging
  • Matches Debian package layout
  • Built with pbuilder
  • Source package is released
deb              Description
condor           Base Package
condor-dbg       Symbols for libraries and programs
condor-dev       Development files for HTCondor
condor-doc       HTCondor documentation
libclassad-dev   Development files for Classads
libclassad7      Classad runtime libraries

What to do with all these statistics?
› Aggregate and send them to Ganglia!
  • condor_gangliad introduced in v8.2
  • See manual or my talk at http://bit.ly/1YBBO3P
› In addition to (or instead of) sending to Ganglia, aggregate and make available in JSON format over HTTP
  • condor_gangliad rename to condor_metricd
› View some basic historical usage out-of-the-box by pointing web browser at central manager (modern CondorView)…
› Or upload to influxdb, graphite for Grafana

Enabled by default and/or easier to configure
› Enabled by default: shared port, cgroups, IPv6
  • Have both IPv4 and v6? Prefer IPv4 for now
› Configured by default: Kernel tuning
› Easier to configure: Enforce slot sizes
    use policy: preempt_if_cpus_exceeded
    use policy: hold_if_cpus_exceeded
    use policy: preempt_if_memory_exceeded
    use policy: hold_if_memory_exceeded

New condor_q default output
› Only show jobs owned by the user
  • disable with -allusers
› Batched output (-batch, -nobatch)
› Proposed new default output of condor_q will show summary of current user's jobs.
    -- Submitter: adam   Schedd: submit-3.chtc.wisc.edu
    OWNER  IDLE  RUNNING  HELD  SUBMITTED    DESCRIPTION   JOBIDs
    adam      1        -        3/22 07:20   DAG: 221546   230864.0
                             1  3/23 08:57   AtlasAnlysis  263203.0
              1        -        3/27 09:37   matlab.exe    307333.0
            133       21     -  3/27 11:46   DAG: 311986   312342.0 ... 313304.0
    In the last 20 minutes:
      0 Job(s) were Completed
      5 Job(s) were Started   312690.0 ... 312695.0
      1 Job(s) were Held      263203.0
          5/11 07:22 Error from slot1@eee.chtc.wisc.edu: out of disk

New condor_status default output
› Only show one line of output per machine
› Can try now in v8.5.4+ with "-compact" option
› The "-compact" option will become the new default once we are happy with it
    Machine       Slots  Cpus  FreCpu  FreeGb  CpuLoad  ST
    gpu-1             8     8       0    0.44     1.90  Cb
    gpu-2             8     8       0    0.57     1.87  Cb
    gpu-3             8     8       0   16.13     0.85  Cb
    matlab-build      1    12      11   23.33     0.00  **
    mem1             32    80       0  160.17     1.00  Cb
    [Columns only partly recoverable from the slide: Platform (x64/SL6, x64/SL6), Gpus (2, 2, 4), TotalGb (15.57, 47.13, 23.45, 1009.67)]

HTCondor and Kerberos
› HTCondor currently allows you to authenticate users and daemons using Kerberos
› However, it does NOT currently provide any mechanism to provide a Kerberos credential for the actual job to use on the execute slot

HTCondor and Kerberos/AFS
› So we are adding support to launch jobs with Kerberos tickets / AFS tokens
› Details
  • HTCondor 8.5.X allows an opaque security credential to be obtained by condor_submit and stored securely alongside the queued job (in the condor_credd daemon)
  • This credential is then moved with the job to the execute machine
  • Before the job begins executing, the condor_starter invokes a call-out to do optional transformations on the credential

Grid Universe
› Reliable, durable submission of a job to a remote scheduler
› Popular way to send pilot jobs
› Supports many “back end” types:
  • HTCondor
  • PBS
  • LSF
  • Grid Engine
  • Google Compute Engine
  • Amazon EC2
  • OpenStack
  • Deltacloud
  • Cream
  • NorduGrid ARC
  • BOINC
  • Globus: GT2, GT5
  • UNICORE

Improved Scalability of Amazon EC2 grid jobs
[Chart: number of jobs running on Spot instances in Amazon AWS over time, from about 13:00 to 18:30; vertical axis from 0 to 120000]

Elastically grow your pool into the Cloud: condor_annex
› Leverage efficient AWS APIs such as Auto Scaling Groups and Spot Fleets
  • Implement a “lease” so charges cease if lease expires
› Secure mechanism for cloud instances to join the HTCondor pool at home institution
    condor_annex --set-size 2000 --lease 24 --project "144PRJ22"
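
The "lease" idea can be illustrated with a short Python sketch: an instance keeps running only while its lease is current, so a lost connection to the home pool cannot leave instances running (and billing) forever. The class, the time units, and the numbers below are assumptions for illustration; the slide does not describe how condor_annex implements its lease.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """A grant that is valid for `duration` time units after `granted_at`."""
    granted_at: float
    duration: float

    def expired(self, now: float) -> bool:
        return now >= self.granted_at + self.duration

    def renew(self, now: float) -> None:
        # Renewal restarts the clock; without it the lease simply runs out.
        self.granted_at = now


def instance_should_stop(lease: Lease, now: float) -> bool:
    """An instance shuts itself down as soon as its lease has expired."""
    return lease.expired(now)


lease = Lease(granted_at=0, duration=24)
print(instance_should_stop(lease, now=23))   # still within the lease
print(instance_should_stop(lease, now=24))   # lease has run out
```

Because expiry is the default outcome, stopping the charges requires no action from anyone; keeping the instances requires an explicit renewal.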

Grid Universe support for SLURM, OpenStack, Cobalt
› Speak native SLURM protocol
  • No need to install PBS compatibility package
› Speak OpenStack’s NOVA protocol
› Speak to Cobalt Scheduler
  • Argonne Leadership Computing Facilities
Jaime: Grid Jedi

Transformation of job ad upon submit
› Allow admin to have the schedd add/edit job attributes upon submission (use case: insert trusted group attributes based upon owner)
› In v8.5.1+ can also set attributes as immutable by the user
› Prevent user from editing protected attributes with condor_qedit or chirp
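
The use case on this slide can be sketched in Python as two steps: a transform applied at submission that sets a trusted group attribute from the owner, and an edit check that refuses later changes to that attribute. This is a model of the idea rather than HTCondor's configuration syntax; the owner-to-group table, the attribute choice, and the function names are hypothetical.

```python
from typing import Any, Dict, FrozenSet

JobAd = Dict[str, Any]

TRUSTED_GROUPS = {"adam": "physics", "beth": "chemistry"}   # hypothetical site table
IMMUTABLE_ATTRS: FrozenSet[str] = frozenset({"AccountingGroup"})


def transform_on_submit(ad: JobAd) -> JobAd:
    """Return a copy of the job ad with the admin-controlled group filled in."""
    transformed = dict(ad)
    owner = ad.get("Owner")
    group = TRUSTED_GROUPS.get(owner)
    if group is not None:
        # Overwrites whatever the user supplied, so the value can be trusted.
        transformed["AccountingGroup"] = f"{group}.{owner}"
    return transformed


def user_edit(ad: JobAd, attr: str, value: Any) -> JobAd:
    """Apply a user's edit unless the attribute is immutable."""
    if attr in IMMUTABLE_ATTRS:
        raise PermissionError(f"{attr} cannot be changed by the user")
    edited = dict(ad)
    edited[attr] = value
    return edited


queued = transform_on_submit({"Owner": "adam", "AccountingGroup": "vip.adam"})
print(queued["AccountingGroup"])
```

Setting the attribute at submission and blocking later edits are both needed; either one alone leaves a way for the user to choose the group.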

Docker Universe Enhancements
› Docker jobs get usage updates (i.e. network usage) reported in job classad
› Admin can add additional volumes
  • That all docker universe jobs get
  • Why?
    • CVMFS
    • Large shared data
  • Details: https://htcondorwiki.cs.wisc.edu/index.cgi/tktview?tn=5308

Potential Future Docker Universe Features?
› Advertise images already cached on machine?
› Support for condor_ssh_to_job?
› Package and release HTCondor into Docker Hub?
› Network support beyond NAT?
› Run containers as root??!?!?
› Automatic checkpoint and restart of containers! (via CRIU)

SELinux and systemd
› SELinux (On by default in RHEL 7)
› Systemd Integration
  • Port Reservation - Systemd will reserve 9618 for HTCondor
  • Watchdog - If master stops responding, systemd will restart it
  • Status messages - display via systemctl status
  • Logging - Daemon log messages can go to systemd-journald

Draining jobs from execute nodes
› Add ability to backfill with pre-emptable jobs while draining
  • Specifically, ability to specify a new startd START expression when entering drain state
› Add ability to shutdown when fully drained
  • Alternative to condor_off -peaceful
› Investigating ability to upgrade HTCondor on execute nodes without restarting jobs

DAGMan Improvements
  • Splice Pin connections
    • Allows more flexible parent/child relationships between nodes within splices
    • Parsed when DAGMan starts up
  • INCLUDE directive
  • Set ClassAd attributes in DAG
  • Set Batch Name

Seeking ideas to help users and admins learn
› Move HOWTO recipes on wiki to stackoverflow?
› Sub-reddit instead of email list?
› YouTube videos?

Smarter and Faster Schedd
› User accounting information moved into ads in the Collector
  • Enable schedd to move claims across users
› Non-blocking authentication, smarter updates to the collector, faster ClassAd processing
› Late materialization of jobs in the schedd to enable submission of very large sets of jobs
  • More jobs materialized once number of idle jobs drops below a threshold (like DAGMan throttling)
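
Late materialization can be pictured with a small Python sketch: a very large submission is held as a count, and individual jobs are created only when the number of idle jobs falls below a threshold. The class, its method names, and the numbers are assumptions made for illustration and say nothing about how the schedd implements the feature.

```python
from typing import List


class JobFactory:
    """Hold a large submission as a count and materialize jobs on demand."""

    def __init__(self, total_jobs: int, idle_threshold: int) -> None:
        self.total_jobs = total_jobs
        self.idle_threshold = idle_threshold
        self.next_id = 0               # first job that has not been materialized yet
        self.idle: List[int] = []      # materialized jobs waiting to run

    def top_up(self) -> int:
        """Materialize jobs until the idle list reaches the threshold."""
        added = 0
        while len(self.idle) < self.idle_threshold and self.next_id < self.total_jobs:
            self.idle.append(self.next_id)
            self.next_id += 1
            added += 1
        return added

    def start_jobs(self, n: int) -> List[int]:
        """Move up to n idle jobs to running and return their ids."""
        started, self.idle = self.idle[:n], self.idle[n:]
        return started


factory = JobFactory(total_jobs=1_000_000, idle_threshold=5)
factory.top_up()            # only the first few jobs exist in the queue
factory.start_jobs(3)       # some of them start running
factory.top_up()            # idle count dropped below the threshold, so more appear
print(factory.next_id, len(factory.idle))
```

The queue's size then tracks the threshold and the number of running jobs rather than the size of the whole submission.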

Thank You!
P.S. Interested in working on HTCondor full time? Talk to me! We are hiring! htcondor-jobs@cs.wisc.edu