Monitoring and Controlling HTCondor with Python
John (TJ) Knoeller
HTCondor Week UK 2018

Overview
› Where we are
› ClassAds and ExprTrees
› htcondor bindings
› Examples

Design Philosophy
› ClassAds
  • everything is based on ClassAds
› Pythonic
  • Use iterators, exceptions, guards
  • ClassAds behave as much like a dict as reasonable
› Native Code
  • Uses the same library code as the HTCondor tools

Our Goal
› Complete
  • If you can do it with the command line tools, you should be able to do it with python
› Backward Compatible
  • APIs will stay stable as long as possible
  • Bad or broken APIs will be superseded, not removed
    – (queue_with_itemdata supersedes submitMany)
  • May use python DeprecationWarning
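The "superseded, not removed" policy can be sketched in plain Python. This is only an illustration of the pattern, not the bindings' actual code: `old_submit` and `new_submit` are hypothetical stand-ins for an old entry point and its replacement.

```python
import warnings


def new_submit(description, count=1):
    """Hypothetical replacement API: returns the job ads it would queue."""
    return [dict(description, ProcId=i) for i in range(count)]


def old_submit(description, count=1):
    """Hypothetical superseded API: still works, but warns the caller."""
    warnings.warn(
        "old_submit is superseded by new_submit",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return new_submit(description, count)
```

Python hides DeprecationWarning in many contexts by default; running a script with `python -W default` makes such warnings visible.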

Where we are
› In 8.6
  • Works with system python on Linux
  • Python 2 or Python 3 but not both
  • Windows is Python 2.7 only
› In 8.8
  • Plan to ship both Python 2 and Python 3 bindings
  • Windows x64 builds are Python 2.7 and Python 3.6

About Python 3
› The way file objects are passed from python to c++ code is completely different in python 3
  • Doesn't work at all on Windows
› A few APIs need to change as a result
  • EventIterator, LogReader

Read the docs
› https://htcondor-python.readthedocs.io

JupyterHub Tutorials
› https://hcc-anvil-175.29.unl.edu
  • Login with university credential
  • Spawns a docker instance with a private HTCondor
› Much of this talk is taken from there

ClassAds
› import classad
  • provides the ClassAd and ExprTree classes
› defines class ClassAd
  • Behaves like a dict {key : value}
› values are of type ExprTree which can be
  • a simple value (int, real, bool, string)
  • a special value (Undefined, Error)
  • an expression which evaluates to one of the above

ExprTree
› ExprTree is a ClassAd expression
  • Literals (int, real, bool, string) are automatically converted to python types
  • Undefined and Error have no python equivalent
› Evaluated lazily, only when explicitly asked
› When eval() method of ExprTree is called
  • returns a Literal (or Undefined/Error)

ExprTree examples

    import classad

    e = classad.ExprTree("1 + 4")
    print "Expr %s is of type %s" % (e, type(e))
    # Expr 1 + 4 is of type <class 'classad.ExprTree'>

    v = e.eval()
    print "It evaluates to %s of type %s" % (v, type(v))
    # It evaluates to 5 of type <type 'long'>

Evaluate in ClassAd context

    ad = classad.ClassAd('[a=1; b=4; tot=a+b]')
    print ad['a']
    print ad['tot']
    print ad['tot'].eval()
    # 1
    # a + b
    # 5

    print ad.eval('tot')
    # 5

Expressions vs Literals
› Many ClassAds in HTCondor have values that can be expressions but are usually literals
› HTCondor daemons almost always evaluate rather than look up attributes in ClassAds
› You should do the same
  • Use ad['name'] if you are sure it's a literal
  • Use ad.eval('name') if you don't know
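The "evaluate unless you are sure" advice can be wrapped in a small helper. This is a sketch under stated assumptions: `attr_value` is a hypothetical name, and the ad is assumed to support the `in` operator plus the `eval(name)` method shown on the earlier slide. Plain dicts are accepted too, so the helper can be exercised without HTCondor installed.

```python
def attr_value(ad, name, default=None):
    """Return the evaluated value of attribute `name`, or `default` if absent."""
    if name not in ad:
        return default
    evaluate = getattr(ad, "eval", None)
    if callable(evaluate):
        return evaluate(name)  # ClassAd-like: evaluate in the ad's own context
    return ad[name]            # plain dict: the stored value is already a literal
```

With a real ad, `attr_value(ad, 'tot')` would go through `ad.eval('tot')` rather than returning the unevaluated expression.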

› And now, we get to the HTCondor part...

bindings for Major Daemons
› import htcondor
  • htcondor.Collector()
    – query, directQuery, locate, advertise : condor_status, advertise
  • htcondor.Schedd()
    – query, xquery, history : look at jobs
    – act, edit : change jobs
    – transaction, spool, retrieve : submit jobs
    – submit, submitMany : obsolete submit interface
    – negotiate, reschedule : specialized users only

bindings for Minor daemons
  • htcondor.Startd()
    – drainJobs, cancelDrainJobs
  • htcondor.Negotiator()
    – setPriority, setFactor, resetUsage, ...
    – getPriorities, getResourceUsage
      (In 8.7 you can query Accounting ads from the Collector)

Submit object
› htcondor.Submit()
  • queue, queue_with_itemdata
› Wraps up an HTCondor submit file
› New capabilities in 8.7
› A whole talk on this...

HTCondor config
› htcondor.version()
  • get the HTCondor version
› htcondor.param['knob']
  • get the expanded value of the config knob
› htcondor.reload_config()
  • reread the HTCondor config files
› htcondor.RemoteParam(daemonAd)
  • query the configuration of a daemon
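Since configuration knobs are looked up with dict-style indexing, a lookup with a fallback can be sketched as below. The helper name `knob` is hypothetical, and the sketch assumes that indexing a missing knob raises `KeyError`, as a dict would; a plain dict stands in for `htcondor.param` so the code runs without HTCondor.

```python
def knob(params, name, default=None):
    """Return the value of config knob `name`, or `default` if it is not set."""
    try:
        return params[name]
    except KeyError:
        return default


# Intended use (assumes the htcondor bindings are installed):
#   import htcondor
#   host = knob(htcondor.param, "COLLECTOR_HOST", "localhost")
```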

Log Readers
  • htcondor.JobEventLog (new in 8.7.10!)
    – Iterate a job's log as a stream of JobEvent(s)
    – Supersedes htcondor.EventIterator
  • htcondor.EventIterator
    – For completed jobs, iterate the job's log as a stream of events
    – Let us know if you are using EventIterator
      (We think no one is using this and would like to kill it)
  • htcondor.LogReader
    – Reads the schedd's job queue log file
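Treating a job log as a stream of events lends itself to simple tallies. The sketch below is an assumption-laden illustration: `event_type_counts` is a hypothetical helper, each event is assumed to expose a `type` attribute, and the commented usage assumes that `JobEventLog` takes a log file path and offers an `events(stop_after=...)` iterator, which should be checked against the documentation for your version.

```python
from collections import Counter


def event_type_counts(events):
    """Tally events by type name; `events` is any iterable of objects with `.type`."""
    return Counter(str(event.type) for event in events)


# Intended use (assumes the 8.7.10+ bindings; "job.log" is a placeholder path):
#   import htcondor
#   jel = htcondor.JobEventLog("job.log")
#   print(event_type_counts(jel.events(stop_after=0)))
```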

Find a Startd

    import htcondor

    coll = htcondor.Collector()
    startd = coll.locate(htcondor.DaemonTypes.Startd, "host")
    print startd['MyAddress']
    print "type is %s, size is %d" % (type(startd), len(startd))
    # "<127.0.0.1:64900?addrs=127.0.0.1-64900>"
    # type is <class 'classad.ClassAd'>, size is 6

    ads = coll.query(htcondor.AdTypes.Startd, 'Machine=="host"')
    print "type is %s, size is %d" % (type(ads), len(ads))
    print "type is %s, size is %d" % (type(ads[0]), len(ads[0]))
    # type is <type 'list'>, size is 4
    # type is <class 'classad.ClassAd'>, size is 144

locate vs query
› collector.locate()
  • Returns an ad for a single daemon
  • Just enough attributes to open a socket to that daemon
› collector.query()
  • Returns a list of ads from one of the collections
  • A LOT of attributes unless you use a projection
    – always use a projection!
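A projected query might look like the sketch below. The helper names `quote` and `slots_on` and the default attribute list are illustrative assumptions. The projection is passed positionally as the third argument to `query`, on the assumption that it follows the ad type and the constraint, which avoids depending on the keyword name.

```python
def quote(value):
    """Quote a Python string as a ClassAd string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def slots_on(collector, ad_type, machine, attrs=("Name", "State", "Activity")):
    """Query the ads for one machine, fetching only the attributes in `attrs`."""
    constraint = "Machine == " + quote(machine)
    return collector.query(ad_type, constraint, list(attrs))


# Intended use (requires the htcondor bindings and a running pool):
#   import htcondor
#   ads = slots_on(htcondor.Collector(), htcondor.AdTypes.Startd, "host")
```

Each returned ad then carries only the handful of requested attributes instead of the full set seen in the unprojected example (144 in that run).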

directQuery
› directQuery is a locate followed by a query
  • Use it to query the startd or schedd directly
    – Get more verbose statistics
› Equivalent to this

    coll = htcondor.Collector()
    schedd_ad = coll.locate(htcondor.DaemonTypes.Schedd)
    daemon = htcondor.Collector(schedd_ad['MyAddress'])
    daemon.query(htcondor.AdTypes.Schedd, statistics='ALL:2')

Query jobs from a Schedd

    schedd = htcondor.Schedd()  # get the local schedd
    jobs = schedd.query('ClusterId==23', ['JobStatus'], limit=2)
    for job in jobs:
        print repr(job)
    # [MyType="Job", JobStatus=5, ClusterId=23, ProcId=0]
    # [MyType="Job", JobStatus=5, ClusterId=23, ProcId=1]

    ads = schedd.query(opts=htcondor.QueryOpts.SummaryOnly)
    print(ads[0])
    # [ AllUsersIdle = 0; AllUsersHeld = 2; ...
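The `JobStatus=5` in the output above is HTCondor's numeric code for a held job. A small sketch for turning query results into readable counts follows; `status_counts` is a hypothetical helper, and plain dicts can stand in for job ads when trying it out.

```python
from collections import Counter

# Standard HTCondor JobStatus codes.
JOB_STATUS = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "Transferring Output",
    7: "Suspended",
}


def status_counts(jobs):
    """Count job ads by status name; each ad must have a JobStatus attribute."""
    return Counter(JOB_STATUS.get(int(job["JobStatus"]), "Unknown") for job in jobs)


# Intended use (requires the htcondor bindings and a schedd):
#   import htcondor
#   print(status_counts(htcondor.Schedd().query('ClusterId == 23', ['JobStatus'])))
```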

XQuery jobs from a Schedd

    schedd = htcondor.Schedd()  # get the local schedd
    attrs = ['ClusterId', 'ProcId', 'JobStatus']
    for job in schedd.xquery(projection=attrs):
        print repr(job)
    # [MyType="Job", JobStatus=5, ClusterId=23, ProcId=0]
    # [MyType="Job", JobStatus=5, ClusterId=23, ProcId=1]

› xquery returns an iterator
  • This has less overhead but...
  • It's bad to walk this iterator slowly
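One way to avoid walking the iterator slowly is to drain it into a list first and do the expensive per-job work afterwards. The sketch below assumes that the concern is keeping the schedd's side of the query waiting, which the slide does not spell out; `fetch_then_process` and `handle` are hypothetical names.

```python
def fetch_then_process(schedd, attrs, handle):
    """Read every ad from xquery promptly, then run the slow `handle` on each."""
    jobs = list(schedd.xquery(projection=attrs))  # drain the iterator right away
    return [handle(job) for job in jobs]          # slow work happens afterwards


# Intended use (requires the htcondor bindings and a schedd):
#   import htcondor
#   fetch_then_process(htcondor.Schedd(), ['ClusterId', 'ProcId'], repr)
```

The trade-off is memory: all projected ads are held at once, so a narrow projection keeps the list small.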

query vs xquery
› You can send 'collector' queries to a schedd, but not vice versa.
› The (x)query methods on class Schedd return jobs
› There is no xquery for the Collector. (It cannot do async replies)

So - What's Missing?
› What do we need to add?
  • Credd
  • ?

Any Questions?