HTCondor and Networking
Greg Thain
Center for High Throughput Computing

Introduction
› HTCondor was built in a simpler time:
  - Every machine can connect to every other
  - More TCP ports available than can be used
  - Every machine has one network interface
  - IPv4: "enough addresses for everyone"
  - DNS exists everywhere, correctly and reliably
  - All connections are symmetric

Design Problem: Listeners everywhere
› Multihoming? Firewalls? NAT? Asymmetry?
› Each daemon has ONE address in the collector!
[Diagram: Central Manager, Submit Machine, Execute Machine]

What is "the name"?
The "sinful" string: examples
<192.168.1.15:9618>
<192.168.1.15:9618?key=value>
Found in the MyAddress attribute
and used with condor_tool -addr '<sinful>'
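The two example strings above have a simple shape: a host, a port, and an optional list of key=value parameters. The following is a minimal sketch of splitting such a string into its parts. The function name `parse_sinful` and the pattern are illustrative assumptions, not part of HTCondor, and the pattern only covers the IPv4 forms shown on the slide.

```python
import re
from urllib.parse import parse_qs

# Simplified pattern (an assumption for illustration): <host:port> with an
# optional ?key=value&... suffix. IPv6 hosts and other real-world variants
# are deliberately not handled.
SINFUL_RE = re.compile(
    r"^<(?P<host>[^:<>?]+):(?P<port>\d+)(?:\?(?P<params>[^<>]*))?>$"
)


def parse_sinful(sinful):
    """Split a sinful string into (host, port, params)."""
    match = SINFUL_RE.match(sinful.strip())
    if match is None:
        raise ValueError(f"not a recognised sinful string: {sinful!r}")
    # parse_qs returns a list per key; keep only the first value of each.
    params = {
        key: values[0]
        for key, values in parse_qs(match.group("params") or "").items()
    }
    return match.group("host"), int(match.group("port")), params


if __name__ == "__main__":
    print(parse_sinful("<192.168.1.15:9618>"))
    print(parse_sinful("<192.168.1.15:9618?key=value>"))
```

The same sketch also accepts the `?sock=...` form that appears once shared port is enabled, since that is just another key=value parameter.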

Which Address will a machine advertise?
If…
BIND_ALL_INTERFACES = true (default)
NETWORK_INTERFACE = unset (default)
ENABLE_ADDRESS_REWRITING = true (default)
Then…
The machine listens on all interfaces, and the collector rewrites the address to the "collector" interface.

Network rewrite
[Diagram labels: Central Manager; 10.0.5.15; eth1; 192.168.5.15; eth0; Execute Machine]

Which Address will a machine advertise?
If…
BIND_ALL_INTERFACES = false (non-default)
NETWORK_INTERFACE = 10.* (or) eth0 (or) 10.5.3.4
Then…
The machine listens on the specified interface (only), and advertises that!

Which Address will a machine advertise?
If…
BIND_ALL_INTERFACES = false (non-default)
NETWORK_INTERFACE = unset (default)
Then…
The machine listens on one interface, heuristically chosen by HTCondor, and advertises that.
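The three "If… Then…" cases above can be summarised as a small decision function. This is only a sketch of the rules as stated on the slides: the function name and return values are hypothetical, and combinations the slides do not mention are reported as uncovered rather than guessed.

```python
def advertised_address(bind_all_interfaces=True,
                       network_interface=None,
                       enable_address_rewriting=True):
    """Return (listens_on, advertises) for the cases described on the slides.

    network_interface=None stands for "unset". Any combination the slides
    do not describe raises NotImplementedError instead of guessing.
    """
    if bind_all_interfaces and network_interface is None and enable_address_rewriting:
        # All defaults: listen everywhere, collector rewrites the address.
        return ("all interfaces", "rewritten by the collector")
    if not bind_all_interfaces and network_interface is not None:
        # Explicit interface: listen there only and advertise it.
        return (network_interface, network_interface)
    if not bind_all_interfaces and network_interface is None:
        # No interface given: one is chosen heuristically.
        return ("one heuristically chosen interface", "that same interface")
    raise NotImplementedError("combination not covered by the slides")


if __name__ == "__main__":
    print(advertised_address())
    print(advertised_address(False, "eth0"))
    print(advertised_address(False))
```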

Completely Punting to a proxy
› TCP_FORWARDING_HOST = foo.com
› Says "you can connect to me at foo.com"
› How?
  - Up to you:
    • SSH forwarding
    • iptables?
    • Magic

Solutions for firewalls
› Easiest: HIGHPORT/LOWPORT
› LOWPORT = 9000
› HIGHPORT = 10000
› Assumes holes have been punched in the firewall for this range
› If you only need inbound (common case):
› IN_LOWPORT = 9000
› IN_HIGHPORT = 10000

How many ports?
› Schedd:
  - 5 + 5 * MAX_JOBS_RUNNING
› Startd:
  - 5 + 5 * max slots
› (Assuming no shared_port or CCB)
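As a worked example of the estimate above, the sketch below checks how many running jobs (or slots) fit into a LOWPORT/HIGHPORT range such as 9000 to 10000. The helper names are hypothetical, and treating both ends of the range as usable is an assumption made for the arithmetic.

```python
def ports_needed(count, base=5, per_item=5):
    """Slide estimate: 5 + 5 * count (count = running jobs or slots)."""
    return base + per_item * count


def ports_in_range(lowport, highport):
    """Number of ports in the range, assuming both ends are usable."""
    return highport - lowport + 1


def max_supported(lowport, highport, base=5, per_item=5):
    """Largest count whose estimated port need still fits in the range."""
    available = ports_in_range(lowport, highport)
    return max((available - base) // per_item, 0)


if __name__ == "__main__":
    low, high = 9000, 10000
    print("ports available:", ports_in_range(low, high))
    print("max jobs or slots:", max_supported(low, high))
    print("ports for 200 jobs:", ports_needed(200))
```

Under these assumptions the 9000 to 10000 range holds 1001 ports, which covers 199 running jobs by the slide's formula; 200 jobs would need 1005 ports and no longer fit.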

What happens on port exhaustion?
› Badness.
› Jobs won't start, for no apparent reason
› Keep an eye on port usage in this case.

Split Network
[Diagram labels: Central Manager; 10.0.5.15; eth1; 192.168.5.15; eth0; Execute Machine]

Split Network
[Diagram labels: schedd; Central Manager; 10.0.5.15; eth1; 192.168.5.15; eth0; Execute Machine]

Private network support
PRIVATE_NETWORK_INTERFACE = 1.2.3.4
PRIVATE_NETWORK_INTERFACE = eth1
PRIVATE_NETWORK_NAME = MyPrivNet
Any time two HTCondor machines connect, HTCondor will use this network and advertise it.
It need not actually be a private network.

Shared Port
› Problem: only ~60,000 TCP ports
› Need one per shadow
› Shared port service
  - *Doesn't work with standard universe*
› USE_SHARED_PORT = true
› DAEMON_LIST = … SHARED_PORT
› Changes the sinful string to <192.168.1.100:9618?sock=xxx_yyy>

[Diagram labels: schedd; Internet; firewall; condor_shared_port; startd; shared_port; starter]

CCB: Condor Connection Broker
› Bypasses firewalls by reversing the connection
› Requires one machine with no firewall
  - Usually the collector
› Doesn't work with standard universe
› Only bypasses one firewall
  - Usually in front of the startds
  - Schedds / central managers w/o firewalls

CCB: Condor Connection Broker
[Diagram labels: schedd; Internet; outbound firewall; CCB; startd]
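The idea of "reversing the connection" can be illustrated with a toy in-memory model. This is not HTCondor's protocol: the `Node` and `Broker` classes and their methods are hypothetical, no real sockets are opened, and a firewall is reduced to a flag that refuses inbound connections.

```python
class Node:
    """A host that may sit behind a firewall blocking inbound connections."""

    def __init__(self, name, accepts_inbound):
        self.name = name
        self.accepts_inbound = accepts_inbound
        self.connections = []  # names of peers this node is connected with

    def connect_to(self, other):
        """Open a connection from self to other; fails if other blocks inbound."""
        if not other.accepts_inbound:
            raise ConnectionError(f"{other.name} refuses inbound connections")
        self.connections.append(other.name)
        other.connections.append(self.name)


class Broker(Node):
    """A reachable host that relays 'please call back' requests."""

    def __init__(self, name):
        super().__init__(name, accepts_inbound=True)
        self.registered = {}

    def register(self, node):
        # The firewalled node dials out to the broker, which its firewall allows.
        node.connect_to(self)
        self.registered[node.name] = node

    def request_reverse_connection(self, requester, target_name):
        # The broker asks the registered target to connect out to the requester.
        self.registered[target_name].connect_to(requester)


if __name__ == "__main__":
    broker = Broker("collector")
    startd = Node("startd", accepts_inbound=False)  # behind a firewall
    schedd = Node("schedd", accepts_inbound=True)   # no firewall
    broker.register(startd)
    try:
        schedd.connect_to(startd)
    except ConnectionError as err:
        print("direct connection failed:", err)
    broker.request_reverse_connection(schedd, "startd")
    print("startd is connected with:", startd.connections)
```

In this model the reversal only helps when the requester itself accepts inbound connections, which mirrors the "only bypasses one firewall" point above.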

CCB Configuration
› CCB is built into condor_collector
CCB_ADDRESS = $(COLLECTOR_HOST)
PRIVATE_NETWORK_NAME = domain

IPv6
› Still an active area of work
ENABLE_IPV6 = true
ENABLE_IPV4 = false
NETWORK_INTERFACE = 2607:f388:1086:0:21b:24ff:fedf:b520

Putting it all together
› CCB works with shared port
  - Common combination
› If you have CCB, you probably don't need highport/lowport
› CCB works together with private networks
  - Can be a big performance win

Thank you!