HTCondor and Networking
Jaime Frey
Center for High Throughput Computing

Introduction
› HTCondor was built in a simpler time:
  • Every machine can connect to every other
  • More TCP ports available than can be used
  • Every machine has 1 network interface
  • IPv4: “enough addresses for everyone”
  • DNS exists everywhere, correctly and reliably
  • All connections symmetric

Design Problem: Listeners everywhere
› Multihoming? Firewalls? NAT? Asymmetry?
› Each daemon has ONE address in collector! (mostly)
[Diagram: Central Manager, Submit Machine, Execute Machine]

What is “the name”?
The “sinful” string: examples
<192.168.1.15:9618>
<192.168.1.15:9618?key=value>
In MyAddress attribute
And condor_tool -addr '<sinful>'
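The two example strings above share one shape: an address and port in angle brackets, optionally followed by `?` and key=value pairs. As a rough illustration only, the following Python sketch splits such a string into its parts. The function name `parse_sinful` is hypothetical, and this is not HTCondor's own parser; it assumes the parameters are URL-style and separated by `&`.

```python
from urllib.parse import parse_qsl


def parse_sinful(sinful):
    """Split '<host:port?key=value&...>' into (host, port, params).

    Illustrative only: assumes URL-style parameters after the '?'.
    """
    if not (sinful.startswith("<") and sinful.endswith(">")):
        raise ValueError("a sinful string is wrapped in < >")
    body = sinful[1:-1]
    address, _, query = body.partition("?")
    # Split on the last ':' so a bracketed IPv6 host keeps its colons.
    host, _, port = address.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % address)
    # parse_qsl percent-decodes each value as it goes.
    params = dict(parse_qsl(query, keep_blank_values=True))
    return host, int(port), params


print(parse_sinful("<192.168.1.15:9618>"))
# ('192.168.1.15', 9618, {})
print(parse_sinful("<192.168.1.15:9618?key=value>"))
# ('192.168.1.15', 9618, {'key': 'value'})
```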

Which Address will a machine advertise?
If…
BIND_ALL_INTERFACES = true (default)
NETWORK_INTERFACE = unset (default)
ENABLE_ADDRESS_REWRITING = true (default)
Then…
Machine listens on all interfaces,
Prefers most “public” interface locally,
Uses “collector” interface when advertising

Network rewrite
[Diagram: Central Manager at 10.0.5.3; Submit Machine with eth1 at 10.0.5.15 and eth0 at 128.104.100.22]

Which Address will a machine advertise?
If…
BIND_ALL_INTERFACES = false (non-default)
NETWORK_INTERFACE = 10.* (or) eth0 (or) 10.5.3.4
Then…
Machine listens on specified interface (only), and advertises that!

Which Address will a machine advertise?
If…
BIND_ALL_INTERFACES = false (non-default)
NETWORK_INTERFACE = <unset> (default)
Then…
Machine listens on one interface (the most “public” one) and advertises that.
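The three “Which Address” cases can be modeled in a few lines of Python. This is a simplified sketch of the rules as the slides state them, not HTCondor's actual selection code. Several things here are assumptions made for illustration: the function names, the /24 prefixes, the ranking public > private > loopback for “most public”, and treating the “collector” interface as the one whose subnet contains the collector. It also assumes ENABLE_ADDRESS_REWRITING is left at true. The addresses come from the network rewrite diagram.

```python
import ipaddress
from fnmatch import fnmatch


def publicness(ip):
    """Rank an address: 2 = public, 1 = private, 0 = loopback (assumed order)."""
    if ip.is_loopback:
        return 0
    if ip.is_private:
        return 1
    return 2


def choose_addresses(interfaces, collector, bind_all=True, network_interface=None):
    """Return (listen_addresses, advertised_address).

    interfaces maps an interface name to "address/prefix" (hypothetical input).
    """
    parsed = {name: ipaddress.ip_interface(text) for name, text in interfaces.items()}
    most_public = max(parsed.values(), key=lambda iface: publicness(iface.ip))

    if network_interface is not None:
        if bind_all:
            raise ValueError("combination not covered by the slides")
        # NETWORK_INTERFACE may be a name, an address, or a pattern like 10.*
        matches = [iface for name, iface in parsed.items()
                   if name == network_interface
                   or fnmatch(str(iface.ip), network_interface)]
        if not matches:
            raise ValueError("no interface matches %r" % network_interface)
        chosen = str(matches[0].ip)
        return [chosen], chosen

    if not bind_all:
        # Listen on one interface only, the most public one, and advertise it.
        chosen = str(most_public.ip)
        return [chosen], chosen

    # Defaults: listen everywhere, advertise the interface that faces the collector.
    collector_ip = ipaddress.ip_address(collector)
    facing = [iface for iface in parsed.values() if collector_ip in iface.network]
    advertised = facing[0] if facing else most_public
    return [str(iface.ip) for iface in parsed.values()], str(advertised.ip)


submit = {"eth0": "128.104.100.22/24", "eth1": "10.0.5.15/24"}
print(choose_addresses(submit, "10.0.5.3"))
# (['128.104.100.22', '10.0.5.15'], '10.0.5.15')
print(choose_addresses(submit, "10.0.5.3", bind_all=False))
# (['128.104.100.22'], '128.104.100.22')
print(choose_addresses(submit, "10.0.5.3", bind_all=False, network_interface="10.*"))
# (['10.0.5.15'], '10.0.5.15')
```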

Completely Punting to proxy
› TCP_FORWARDING_HOST = foo.com
› Says “you can connect to me at foo.com”
  • IP address of foo.com is advertised
› How?
  • Up to you:
    • SSH forwarding
    • iptables?
    • EC2 public address

Solutions for firewalls
› Easiest: HIGHPORT/LOWPORT
› LOWPORT = 9000
› HIGHPORT = 10000
› Assuming holes punched in firewall
› If only need inbound (common case):
› IN_LOWPORT = 9000
› IN_HIGHPORT = 10000

How Many ports?
› Schedd:
  • 5 + 2 * MAX_JOBS_RUNNING
› Startd:
  • 5 + 2 * max slots
› (Assuming no shared_port or CCB)
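To see what these formulas mean for a firewall range, here is a short Python sketch of the arithmetic. The function names are hypothetical, and it assumes both LOWPORT and HIGHPORT are usable, so 9000 to 10000 counts as 1001 ports.

```python
def ports_needed(concurrent):
    """Ports for a schedd (concurrent = MAX_JOBS_RUNNING) or a startd
    (concurrent = max slots), per the slide: 5 + 2 * concurrent."""
    return 5 + 2 * concurrent


def max_concurrent(lowport, highport):
    """Largest job or slot count that fits in [lowport, highport].

    Assumes the range is inclusive at both ends.
    """
    available = highport - lowport + 1
    return max((available - 5) // 2, 0)


print(ports_needed(500))            # 1005
print(max_concurrent(9000, 10000))  # 498
```

With the example range from the firewall slide, a schedd running 500 jobs would already need more ports than the range provides.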

What happens on port exhaustion?
› Badness.
› Jobs will fail to start for no apparent reason
› Keep an eye on ports in this case.
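One way to keep an eye on ports, on Linux only, is to count how many local TCP ports inside the configured range already appear in /proc/net/tcp. The Python sketch below is an illustration, not an HTCondor tool. The function name and the sample text are made up, and it covers IPv4 sockets only (IPv6 sockets are listed separately in /proc/net/tcp6).

```python
# Made-up sample in the /proc/net/tcp layout; ports are hexadecimal.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:2328 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 11111
   1: 0F05000A:2329 0305000A:258A 01 00000000:00000000 00:00000000 00000000     0        0 22222
   2: 0100007F:0019 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 33333
"""


def ports_in_use(proc_net_tcp_text, lowport, highport):
    """Return the set of local ports within [lowport, highport]."""
    used = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        # fields[1] is the local address, written as HEXIP:HEXPORT
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if lowport <= port <= highport:
            used.add(port)
    return used


used = ports_in_use(SAMPLE, 9000, 10000)
print(sorted(used))                    # [9000, 9001]
print((10000 - 9000 + 1) - len(used))  # 999 ports still free in the range
```

On a real machine, the text would come from reading /proc/net/tcp instead of SAMPLE.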

Private network support
PRIVATE_NETWORK_INTERFACE = 1.2.3.4
PRIVATE_NETWORK_INTERFACE = eth1
PRIVATE_NETWORK_NAME = MyPrivNet
If two machines have the same private network name, they will use the private address to communicate.
Need not actually be a private network
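The rule stated above, that matching private network names mean the private address is used, can be sketched in Python as follows. The function name, the dictionary keys, and the public address are hypothetical and chosen only for illustration. HTCondor's real logic lives inside the daemons.

```python
def address_to_contact(my_private_network_name, peer):
    """Pick which of a peer's addresses to connect to.

    peer is a dict with a 'public' address and, optionally,
    'private' and 'private_network_name' entries (hypothetical keys).
    """
    same_network = (
        my_private_network_name is not None
        and peer.get("private_network_name") == my_private_network_name
    )
    if same_network and peer.get("private"):
        return peer["private"]
    return peer["public"]


peer = {
    "public": "128.104.100.22:9618",  # example address, not from a real pool
    "private": "1.2.3.4:9618",
    "private_network_name": "MyPrivNet",
}
print(address_to_contact("MyPrivNet", peer))  # 1.2.3.4:9618
print(address_to_contact("OtherNet", peer))   # 128.104.100.22:9618
print(address_to_contact(None, peer))         # 128.104.100.22:9618
```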

Shared Port
› Problem: only ~60,000 TCP ports
› Need one per shadow
› Shared port Service
  • *Doesn’t work with standard universe*
USE_SHARED_PORT = true (default in 8.5.1)
› Open single port in firewall
› Changes sinful string to <192.168.1.100:9618?sock=xxx_yyy>

[Diagram: schedd, Internet, Firewall, condor_shared_port, startd, shared_port, starter]

CCB: Condor Connection Broker
› Bypasses firewalls by reversing connection
› Requires one machine with no firewall
  • Usually the collector
› Doesn’t work with standard universe
› Only bypasses one firewall
  • Usually in front of the startds
  • Schedds / Central managers w/o firewalls (or firewall with single hole for shared port)

CCB: Condor Connection Broker
[Diagram: schedd, Internet, Outbound firewall, CCB, startd, schedd]

CCB Configuration
› CCB built into condor_collector
CCB_ADDRESS = $(COLLECTOR_HOST)
PRIVATE_NETWORK_NAME = domain
› Machines behind the same firewall can communicate directly

IPv6
› IPv6-only mode
  • ENABLE_IPV6 = true
  • ENABLE_IPV4 = false
› Network parameters work as before
  • NETWORK_INTERFACE = 2607:f388:1086:0:21b:24ff:fedf:b520

IPv4/IPv6 Mixed Mode
ENABLE_IPV4 = True (default)
ENABLE_IPV6 = True (default in 8.5.3)
› Both interfaces advertised, IPv6 preferred
› Central managers and submit machines must support both
› Execute machines can be IPv4-only or IPv6-only
› Ease transition to IPv6
  • PREFER_IPV4 = true
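The mixed-mode rules (both addresses advertised, IPv6 preferred unless PREFER_IPV4 is set) can be modeled as a small selection function. The Python sketch below is a simplified model under those stated rules, not HTCondor's implementation. The function name is hypothetical, and the IPv4 address is borrowed from the shared port slide purely as an example.

```python
import ipaddress


def pick_address(advertised, enable_ipv4=True, enable_ipv6=True, prefer_ipv4=False):
    """Choose one of a peer's advertised addresses, or None if none is usable."""
    usable = []
    for text in advertised:
        ip = ipaddress.ip_address(text)
        if (ip.version == 4 and enable_ipv4) or (ip.version == 6 and enable_ipv6):
            usable.append(ip)
    if not usable:
        return None
    preferred = 4 if prefer_ipv4 else 6
    # Stable sort: addresses of the preferred protocol move to the front.
    usable.sort(key=lambda ip: ip.version != preferred)
    return str(usable[0])


peer = ["192.168.1.100", "2607:f388:1086:0:21b:24ff:fedf:b520"]
print(pick_address(peer))                     # the IPv6 address
print(pick_address(peer, prefer_ipv4=True))   # 192.168.1.100
print(pick_address(peer, enable_ipv6=False))  # 192.168.1.100
```

An IPv4-only execute machine corresponds to `enable_ipv6=False`: it can still reach a mixed-mode submit machine because that machine advertises both addresses.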

Putting it all together
› CCB works with shared port
  • Common Combination
› If you have CCB or shared port, probably don’t need highport/lowport
› CCB works together with private networks
  • Can be big performance win

Multi-Stage Routing
<192.168.1.55:9618?CCBID=173.194.46.96:80#381%3Fsock%3D917_aa8b_3&sock=1567_808b_3>
[Diagram: CCB, shared_port, schedd, shared_port, startd]
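The multi-stage string is easier to read once the percent-encoded CCBID value is decoded. The Python sketch below unpacks the example from this slide. The function name and the keys of the returned dictionary are hypothetical, and reading the decoded CCBID as "broker address, `#`, an identifier, then the broker's own `sock`" is an interpretation of this one example, not a statement of HTCondor's format.

```python
from urllib.parse import parse_qsl


def decode_route(sinful):
    """Unpack a sinful string that carries both CCBID and sock parameters."""
    body = sinful.strip("<>")
    address, _, query = body.partition("?")
    params = dict(parse_qsl(query))  # percent-decodes %3F and %3D
    route = {"target": address, "target_sock": params.get("sock")}
    ccbid = params.get("CCBID")
    if ccbid:
        # Decoded form in this example: 173.194.46.96:80#381?sock=917_aa8b_3
        broker, _, rest = ccbid.partition("#")
        ccb_id, _, broker_query = rest.partition("?")
        route["ccb"] = broker
        route["ccb_id"] = ccb_id
        route["ccb_sock"] = dict(parse_qsl(broker_query)).get("sock")
    return route


example = ("<192.168.1.55:9618?CCBID=173.194.46.96:80#381"
           "%3Fsock%3D917_aa8b_3&sock=1567_808b_3>")
for key, value in decode_route(example).items():
    print(key, "=", value)
# target = 192.168.1.55:9618
# target_sock = 1567_808b_3
# ccb = 173.194.46.96:80
# ccb_id = 381
# ccb_sock = 917_aa8b_3
```

Read this way, the string names two shared-port sockets: one at the broker's address and one at the daemon's own address.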

Thank you!