EXAScaler LNet Multi-Rail
Amir Shehata

Agenda
►What is LNet Multi-Rail
►Use Case Example
►Benefits
►Configuration
whamcloud.com

Prior to Multi-Rail
[Diagram: a Node connected to LNet networks o2ib0, o2ib1, o2ib3 and o2ib4]

With Multi-Rail
[Diagram: a Node connected to LNet network o2ib1]

With Multi-Rail
[Diagram: Node A and Node B connected over LNet network o2ib1]

With Multiple LNets
[Diagram: Node A and Node B connected over LNet networks o2ib and o2ib1]

What is Multi-Rail (MR) and Health
►Allows the grouping of multiple interfaces in the same LNet network (e.g. o2ib)
►Enables LNet to use all the interfaces that are part of the same network
►Enables LNet to use multiple networks to reach peers on the same networks
►Enables LNet to monitor the health of networks and interfaces and to use the healthiest available

Without MR
[Diagram: MGS/MGT, MDS/MDT, OSS/OST, DGX-2]
To use all DGX-2 interfaces, each interface needs to be configured in a separate LNet network.
On the OSS nodes, aliases are used to connect a single interface to multiple LNet networks.

With MR
[Diagram: MGS/MGT, MDS/MDT, OSS/OST, DGX-2]
Multi-Rail LNet allows the LNet network configuration to match the fabric.

Dual Fabric With Dual Multi-Rail LNet Networks
[Diagram: MGS/MGT, MDS/MDT, two OSS/OST pairs, DGX-2]
In this example there are two fabrics, each with an LNet network on top. The server nodes connect to both fabrics. The DGX-2 client connects with multiple interfaces to both fabrics. The other client nodes connect to only one fabric.

MR Benefits
►Ease of Configuration
►Increased throughput
  Grouping all the interfaces available on the servers and on the clients aggregates the interfaces’ throughput.
  Even if there are multiple networks, MR uses all of them, thereby aggregating the throughput of each.

Interface selection
►Enhanced interface selection based on:
  Health of the interface
  GPU priority
  NUMA closeness
  Available credits
  Round robin if all are equal
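The selection criteria above can be pictured as a ranking followed by a round-robin tie-break. The following is only a minimal Python sketch, not LNet's actual implementation. It assumes the criteria are applied in the order listed on the slide, and the `Interface` fields, their scales, and the `Selector` class are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Interface:
    # All fields are hypothetical stand-ins for the slide's criteria.
    name: str
    health: int         # higher is healthier
    gpu_priority: int   # lower value = preferred for the GPU doing the I/O
    numa_distance: int  # lower value = closer to the requesting memory
    credits: int        # available send credits; more is better


def rank(ni: Interface) -> tuple:
    # Smaller tuple wins; criteria are compared in the order given on the slide
    # (an assumption of this sketch).
    return (-ni.health, ni.gpu_priority, ni.numa_distance, -ni.credits)


class Selector:
    def __init__(self, interfaces):
        self.interfaces = list(interfaces)
        self._rr = count()  # round-robin counter used only to break ties

    def pick(self) -> Interface:
        best = min(rank(ni) for ni in self.interfaces)
        tied = [ni for ni in self.interfaces if rank(ni) == best]
        return tied[next(self._rr) % len(tied)]


if __name__ == "__main__":
    selector = Selector([
        Interface("ib0", health=1000, gpu_priority=0, numa_distance=1, credits=8),
        Interface("ib1", health=1000, gpu_priority=0, numa_distance=1, credits=8),
        Interface("ib2", health=900, gpu_priority=0, numa_distance=0, credits=8),
    ])
    # ib2 is closer in NUMA terms but less healthy, so ib0 and ib1 alternate.
    print([selector.pick().name for _ in range(4)])
```

In this sketch a less healthy interface is never chosen while a healthier one exists, which mirrors the idea of preferring the healthiest available interface.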

DGX-2/AI-400 read performance with MR
[Chart: series BWCPU and BWGPU; y-axis ticks 0 to 80; x-axis labels 1G 4K, 1G 8K, 1G 16K, 1G 32K, 1G 48K, 1G 64K, 1G 128K, 1G 256K, 1G 512K, 1G 1M, 1G 4M, 1G 16M]

Configuring MR
►MR is on by default
►Automatic peer interface discovery allows configuration-free MR
►Multiple interfaces need to be configured
►Ex: Using lnetctl: lnetctl net add --net o2ib --if ib0,ib1
►Ex: Using modprobe: options lnet networks="o2ib(ib0,ib1)"
►Automatic peer discovery can be turned off (not recommended): lnetctl set discovery 0
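Because the exact spelling of these two configuration forms is easy to get wrong (double hyphens, no spaces inside the interface list), a small helper can generate them from one list of interfaces. This is a sketch only: the function names `modprobe_option` and `lnetctl_net_add` are hypothetical, and the code only builds the strings shown on the slide without running anything.

```python
def modprobe_option(net, interfaces):
    """Build the lnet module option line that groups interfaces in one network."""
    joined = ",".join(interfaces)
    return 'options lnet networks="{}({})"'.format(net, joined)


def lnetctl_net_add(net, interfaces):
    """Build the equivalent lnetctl argument list (suitable for subprocess.run)."""
    return ["lnetctl", "net", "add", "--net", net, "--if", ",".join(interfaces)]


if __name__ == "__main__":
    print(modprobe_option("o2ib", ["ib0", "ib1"]))
    print(" ".join(lnetctl_net_add("o2ib", ["ib0", "ib1"])))
```

Returning the lnetctl command as a list rather than a single string keeps it safe to pass to a process launcher without shell quoting.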

Health Benefits
►Enhanced network failure detection
►Enhanced network interface health monitoring
►Enhanced interface selection based on interface health
►Message send retries for network failures to avoid Lustre-level failure recovery

When to use Health
►Health is most useful when the server or client is set up with multiple interfaces and/or networks
►This allows LNet to retry messages on other interfaces with better health

Health Parameters
►transaction_timeout: The time to wait for an LNet message response before failing the message
►retry_count: The number of times to retry sending an LNet message before propagating the failure to the upper layer
►health_sensitivity: The amount by which to reduce an interface’s health value on failure
►Depending on the size of the cluster and the performance of the network, transaction_timeout and retry_count might need to be adjusted to account for network latency
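How the three parameters interact can be illustrated with a toy model. The sketch below is not LNet code and makes several assumptions: health starts at an arbitrary ceiling of 1000, a message gets one initial attempt plus `retry_count` retries, and `attempt_send` is a hypothetical callback that returns True when a response arrives within `transaction_timeout`. Health recovery over time is not modelled.

```python
from dataclasses import dataclass

MAX_HEALTH = 1000  # assumed ceiling for this sketch


@dataclass
class Interface:
    name: str
    health: int = MAX_HEALTH


def send_with_retries(interfaces, attempt_send, retry_count,
                      transaction_timeout, health_sensitivity):
    """Return the name of the interface that delivered the message, or None.

    attempt_send(name, transaction_timeout) is a hypothetical callback that
    returns True if a response arrived in time and False otherwise.
    """
    for _ in range(1 + retry_count):
        # Prefer the healthiest interface for every attempt.
        ni = max(interfaces, key=lambda i: i.health)
        if attempt_send(ni.name, transaction_timeout):
            return ni.name
        # On failure, reduce the interface's health by health_sensitivity.
        ni.health = max(0, ni.health - health_sensitivity)
    # All attempts failed: the failure is propagated to the upper layer.
    return None


if __name__ == "__main__":
    rails = [Interface("ib0"), Interface("ib1")]
    # Pretend ib0 has lost its link, so only ib1 ever gets a response.
    used = send_with_retries(rails, lambda name, timeout: name == "ib1",
                             retry_count=2, transaction_timeout=10,
                             health_sensitivity=100)
    print(used, [(r.name, r.health) for r in rails])
```

In this model the first attempt on ib0 fails, its health drops, and the retry goes out on ib1, so the upper layer never sees the failure.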

Conclusion
►MR brings the benefit of aggregating interface throughput. It should be used whenever possible.
►MR is on by default. Multiple interfaces need to be configured to take advantage of it.
►Health should be used when the node has multiple interfaces and/or networks configured.
►Depending on the size of the cluster, the health parameters will need to be tweaked for optimal performance.

Thank You
http://doc.lustre.org/lustre_manual.xhtml