How to use adaptive routing in an IB subnet

I’m trying to measure throughput on our IB HDR testbed with adaptive routing (AR) enabled.

I received the following message from our administrator, which confirms that AR is enabled.

==========================================================================================================

Master SM: Port=1 LID=1420 GUID=0x88e9a4ffff2332f6 devid=4123 Priority:15 Node_Type=CA Node_Description=ufm1 HCA-3

Standby SM: Port=1 LID=1246 GUID=0x88e9a4ffff1ffba8 devid=4123 Priority:10 Node_Type=CA Node_Description=agpu1301 HCA-1

Standby SM: Port=0 LID=197 GUID=0xb8cef6030076cbca devid=54000 Priority:8 Node_Type=SW Node_Description=MF0;IBGPUDR1:MCS8500/S03/U1

Adaptive Routing is enabled on 192 switches

==========================================================================================================

I understand that, to benefit from AR, I need to set the following UCX parameters when launching MPI programs:

UCX_IB_AR_ENABLE=yes, UCX_IB_SL=auto

My questions are …

(1) Is that all I need to do to test AR, or am I missing something?

(2) What will happen if UCX_IB_AR_ENABLE=no is set while the subnet is configured to use AR?

Will throughput degrade because all out-of-order packets are simply discarded at the destination nodes?

Thank you in advance for your reply.

Hi Jongwook Lee,

Thank you for posting your inquiry to the Mellanox community.

  1. The status output you quoted is what is reported when the SM has been able to enable Adaptive Routing on all the switches.

Please review the following article for additional testing/validation methods that can be used:

https://community.mellanox.com/s/article/How-To-Configure-Adaptive-Routing-and-SHIELD-New

[Section 6.1; ‘Adaptive Routing validation’]

  2. Setting UCX_IB_AR_ENABLE=no will probably just cause the job to fail; however, this depends on whether Adaptive Routing is enabled on all SLs.

The UCX documentation has a table which describes behavior under all of these circumstances:

https://docs.mellanox.com/display/HPCXv281/Unified+Communication+-+X+Framework+Library

[Section ‘Adaptive Routing’, second yellow box]
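
As a rough illustration of the two cases discussed above, the following sketch builds the launch commands for a run with UCX_IB_AR_ENABLE=yes and a run with UCX_IB_AR_ENABLE=no, keeping UCX_IB_SL=auto in both. It only prints the commands and does not execute anything. It assumes Open MPI's mpirun, which exports variables to the ranks with -x; the helper names and the ./osu_bw benchmark path are hypothetical placeholders, and host selection options are left out.

```python
import shlex


def ucx_ar_settings(enable_ar):
    """Return the two UCX variables from the question for one test run."""
    return {
        "UCX_IB_AR_ENABLE": "yes" if enable_ar else "no",
        "UCX_IB_SL": "auto",
    }


def mpirun_command(settings, benchmark, nprocs=2):
    """Build an mpirun argument list (assumes Open MPI's '-x NAME=value')."""
    cmd = ["mpirun", "-np", str(nprocs)]
    for name, value in sorted(settings.items()):
        cmd += ["-x", f"{name}={value}"]
    cmd.append(benchmark)  # hypothetical benchmark binary
    return cmd


if __name__ == "__main__":
    # Print one command per setting so the two runs can be compared.
    for enable_ar in (True, False):
        cmd = mpirun_command(ucx_ar_settings(enable_ar), "./osu_bw")
        print(shlex.join(cmd))
```

The second printed command corresponds to question (2); as noted above, that run may simply fail rather than produce a lower throughput figure, depending on how the SLs are configured.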

Thanks again, and best regards,

Mellanox Technical Support