I’m trying to measure throughput on our InfiniBand HDR testbed with adaptive routing (AR) enabled.
Our administrator sent me the following message, which confirms that AR is enabled:
==========================================================================================================
Master SM: Port=1 LID=1420 GUID=0x88e9a4ffff2332f6 devid=4123 Priority:15 Node_Type=CA Node_Description=ufm1 HCA-3
Standby SM: Port=1 LID=1246 GUID=0x88e9a4ffff1ffba8 devid=4123 Priority:10 Node_Type=CA Node_Description=agpu1301 HCA-1
Standby SM: Port=0 LID=197 GUID=0xb8cef6030076cbca devid=54000 Priority:8 Node_Type=SW Node_Description=MF0;IBGPUDR1:MCS8500/S03/U1
Adaptive Routing is enabled on 192 switches
==========================================================================================================
I understand that, to benefit from AR, I need to set the following UCX parameters when launching MPI programs:
UCX_IB_AR_ENABLE=yes, UCX_IB_SL=auto
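To make the setup concrete, here is a minimal sketch of how the two runs I want to compare could be launched. It assumes Open MPI's mpirun, whose -x option exports a variable to the remote ranks; the helper name build_launch and the benchmark binary ./bw_test are hypothetical placeholders, not part of our actual setup.

```python
import os
import shlex


def build_launch(ar_enable, nprocs=2, program="./bw_test"):
    """Return (env, argv) for one run with the given UCX_IB_AR_ENABLE value.

    Assumptions: Open MPI's mpirun is the launcher, and ./bw_test is a
    hypothetical placeholder for the real throughput benchmark.
    """
    ucx = {"UCX_IB_AR_ENABLE": ar_enable, "UCX_IB_SL": "auto"}
    env = dict(os.environ, **ucx)          # local environment plus UCX settings
    argv = ["mpirun", "-np", str(nprocs)]
    for name in ucx:
        argv += ["-x", name]               # forward each variable to all ranks
    argv.append(program)
    return env, argv


# Print the command line for each setting; nothing is executed here.
for setting in ("yes", "no"):
    env, argv = build_launch(setting)
    print(f"UCX_IB_AR_ENABLE={setting}:", shlex.join(argv))
    # To actually run: subprocess.run(argv, env=env, check=True)
```

Running the same benchmark once with each setting should allow a direct comparison of the measured throughput.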
My questions are:
(1) Is that all I need to do to test AR, or am I missing something?
(2) What happens if UCX_IB_AR_ENABLE=no is set while the subnet is configured to use AR?
Will throughput degrade because all out-of-order packets are simply discarded at the destination nodes?
Thank you in advance for your reply.