BlueField3 (CX7) RoCE Adaptive Routing Configuration (adaptive_routing_forced_en) Issue

I’m experiencing an issue when trying to enable Adaptive Routing for RoCE on my NVIDIA BlueField3 NIC. Despite my configuration attempts, the adaptive_routing_forced_en value remains unchanged.

When I execute the mlxreg command, the adaptive_routing_forced_en value doesn’t change as expected:

root:/home/ubuntu# mlxreg -d /dev/mst/mt41692_pciconf0  --reg_name ROCE_ACCL --set "adaptive_routing_forced_en=0x1"
You are about to send access register: ROCE_ACCL with the following data:
Field Name                                     | Data    
============================================================
roce_adp_retrans_field_select                  | 0x00000001
roce_tx_window_field_select                    | 0x00000001
roce_slow_restart_field_select                 | 0x00000001
roce_slow_restart_idle_field_select            | 0x00000001
min_ack_timeout_limit_disabled_field_select    | 0x00000001
adaptive_routing_forced_en_field_select        | 0x00000000
selective_repeat_forced_en_field_select        | 0x00000001
dc_half_handshake_en_field_select              | 0x00000000
ack_dscp_force_field_select                    | 0x00000001
roce_adp_retrans_en                            | 0x00000001
roce_tx_window_en                              | 0x00000000
roce_slow_restart_en                           | 0x00000001
roce_slow_restart_idle_en                      | 0x00000000
min_ack_timeout_limit_disabled                 | 0x00000000
adaptive_routing_forced_en                     | 0x00000001
selective_repeat_forced_en                     | 0x00000000
dc_half_handshake_en                           | 0x00000000
ack_dscp_force                                 | 0x00000000
ack_dscp                                       | 0x00000000
============================================================

 Do you want to continue ? (y/n) [n] : y
 Sending access register...
root:/home/ubuntu# mlxreg -d /dev/mst/mt41692_pciconf0  --get --reg_name ROCE_ACCL
Sending access register...

Field Name                                     | Data    
============================================================
roce_adp_retrans_field_select                  | 0x00000001
roce_tx_window_field_select                    | 0x00000001
roce_slow_restart_field_select                 | 0x00000001
roce_slow_restart_idle_field_select            | 0x00000001
min_ack_timeout_limit_disabled_field_select    | 0x00000001
adaptive_routing_forced_en_field_select        | 0x00000000
selective_repeat_forced_en_field_select        | 0x00000001
dc_half_handshake_en_field_select              | 0x00000000
ack_dscp_force_field_select                    | 0x00000001
roce_adp_retrans_en                            | 0x00000001
roce_tx_window_en                              | 0x00000000
roce_slow_restart_en                           | 0x00000001
roce_slow_restart_idle_en                      | 0x00000000
min_ack_timeout_limit_disabled                 | 0x00000000
adaptive_routing_forced_en                     | 0x00000000
selective_repeat_forced_en                     | 0x00000000
dc_half_handshake_en                           | 0x00000000
ack_dscp_force                                 | 0x00000000
ack_dscp                                       | 0x00000000
============================================================

I notice that in the output, the “adaptive_routing_forced_en_field_select” value is still 0x00000000.
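The check described above can be done mechanically on saved output. What follows is a minimal sketch, not an official tool: `accl_value` and `accl_select_state` are hypothetical helper names, and they only parse text in the `Field Name | Data` layout shown above, so they never touch the device. Reading a zero `_field_select` bit as "the firmware does not currently accept writes to that field" is an assumption drawn from the behaviour in this post, not something confirmed in the thread.

```shell
#!/bin/sh
# Hypothetical helpers for text in mlxreg's "Field Name | Data" layout.

# Print the value of one field from a register dump read on stdin.
accl_value() {
    awk -F'|' -v name="$1" '
        NF >= 2 {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)       # strip column padding from the name
            gsub(/[ \t\r]/, "", val)     # strip padding from the value
            if (key == name) { print val; exit }
        }
    '
}

# Classify the <field>_field_select row of a dump read on stdin.
accl_select_state() {
    sel=$(accl_value "${1}_field_select")
    case "$sel" in
        "")      echo "unknown" ;;   # no such row in the dump
        *[!0x]*) echo "set" ;;       # any character other than 0 or x: non-zero
        *)       echo "clear" ;;     # only zeros after the 0x prefix
    esac
}

# Three rows copied from the --get output above.
sample='adaptive_routing_forced_en_field_select        | 0x00000000
selective_repeat_forced_en_field_select        | 0x00000001
adaptive_routing_forced_en                     | 0x00000000'

printf '%s\n' "$sample" | accl_select_state adaptive_routing_forced_en   # prints "clear"
printf '%s\n' "$sample" | accl_select_state selective_repeat_forced_en   # prints "set"
```

On a live system the same functions could be fed from `mlxreg -d <device> --get --reg_name ROCE_ACCL`, the command already used in this post.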

Prior to this, I had already set ROCE_ADAPTIVE_ROUTING_EN=1 via mlxconfig:

root:/home/xiangzhou# mlxconfig -d /dev/mst/mt41692_pciconf0 -e q | grep -i ADAPTIVE
*       ROCE_ADAPTIVE_ROUTING_EN                    False(0)             True(1)              True(1)
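For readers unfamiliar with the `-e` output, here is a small sketch that labels the three value columns of such a line. It assumes the columns are Default, Current and Next Boot, in that order; the header row is cut off by the `grep` above, so that order is an assumption. `mlxconfig_columns` is a hypothetical helper name and works purely on text.

```shell
#!/bin/sh
# Hypothetical helper: label the three value columns of one "mlxconfig -e q" row.
# Assumed column order: Default, Current, Next Boot.
mlxconfig_columns() {
    awk -v name="$1" '
        {
            # Locate the parameter name; this also skips a leading "*" marker.
            for (i = 1; i <= NF; i++) {
                if ($i == name) {
                    printf "default=%s current=%s next_boot=%s\n", $(i+1), $(i+2), $(i+3)
                    exit
                }
            }
        }
    '
}

# The row from the query above.
line='*       ROCE_ADAPTIVE_ROUTING_EN                    False(0)             True(1)              True(1)'
printf '%s\n' "$line" | mlxconfig_columns ROCE_ADAPTIVE_ROUTING_EN
# prints: default=False(0) current=True(1) next_boot=True(1)
```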

Hardware, Firmware, and Driver Information:

root:/home/ubuntu# mlxconfig -d /dev/mst/mt41692_pciconf0 -e q

Device #1:
----------

Device type:        BlueField3          
Name:               900-9D3B6-00CN-A_Ax 
Description:        NVIDIA BlueField-3 B3240 P-Series Dual-slot FHHL DPU; 400GbE / NDR IB (default mode); Dual-port QSFP112; PCIe Gen5.0 x16 with x16 PCIe extension option; 16 Arm cores; 32GB on-board DDR; integrated BMC; Crypto Enabled
Device:             /dev/mst/mt41692_pciconf0


root@gpu15:/home/ubuntu# mlxburn -d /dev/mst/mt41692_pciconf0  query
-I- Image type:            FS4
-I- FW Version:            32.44.1036
-I- FW Release Date:       7.2.2025
-I- Product Version:       32.42.1000
-I- Rom Info:              type=UEFI Virtio net version=21.4.13 cpu=AMD64,AARCH64
-I-                        type=UEFI Virtio blk version=22.4.13 cpu=AMD64,AARCH64
-I-                        type=UEFI version=14.35.15 cpu=AMD64,AARCH64
-I-                        type=PXE version=3.7.500 cpu=AMD64
-I- Description:           UID                GuidsNumber
-I- Base GUID:             7c8c090300bf3484        38
-I- Base MAC:              7c8c09bf3484            38
-I- Image VSD:             N/A
-I- Device VSD:            N/A
-I- PSID:                  MT_0000000883
-I- Security Attributes:   secure-fw

root:/home/ubuntu# mlxreg --version
mlxreg, mft 4.31.0-149, built on Feb  7 2025, 12:26:42. Git SHA Hash: N/A

root:/home/ubuntu# ofed_info -s
OFED-internal-25.01-0.6.0:

root:/home/ubuntu# cat /opt/mellanox/doca/applications/VERSION 
2.10.0087

Could someone please advise on the correct procedure to enable Adaptive Routing for RoCE on the BlueField3? Are there additional steps required, or am I missing something in my current approach?

Thank you for your assistance.

Hi,
In general, Adaptive Routing is supported only with the Spectrum-X solution. It was enabled at some point in the past, and I believe your device was provisioned that way (based on the mlxconfig output). The firmware then fails the initialization, which is why it ultimately does not get enabled.
You can clear the mlxconfig configuration with the “mlxconfig reset” command to return to a normal state where it is not visible.
Regards,
Yaniv

Hi Yaniv,

Thank you for your response.

I’ve followed your suggestion and reset the BF3 with the following command:

sudo mlxconfig -d 40:00.0 -y reset

After cold rebooting the machine, I can confirm that the ROCE_ADAPTIVE_ROUTING_EN configuration has been reset to its default value (False):

sudo mlxconfig -d /dev/mst/mt41692_pciconf0 -e q | grep -i ADAPTIVE
        ROCE_ADAPTIVE_ROUTING_EN                    False(0)             False(0)             False(0)

Then I set LINK_TYPE_P1 and LINK_TYPE_P2 to ETH and cold rebooted my machine again.
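The two configuration steps above can be written down as a small dry-run script. This is only a sketch: the `run` wrapper, the `DRY_RUN` and `DEV` variables and the function names are hypothetical, and the numeric value `2` for ETH is an assumption about this firmware's LINK_TYPE encoding. By default the script prints the commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch of the reset and link-type steps described above.
DRY_RUN=${DRY_RUN:-1}        # 1: only print the commands, anything else: execute them
DEV=${DEV:-40:00.0}          # PCI address used in this post

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: restore the default mlxconfig values (command taken from the post).
step_reset() {
    run mlxconfig -d "$DEV" -y reset
}

# Step 2: set both ports to Ethernet (value 2 for ETH is an assumption).
step_set_eth() {
    run mlxconfig -d "$DEV" -y set LINK_TYPE_P1=2 LINK_TYPE_P2=2
}

step_reset
echo "# cold reboot here, as in the post"
step_set_eth
echo "# cold reboot here, as in the post"
```

Printing first makes it easy to review the exact command lines before setting `DRY_RUN=0` on a machine that has the tools installed.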

After that, I tried to enable adaptive routing using mlxreg:

root:/home/ubuntu# sudo mlxreg -d 40:00.0 --reg_name ROCE_ACCL --set "adaptive_routing_forced_en=0x1"
You are about to send access register: ROCE_ACCL with the following data:
Field Name                                     | Data    
============================================================
roce_adp_retrans_field_select                  | 0x00000001
roce_tx_window_field_select                    | 0x00000001
roce_slow_restart_field_select                 | 0x00000001
roce_slow_restart_idle_field_select            | 0x00000001
min_ack_timeout_limit_disabled_field_select    | 0x00000001
adaptive_routing_forced_en_field_select        | 0x00000000
selective_repeat_forced_en_field_select        | 0x00000001
dc_half_handshake_en_field_select              | 0x00000000
ack_dscp_force_field_select                    | 0x00000001
roce_adp_retrans_en                            | 0x00000001
roce_tx_window_en                              | 0x00000000
roce_slow_restart_en                           | 0x00000001
roce_slow_restart_idle_en                      | 0x00000000
min_ack_timeout_limit_disabled                 | 0x00000000
adaptive_routing_forced_en                     | 0x00000001
selective_repeat_forced_en                     | 0x00000000
dc_half_handshake_en                           | 0x00000000
ack_dscp_force                                 | 0x00000000
ack_dscp                                       | 0x00000000
============================================================

 Do you want to continue ? (y/n) [n] : y
 Sending access register...
root:/home/ubuntu# sudo mlxreg -d 40:00.0 --get --reg_name ROCE_ACCL
Sending access register...

Field Name                                     | Data    
============================================================
roce_adp_retrans_field_select                  | 0x00000001
roce_tx_window_field_select                    | 0x00000001
roce_slow_restart_field_select                 | 0x00000001
roce_slow_restart_idle_field_select            | 0x00000001
min_ack_timeout_limit_disabled_field_select    | 0x00000001
adaptive_routing_forced_en_field_select        | 0x00000000
selective_repeat_forced_en_field_select        | 0x00000001
dc_half_handshake_en_field_select              | 0x00000000
ack_dscp_force_field_select                    | 0x00000001
roce_adp_retrans_en                            | 0x00000001
roce_tx_window_en                              | 0x00000000
roce_slow_restart_en                           | 0x00000001
roce_slow_restart_idle_en                      | 0x00000000
min_ack_timeout_limit_disabled                 | 0x00000000
adaptive_routing_forced_en                     | 0x00000000
selective_repeat_forced_en                     | 0x00000000
dc_half_handshake_en                           | 0x00000000
ack_dscp_force                                 | 0x00000000
ack_dscp                                       | 0x00000000
============================================================

But I found that adaptive_routing_forced_en is still 0x00000000.
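To see at a glance which fields did not take effect, the table printed before sending can be compared with the table read back. The sketch below assumes both tables were saved to text files; `accl_diff` is a hypothetical helper that only parses the `Field Name | Data` layout and does not access the device.

```shell
#!/bin/sh
# Hypothetical helper: list fields whose read-back value differs from the request.
# $1: saved "about to send" table, $2: saved --get table.
accl_diff() {
    awk -F'|' '
        NF < 2 { next }                        # skip separators, prompts, blank lines
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)             # strip column padding
            gsub(/[ \t\r]/, "", val)
        }
        key == "FieldName" { next }            # skip the table header
        FNR == NR { want[key] = val; next }    # first file: requested values
        (key in want) && want[key] != val {
            printf "%s requested=%s readback=%s\n", key, want[key], val
        }
    ' "$1" "$2"
}

# Demo with three rows copied from the two tables above.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/request.txt" <<'EOF'
Field Name                                     | Data
adaptive_routing_forced_en_field_select        | 0x00000000
roce_adp_retrans_en                            | 0x00000001
adaptive_routing_forced_en                     | 0x00000001
EOF

cat > "$tmp/readback.txt" <<'EOF'
Field Name                                     | Data
adaptive_routing_forced_en_field_select        | 0x00000000
roce_adp_retrans_en                            | 0x00000001
adaptive_routing_forced_en                     | 0x00000000
EOF

accl_diff "$tmp/request.txt" "$tmp/readback.txt"
# prints: adaptive_routing_forced_en requested=0x00000001 readback=0x00000000
```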

My BF3 is now connected to an SN2700 switch. May I clarify: do you mean that Adaptive Routing is officially supported exclusively with Spectrum-X solutions, and that my BF3 detects the model of the switch it is connected to, preventing the Adaptive Routing feature from being enabled if the switch is not a Spectrum-X series switch?

Thanks so much for taking the time to respond.

Regards,
Xiangzhou

“My BF3 is now connected to an SN2700 switch. May I clarify: do you mean that Adaptive Routing is officially supported exclusively with Spectrum-X solutions, and that my BF3 detects the model of the switch it is connected to, preventing the Adaptive Routing feature from being enabled if the switch is not a Spectrum-X series switch?”

Yes