Need help resolving DOCA_RDMA and RDMA issues on DPU; can't use ibstat on the DPU side

I’ve been tasked with testing DOCA_RDMA between the DPU and the host, but we’ve hit a snag and could really use some insights from the community.

We are encountering an error with DOCA_RDMA that says: “input/output operation failed”. Additionally, we are unable to use ibstat on the ARM side, which returns the error: “ibpanic: [413572] main: stat of IB device ‘mlx5_0’ failed: No such file or directory”. Attempts to implement standard RDMA also didn’t result in successful data transfer.

Moreover, certain ib commands work when both processes run on the same machine but fail when trying to communicate between the DPU and the host. Here’s an example of what happens:

DPU side:

ubuntu@localhost:~$ ibv_rc_pingpong -d mlx5_2 -p 20005
local address: LID 0x0000, QPN 0x000d86, PSN 0xadb1c6, GID ::
Failed to modify QP to RTR
Couldn't connect to remote QP

Host side:

xilinx_0@xilinx_0:~/guzhongyi/webServer $ ibv_rc_pingpong -p 20005 192.168.100.2
local address: LID 0x0000, QPN 0x000047, PSN 0x8ed157, GID ::
client read/write: No space left on device
Couldn't read/write remote address

For context, here’s our system info:

ubuntu@localhost:~$ uname -a
Linux localhost.localdomain 5.15.0-1032-bluefield #34-Ubuntu SMP Thu Nov 16 11:07:45 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

We installed from the BFB image (DOCA_2.5.0_BSP_4.5.0_Ubuntu_22.04-1.23-10.prod.bfb), and our DOCA tools package is: doca-dpu-repo-ubuntu2204-local_2.5.0107-1.23.10.1.2.0.0.bf.4.5.0.12993_arm64.deb.

Any guidance, advice, or suggestions would be immensely appreciated. We’re eager to resolve these issues and proceed with our project.

Hello,

Please make sure that the DPU firmware version is also aligned with the installed DOCA/BSP versions and updated to the latest LTS release; see NVIDIA Networking Firmware Downloads.
Choose the relevant device model there and install the latest FW.
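
To see which firmware is currently running before updating, something like the following may help. It is only a sketch: it assumes the mlx5 driver is loaded and exposes its devices under /sys/class/infiniband, and the function name list_fw_versions is made up for illustration. Run it on both the DPU (Arm) side and the host and compare the results.

```shell
# Print "<device> <firmware version>" for every RDMA device visible in sysfs.
# The sysfs root can be passed as an argument (hypothetical convenience for
# dry runs); by default the standard /sys/class/infiniband location is used.
list_fw_versions() {
    root="${1:-/sys/class/infiniband}"
    for dev in "$root"/*; do
        # Skip entries without a readable fw_ver file (or an unmatched glob).
        [ -r "$dev/fw_ver" ] || continue
        printf '%s %s\n' "$(basename "$dev")" "$(cat "$dev/fw_ver")"
    done
}

list_fw_versions
```

ibv_devinfo reports the same value in its fw_ver field, along with the device names that are actually present on each side.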

If the issue still occurs once all versions are aligned, please open a case at: enterprisesupport@nvidia.com, and it will be handled according to entitlement.

Best Regards,
Jonathan.
