Error When Running dpa-all-to-all Demo

Hi,

I am trying to run the /opt/mellanox/doca/applications/dpa-all-to-all demo. However, when I specified mlx5_0 and mlx5_1 as the RDMA devices for dpa-all-to-all, it reported an error saying that mlx5_0 doesn’t exist or doesn’t support RDMA:

I guess this is because there is no RDMA functionality on mlx5_0 and mlx5_1. I listed the RDMA devices on the BF3 using show_gids, which shows:

Interestingly, mlx5_0 and mlx5_1 do not appear in the output.
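
For anyone who wants to cross-check the show_gids result, here is a minimal sketch that lists the RDMA devices the kernel exposes and how many ports each one has. It assumes the standard Linux sysfs layout (/sys/class/infiniband/<device>/ports/<n>); the function name list_rdma_devices is just an illustrative helper, not part of DOCA.

```python
import os


def list_rdma_devices(sysfs_root="/sys/class/infiniband"):
    """Return {device_name: port_count} for RDMA devices visible under sysfs.

    Assumption: the standard Linux layout where each RDMA device is a
    directory under sysfs_root with a "ports" subdirectory.
    """
    devices = {}
    if not os.path.isdir(sysfs_root):
        # RDMA stack not loaded, or the path does not exist on this system.
        return devices
    for name in sorted(os.listdir(sysfs_root)):
        ports_dir = os.path.join(sysfs_root, name, "ports")
        # Each port shows up as a numbered subdirectory under "ports".
        ports = os.listdir(ports_dir) if os.path.isdir(ports_dir) else []
        devices[name] = len(ports)
    return devices


if __name__ == "__main__":
    for dev, nports in list_rdma_devices().items():
        print(f"{dev}: {nports} port(s)")
```

A device that shows up here but not in show_gids probably has no populated GID entries, which may be a useful hint when comparing mlx5_0/mlx5_1 against mlx5_2/mlx5_3.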

When I specified mlx5_2 and mlx5_3 as the RDMA devices instead, I hit another error saying “Failed to create doca dev uar”, as shown in the figure below:

I found a very similar issue on the forum: [BlueField-3 DPA] Unable to Query GID for mlx5_0 When Using FlexIO - Infrastructure & Networking / BlueField - NVIDIA Developer Forums. That thread also reports that mlx5_0 and mlx5_1 cannot be found on BF3. I tried the solution provided by sribhargavid, but it shows that my BF3 is already in DPU mode:

Now I have no idea why I cannot run the dpa-all-to-all demo. I would sincerely appreciate any advice.

Here are the details of my system:

Host system: Ubuntu 22.04 kernel version 5.15.0-142-generic.
DOCA Version: 2.9.2005
BF3: BlueField-3 B3220 P-Series FHHL DPU
BSP Version: bf-bundle-2.9.2-31_25.02_ubuntu-22.04_prod.bfb

One more piece of information that may help: when I ran ibv_devinfo on the BF3, it showed some weird output for mlx5_0 and mlx5_1:


Note: there are very many phys_ports, 83 in total, for mlx5_0 and mlx5_1.

But mlx5_2 and mlx5_3 look normal:

I am looking forward to your advice. Thanks a lot.

Hi yzhangtc,

1. Since the error “Operation not permitted” appears when using mlx5_2/mlx5_3, please try adding “--allow-run-as-root” after “mpirun”.

2. Based on the other error messages, this looks like a firmware issue. Please try upgrading the FW version from 32.43.2566 to 32.43.3608 (or newer) and try again.

3. Regarding mlx5_1 showing 83 devices in the ibv_devinfo output, I think these might be the SF (sub-function) devices of this PF port inside your DPU Arm OS. Could you check the PF’s configuration inside your DPU OS?
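
As a small aid for point 2, here is a minimal sketch for checking whether an installed firmware version string meets the suggested minimum. It assumes the version is a dotted string of integers such as 32.43.2566 (as printed in the fw_ver field of ibv_devinfo); the helper names parse_fw_version and fw_at_least are illustrative only.

```python
def parse_fw_version(version):
    """Turn a dotted version string like '32.43.2566' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def fw_at_least(current, minimum):
    """Return True if 'current' is the same as or newer than 'minimum'.

    Tuples compare field by field, so 32.43.3608 > 32.43.2566 numerically
    rather than as text.
    """
    return parse_fw_version(current) >= parse_fw_version(minimum)


if __name__ == "__main__":
    # Versions taken from this thread: installed vs. suggested minimum.
    print(fw_at_least("32.43.2566", "32.43.3608"))  # prints False
```

Comparing the fields as integers avoids the pitfall of plain string comparison, where for example "32.43.999" would sort after "32.43.3608".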