BlueField-3: UCX fails with "Cannot allocate memory" when running dpa_all_to_all

Hello, I’m trying to run NVIDIA’s DOCA dpa_all_to_all sample (NVIDIA DOCA DPA All-to-all Application Guide, NVIDIA Docs) on the BlueField-3 ARM SoC, but it fails during MPI_Init when UCX tries to initialize.

The program runs fine on the host CPU, but it fails when I run it on the BF-3 ARM SoC.

Platform / software

  • HW: NVIDIA BlueField-3 (running on ARM SoC)

  • DOCA: 3.2.1025

  • UCX: 1.20.0

  • MLNX_OFED: 25.10-1.7.1

What I have tried

  • Set ulimit to unlimited

  • Tried export UCX_IB_MLX5_DEVX_UAR=n and export UCX_IB_MLX5_DEVX=n

  • Checked RDMA sanity

Failure log

mpirun -np 2 ./doca_dpa_all_to_all
[2026-01-26 05:03:53:920797][2667169888][DOCA][INF][doca_log.cpp:633] DOCA version 3.2.1025
[2026-01-26 05:03:53:922579][903727200][DOCA][INF][doca_log.cpp:633] DOCA version 3.2.1025
[1769403834.150458] [localhost:1925108:0]        ib_iface.c:1315 UCX  ERROR mlx5_2: iface 0xb76343ef3500 failed to create UD QP TX wr:256 sge:6 inl:64 resp:0 RX wr:4096 sge:1 resp:0 failed: Cannot allocate memory
[1769403834.151192] [localhost:1925108:0]      ucp_worker.c:1415 UCX  ERROR uct_iface_open(ud_verbs/mlx5_2:1) failed: Input/output error
[localhost.localdomain:1925108] pml_ucx.c:314  Error: Failed to create UCP worker
[1769403834.166861] [localhost:1925107:0]        ib_iface.c:1363 UCX  ERROR mlx5_2: ibv_create_cq(cqe=256) failed: Invalid argument
[1769403834.166945] [localhost:1925107:0]      ucp_worker.c:1415 UCX  ERROR uct_iface_open(ud_verbs/mlx5_2:1) failed: Input/output error
[localhost.localdomain:1925107] pml_ucx.c:314  Error: Failed to create UCP worker

Hi dragonj5290

Since you have already tried disabling DEVX, can you also make sure the ulimit setting actually takes effect?

export UCX_IB_MLX5_DEVX=n
export UCX_IB_MLX5_DEVX_UAR=n
ulimit -l unlimited 

You can run ulimit -l to check whether it is unlimited, then run mpirun again.
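A minimal sketch of the steps above as one script, so the settings and the check happen in the same shell that will launch mpirun (ulimit only affects the current shell and its children). The warning path is an addition of mine: a non-root user cannot raise the soft limit above the hard limit, and in that case the command fails rather than silently succeeding.

```shell
#!/usr/bin/env bash
# Apply the suggested UCX settings in the current shell.
export UCX_IB_MLX5_DEVX=n
export UCX_IB_MLX5_DEVX_UAR=n

# Try to raise the locked-memory limit; report if the hard limit blocks it.
if ! ulimit -l unlimited 2>/dev/null; then
    echo "warning: could not raise memlock limit (hard limit: $(ulimit -H -l))" >&2
fi

# Show what processes started from this shell will inherit.
echo "memlock: $(ulimit -l)"
echo "UCX_IB_MLX5_DEVX=${UCX_IB_MLX5_DEVX}"
echo "UCX_IB_MLX5_DEVX_UAR=${UCX_IB_MLX5_DEVX_UAR}"
```

If the launcher is Open MPI (the pml_ucx.c lines in the log suggest it is), the variables can also be forwarded explicitly with mpirun's -x option, e.g. mpirun -np 2 -x UCX_IB_MLX5_DEVX -x UCX_IB_MLX5_DEVX_UAR ./doca_dpa_all_to_all, run from the same shell.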

If the run still fails, check the kernel log:

dmesg | egrep -i 'mlx5_cmd_check|ALLOC_UAR|limits exceeded|Cannot allocate memory' | tail -n 50

If you see messages such as "limits exceeded" or "ALLOC_UAR failed", that clearly indicates HCA-side UAR resource exhaustion; in that case, consider adjusting the UAR-related configuration with the mlxconfig command.
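A small sketch of that decision as a reusable shell function. It reads kernel log text on stdin and prints a one-word verdict based only on the two signatures named above; the function name classify_uar_errors is hypothetical, and a "no-uar-errors" result only means those strings were absent, not that every UAR problem is ruled out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: look for the UAR-exhaustion signatures in log text.
# Reads from stdin and prints "uar-exhaustion" or "no-uar-errors".
classify_uar_errors() {
    if grep -Eiq 'ALLOC_UAR|limits exceeded'; then
        echo "uar-exhaustion"
    else
        echo "no-uar-errors"
    fi
}

# Usage on the BlueField Arm (reading the kernel log usually needs root):
# sudo dmesg | classify_uar_errors
```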

Regards

xyin

Thank you for your answer.

However, I tried all of them and still get exactly the same error as before.

Strangely, the dmesg command showed nothing, even with sudo.

Hey Dragonj5290

Thanks for the update.

To make sure the ulimit setting takes effect, can you check whether the processes launched by mpirun actually see the change?

mpirun -np 1 bash -c 'echo "memlock: $(ulimit -l)"; ulimit -a'
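The check above starts a single rank, while the failing run uses -np 2. A minimal sketch for checking every rank instead: a hypothetical helper, all_ranks_unlimited, reads the "memlock: ..." lines printed by each rank and succeeds only if at least one line was seen and all of them report unlimited.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "memlock: <value>" lines (one per rank) on stdin.
# Exit 0 only if at least one line was seen and every value is "unlimited".
all_ranks_unlimited() {
    awk '
        /^memlock:/ { seen++; if ($2 != "unlimited") bad++ }
        END { exit !(seen > 0 && bad == 0) }
    '
}

# Usage, with the same rank count as the failing run:
# mpirun -np 2 bash -c 'echo "memlock: $(ulimit -l)"' | all_ranks_unlimited \
#     && echo "all ranks: unlimited" || echo "some rank has a finite memlock limit"
```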

Also, do you see any mlx5-related messages in the kernel log?

dmesg | grep mlx5
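The UCX error names the RDMA device mlx5_2, but kernel driver messages are usually prefixed with the underlying device name (a PCI address, or an auxiliary/SF device name on BlueField) rather than the RDMA name. A sketch that maps one to the other, assuming /sys/class/infiniband/<dev>/device is a symlink to that underlying device, as on typical mlx5 systems; kernel_name_for_ibdev is a hypothetical helper, and the optional second argument exists only so it can be pointed at a different sysfs root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the kernel device name behind an RDMA device.
# Assumes <sysfs_root>/<ibdev>/device is a symlink to the underlying device.
kernel_name_for_ibdev() {
    local ibdev="$1"
    local sysfs_root="${2:-/sys/class/infiniband}"
    local link="${sysfs_root}/${ibdev}/device"
    if [ ! -e "$link" ]; then
        echo "no such RDMA device: ${ibdev}" >&2
        return 1
    fi
    basename "$(readlink -f "$link")"
}

# Usage on the BlueField Arm, narrowing the mlx5 messages to the failing device:
# dev=$(kernel_name_for_ibdev mlx5_2) && sudo dmesg | grep -F "$dev" | tail -n 50
```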

Thanks

xyin

mpirun -np 1 bash -c 'echo "memlock: $(ulimit -l)"; ulimit -a'
memlock: unlimited
real-time non-blocking time (microseconds, -R) unlimited
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 31415
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 31415
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Dear xyin

Thank you for your constant help.
The results look fine, but I still can't get the program to run.

Thank you