Runtime Error in DOCA UROM

Hello:

I’m posting here to ask for advice on an issue I’ve been facing while running a DOCA-based UCX application on a BlueField-3 DPU.
The following error occurs during execution:

[1761260065.615737] [urom-daemon-bf:536101:0] ib_mlx5.c:592 UCX ERROR mlx5dv_devx_alloc_uar(device=mlx5_2, flags=0x0) type=WC failed: Cannot allocate memory. Consider increasing PF_LOG_BAR_SIZE using mlxconfig tool (requires reboot)

Following the message, I used the mlxconfig tool to increase PF_LOG_BAR_SIZE from the default value 5 to 8 and rebooted the system, but the same error still occurs.
I also tried increasing the value of PF_BAR2_SIZE, but it did not help.


Environment

  • Platform: NVIDIA BlueField-3

  • DOCA: 3.1.0

  • UCX: 1.19.0

  • OS: Ubuntu 22.04

  • Kernel: 5.15.0-1074-bluefield

  • Driver (OFED): MLNX_OFED_LINUX-25.07-0.9.7


What I have tried

  • Verified that PF_LOG_BAR_SIZE=8 using
    mlxconfig -d <device> query | grep PF_LOG_BAR_SIZE

  • Rebooted the system, but the same error persists

  • Increased PF_BAR2_SIZE, but no change

  • Other parameters (NUM_OF_UARS, LOG_BAR_SIZE, etc.) remain at their default values


Additional context

This issue occurs while running the DOCA UROM sample applications:
urom_multi_workers_bootstrap_sample and worker_graph.

When I set the number of workers to 2, the application runs successfully with the default PF_LOG_BAR_SIZE=5.
However, when I increase the number of workers to 4, the above error appears.
Even after increasing PF_LOG_BAR_SIZE to 8, the same error still occurs.


Questions

  1. What other possible causes could lead to this error?

  2. Are there additional parameters (e.g., PCI BAR resource limits, UAR allocation) that should be adjusted besides PF_LOG_BAR_SIZE and PF_BAR2_SIZE?

  3. Has anyone experienced a similar issue with UCX on BlueField-3? Any suggestions or workarounds would be greatly appreciated.



I was unable to edit the original topic, so I posted the same content again at the following link:

Hi yosei0107,

Thank you for posting your inquiry to the NVIDIA Developer Forums.

PF_LOG_BAR_SIZE is the log2 of the size (in MB) of the UAR BAR for the physical function. The default is 5 (2^5 = 32 MB), and the maximum value of the parameter is 63.
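
To make the log2 relationship concrete, here is a minimal Python sketch of the arithmetic. The helper name bar_size_mb is made up for illustration and is not part of mlxconfig, UCX, or DOCA.

```python
def bar_size_mb(log_bar_size: int) -> int:
    """Return the BAR size in MB for a log2-encoded setting such as PF_LOG_BAR_SIZE."""
    if log_bar_size < 0:
        raise ValueError("log_bar_size must be non-negative")
    return 1 << log_bar_size  # same as 2 ** log_bar_size


if __name__ == "__main__":
    # The default (5) and the value tried in the original post (8).
    for value in (5, 8):
        print(f"PF_LOG_BAR_SIZE={value} -> {bar_size_mb(value)} MB")
```

By this arithmetic, going from 5 to 8 raises the requested PF UAR BAR from 32 MB to 256 MB.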

Increasing PF_LOG_BAR_SIZE and VF_LOG_BAR_SIZE will consume more BAR resources. If the total requested BAR resources exceed what is available on the card or the PCI root bus, allocation will fail.

If there are many VFs or a high number of requested resources (QPs, CQs, contexts, PDs), you may need to increase VF_LOG_BAR_SIZE as well.

It is important to check the available PCIe BAR resources on the root bus using tools like lspci and dmesg to ensure you are not exceeding hardware limits.
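
Besides lspci and dmesg, the BAR regions the kernel actually assigned to a device can be read from the per-device resource file in sysfs (/sys/bus/pci/devices/<BDF>/resource), where each line holds a start address, an end address, and flags in hex. The Python sketch below parses that layout. The function name and the sample text are illustrative assumptions (the addresses are made up, not taken from a BlueField-3), and it only reports region sizes, not whether the root bus has room left.

```python
from typing import List, Tuple


def parse_pci_resource(text: str) -> List[Tuple[int, int]]:
    """Return (region_index, size_in_bytes) for every populated region."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not "start end flags"
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # region not assigned
        regions.append((index, end - start + 1))
    return regions


# Made-up sample in the sysfs layout. On a real system the text would come from
# open("/sys/bus/pci/devices/<BDF>/resource").read()
SAMPLE = """\
0x00000000e0000000 0x00000000e1ffffff 0x000000000014220c
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000e2000000 0x00000000e27fffff 0x000000000014220c
"""

if __name__ == "__main__":
    for index, size in parse_pci_resource(SAMPLE):
        print(f"region {index}: {size // (1024 * 1024)} MB")
```

Comparing these sizes before and after a mlxconfig change and reboot is one way to see whether the larger BAR was actually assigned.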

Recommendations

  • Verify PCIe BAR resource availability: use lspci -t and dmesg | grep to check the PCI root bus and available BAR resources.

  • Consider tuning additional parameters: if increasing PF_LOG_BAR_SIZE and PF_BAR2_SIZE does not help, review other parameters such as NUM_OF_UARS, LOG_BAR_SIZE, and VF_LOG_BAR_SIZE, especially if using SR-IOV or many VFs.

  • Check for system-wide resource limits: in some cases, system limits like ulimit -l (max locked memory) can also cause allocation failures and should be set to ‘unlimited’ for high-performance workloads.
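
The locked-memory limit mentioned above can also be read from inside a process. This is a minimal Python sketch using the standard resource module (Unix only); describe_limit is a hypothetical helper name. Note that ulimit -l in bash reports KiB, while getrlimit returns bytes.

```python
import resource


def describe_limit(value: int) -> str:
    """Format an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


if __name__ == "__main__":
    # RLIMIT_MEMLOCK is the limit that `ulimit -l` displays.
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print("max locked memory (soft):", describe_limit(soft))
    print("max locked memory (hard):", describe_limit(hard))
```

Running it in the same environment that launches the UROM daemon or worker shows the limit that process inherits, which may differ from an interactive shell.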

Beyond these steps, since this issue involves a DOCA sample application, we recommend contacting the DOCA feedback mailing list (DOCA-Feedback@exchange.nvidia.com) for further assistance.

Best regards,
NVIDIA Enterprise Experience
