"--use_cuda_dmabuf" is not supported on this GPU

k.bodi2 · July 17, 2023, 4:03pm

Hey there,

we are currently testing the perftest samples and trying to run GPUDirect RDMA. The CLI option “--use_cuda” works great, but when we additionally pass “--use_cuda_dmabuf”, ib_send_bw returns the following right after starting:

$ ./ib_send_bw -F -d mlx5_0 -a --report_gbits --use_cuda=0 --use_cuda_dmabuf 20.4.3.219
Perftest doesn't supports CUDA tests with inline messages: inline size set to 0
initializing CUDA
Listing all CUDA devices in system:
CUDA device 0: PCIe address is 22:00

Picking device No. 0
[pid = 3746, dev = 0] device name = [NVIDIA RTX A5000]
creating CUDA Ctx
making it the current CUDA Ctx
DMA-BUF is not supported on this GPU
 Couldn't create IB resources

The two systems are connected P2P via ConnectX-4 NICs using RoCE. My questions are:

What is “--use_cuda_dmabuf” actually doing? What is the benefit compared to using just “--use_cuda”?
Why is our GPU not supported and which GPUs are supported at all?

We saw in the perftest source that the following line checks a required attribute:
cuDeviceGetAttribute(&is_supported, CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED, cuDevice)

Can you please help and explain this CUDA attribute? I couldn’t find any more specific information about it online. Thanks
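
For reference, the same check can be reproduced outside perftest. According to the comment in cuda.h, this attribute reports whether the device supports buffer sharing with the dma_buf mechanism. The following is a minimal sketch, not perftest's own code: it loads the CUDA driver library through ctypes and calls cuInit, cuDeviceGet and cuDeviceGetAttribute. The numeric enum value 124 and the library name libcuda.so.1 are assumptions that should be verified against the cuda.h of the installed toolkit; the function name query_dma_buf_supported is purely illustrative.

```python
import ctypes

# Assumption: numeric value of CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED as found in
# cuda.h of CUDA 11.7 and later. Verify against your own headers before relying on it.
CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED = 124


def query_dma_buf_supported(ordinal=0, lib_name="libcuda.so.1"):
    """Return True/False as reported by the driver, or None if it cannot be queried."""
    try:
        cuda = ctypes.CDLL(lib_name)
    except OSError:
        return None  # CUDA driver library not available on this machine

    if cuda.cuInit(0) != 0:  # 0 is CUDA_SUCCESS
        return None

    device = ctypes.c_int()
    if cuda.cuDeviceGet(ctypes.byref(device), ordinal) != 0:
        return None

    value = ctypes.c_int(0)
    status = cuda.cuDeviceGetAttribute(
        ctypes.byref(value), CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED, device
    )
    if status != 0:
        return None  # for example, a driver too old to know this attribute
    return bool(value.value)


if __name__ == "__main__":
    print("DMA-BUF supported:", query_dma_buf_supported())
```

A result of False here would correspond to the "DMA-BUF is not supported on this GPU" message above; None only means the sketch could not ask the driver at all.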

k.bodi2 · July 18, 2023, 1:30pm

I appreciate any help or advice on this topic.

k.bodi2 · July 20, 2023, 9:48am

So far I think I was able to explain to myself what DMA_BUF actually is.

DMA_BUF allows memory regions to be registered directly on peer devices from the HCA's perspective, e.g. on the VRAM of a GPU for GPUDirect RDMA, instead of using system memory. GPUDirect RDMA is a “closed” source solution, so one has access to this feature if all prerequisites are fulfilled (GPU, NIC, software).

From my understanding, this is identical to Device Memory Programming, which should be supported on ConnectX-5 and above. If one wants to utilize GPUDirect RDMA (which is somehow based on DMA_BUF), one can even use older ConnectX generations, e.g. ConnectX-4.

What I still don’t understand is the line
cuDeviceGetAttribute(&is_supported, CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED, cuDevice)
which checks an attribute of the GPU. We are using ConnectX-4, so I’m fine with DMA_BUF possibly not working. But why does CUDA tell me that my GPU is not supported, even though it is an RTX A5000, which from my point of view seems appropriate for all these features?

Levei_Luo · July 26, 2023, 8:54am

hi,
GPUDirect over DMA-BUF is a new feature introduced in OFED 5.8-1.0.1.1.

some information about this:
Added support for GPUDirect over dma-buf. With the new mechanism, nv_peer_mem is no longer required.

The following is required for dma-buf support:
Linux kernel version 5.12 or later
OpenRM version 515 or later

Perftest support was added as well:
The default option in perftest is without dmabuf. To run with this option, add --use_cuda_dmabuf in addition to the --use_cuda flag.
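
The two software requirements listed above can be checked with a short script. This is a minimal sketch under stated assumptions: it takes "OpenRM" to mean the NVIDIA open GPU kernel modules, assumes the loaded driver reports itself in /proc/driver/nvidia/version, and assumes the open flavor contains the words "Open Kernel Module" in its NVRM line. All function names are illustrative, and the script does not check the GPU itself or the OFED version.

```python
import platform
import re


def kernel_at_least(release, major=5, minor=12):
    """True if a kernel release string such as '5.15.0-78-generic' is >= major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if not match:
        return False
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


def parse_nvrm(text):
    """Return (is_open_module, driver_major) from an NVRM version line, or None."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            version = re.search(r"\b(\d+)\.\d+", line)
            if version is None:
                return None
            # Assumption: the open kernel modules identify themselves with this phrase.
            return ("Open Kernel Module" in line, int(version.group(1)))
    return None


def dmabuf_prerequisites_met(kernel_release, nvrm_text, min_driver=515):
    """Combine both checks: kernel >= 5.12 and open kernel module >= min_driver."""
    info = parse_nvrm(nvrm_text)
    return (
        kernel_at_least(kernel_release)
        and info is not None
        and info[0]
        and info[1] >= min_driver
    )


if __name__ == "__main__":
    try:
        # Assumption: this procfs file exists whenever the NVIDIA driver is loaded.
        with open("/proc/driver/nvidia/version") as fh:
            nvrm = fh.read()
    except OSError:
        nvrm = ""
    print("kernel release:", platform.release())
    print("NVRM info (open module, major):", parse_nvrm(nvrm))
    print("prerequisites met:", dmabuf_prerequisites_met(platform.release(), nvrm))
```

Keeping the parsing separate from the file access makes it possible to try the functions on version strings copied from another machine.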

k.bodi2 · July 31, 2023, 11:36am

Alright, I already knew about 90% of the information you shared, though the OpenRM software requirement was new to me.

Is there any way to check whether all requirements are fulfilled? The kernel version is okay, but I’m not sure about OpenRM. Is it included automatically in the CUDA driver?

PS: It seems like OpenRM is part of the CUDA Toolkit.
