We are currently testing the perftest samples and trying to run GPUDirect RDMA. Besides the CLI option --use_cuda, which works great, we additionally tried the CLI option --use_cuda_dmabuf, but the following is returned after starting ib_send_bw:
$ ./ib_send_bw -F -d mlx5_0 -a --report_gbits --use_cuda=0 --use_cuda_dmabuf 20.4.3.219
Perftest doesn't supports CUDA tests with inline messages: inline size set to 0
initializing CUDA
Listing all CUDA devices in system:
CUDA device 0: PCIe address is 22:00
Picking device No. 0
[pid = 3746, dev = 0] device name = [NVIDIA RTX A5000]
creating CUDA Ctx
making it the current CUDA Ctx
DMA-BUF is not supported on this GPU
Couldn't create IB resources
The two systems are connected point-to-point via ConnectX-4 NICs using RoCE. My questions are:
What is --use_cuda_dmabuf actually doing? What is the benefit compared to using just --use_cuda?
Why is our GPU not supported, and which GPUs are supported at all?
We saw in perftest that the following line checks a required attribute: cuDeviceGetAttribute(&is_supported, CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED, cuDevice)
Can you please explain this CUDA attribute? I couldn't find any more specific information about it online. Thanks
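For reference, a minimal standalone version of that check (my own sketch, assuming CUDA 11.7 or later, where this attribute was introduced) would look like this:

/* dmabuf_check.c -- build with: gcc dmabuf_check.c -o dmabuf_check -lcuda */
#include <stdio.h>
#include <cuda.h>

int main(void)
{
    CUdevice dev;
    int is_supported = 0;

    /* Initialize the driver API and pick device 0, like perftest does. */
    if (cuInit(0) != CUDA_SUCCESS || cuDeviceGet(&dev, 0) != CUDA_SUCCESS) {
        fprintf(stderr, "CUDA initialization failed\n");
        return 1;
    }

    /* The same query perftest runs before creating its IB resources. */
    cuDeviceGetAttribute(&is_supported,
                         CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED, dev);
    printf("DMA-BUF is %ssupported on this GPU\n", is_supported ? "" : "NOT ");
    return 0;
}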
So far, I think I have been able to explain to myself what DMA-BUF actually is.
DMA-BUF allows memory regions to be registered directly on peer devices from the HCA's perspective, e.g. for GPUDirect RDMA on the VRAM of a GPU instead of on system memory. GPUDirect RDMA is a closed-source solution, so one only has access to this feature if all prerequisites are fulfilled (GPU, NIC, software).
From my understanding, this is identical to Device Memory Programming, which should be supported on ConnectX-5 and above. If one wants to utilize GPUDirect RDMA (which is somehow based on DMA-BUF), one can even use older ConnectX generations, e.g. ConnectX-4.
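Concretely, I picture the dma-buf registration path roughly like this (my own sketch, not perftest's actual code; it assumes CUDA 11.7 or later for cuMemGetHandleForAddressRange, an rdma-core that provides ibv_reg_dmabuf_mr, and a CUDA context that is already current):

#include <cuda.h>
#include <infiniband/verbs.h>

/* Register GPU memory with the HCA via dma-buf. Error handling is
 * omitted, and size is assumed to be page-aligned. */
struct ibv_mr *reg_gpu_mr_dmabuf(struct ibv_pd *pd, size_t size)
{
    CUdeviceptr dptr;
    int fd = -1;

    /* Allocate VRAM and export the range as a dma-buf file descriptor. */
    cuMemAlloc(&dptr, size);
    cuMemGetHandleForAddressRange(&fd, dptr, size,
                                  CU_MEM_RANGE_HANDLE_TYPE_DMA_BUF_FD, 0);

    /* The HCA maps the VRAM through the kernel's generic dma-buf
     * framework -- no nv_peer_mem module involved. */
    return ibv_reg_dmabuf_mr(pd, 0 /* offset */, size,
                             (uint64_t)dptr /* iova */, fd,
                             IBV_ACCESS_LOCAL_WRITE |
                             IBV_ACCESS_REMOTE_READ |
                             IBV_ACCESS_REMOTE_WRITE);
}

With plain --use_cuda, as far as I understand, the VRAM pointer is instead passed to a normal ibv_reg_mr(), which only works for GPU memory when the closed-source nv_peer_mem kernel module is loaded.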
What I still don't understand is the line cuDeviceGetAttribute(&is_supported, CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED, cuDevice), which checks an attribute of the GPU. We are using ConnectX-4, so I can accept that DMA-BUF may not work. But why does CUDA tell me that my GPU is not supported, even though it is an RTX A5000, which seems appropriate for all these features from my point of view?
hi,
GPUDirect over DMA-BUF is a new feature introduced in OFED 5.8-1.0.1.1.
Some information about this:
Added support for GPUDirect over dma-buf. When using the new mechanism, nv_peer_mem is no longer required.
The following is required for dma-buf support:
Linux kernel version 5.12 or later
OpenRM version 515 or later
Perftest support was added as well:
The default option in perftest is without dma-buf. To run with dma-buf, add --use_cuda_dmabuf in addition to the --use_cuda flag.
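For example, with the same flags as in your run, on the server:
$ ./ib_send_bw -F -d mlx5_0 -a --report_gbits --use_cuda=0 --use_cuda_dmabuf
and on the client:
$ ./ib_send_bw -F -d mlx5_0 -a --report_gbits --use_cuda=0 --use_cuda_dmabuf 20.4.3.219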
Alright, I already knew about 90% of the information you have shared with me, though the OpenRM software requirement was new to me.
Is there any way to check whether all requirements are fulfilled? The kernel version is okay, but I’m not sure about OpenRM. Is it included automatically in the CUDA driver?
PS: It seems like OpenRM is part of the CUDA Toolkit.
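For reference, this is how I am checking the requirements on my side (assuming standard packaging, where the open kernel modules ship as part of the NVIDIA driver install):
$ uname -r        # kernel must be 5.12 or later
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader        # driver must be 515 or later
$ modinfo -F license nvidia        # "Dual MIT/GPL" indicates the open (OpenRM) modules; "NVIDIA" indicates the proprietary driver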