We are currently testing the perftest examples and trying to run GPUDirect RDMA. The CLI option "--use_cuda" works great, but when we additionally pass the option "--use_cuda_dmabuf", ib_send_bw returns the following:
$ ./ib_send_bw -F -d mlx5_0 -a --report_gbits --use_cuda=0 --use_cuda_dmabuf 126.96.36.199
Perftest doesn't supports CUDA tests with inline messages: inline size set to 0
Listing all CUDA devices in system:
CUDA device 0: PCIe address is 22:00
Picking device No. 0
[pid = 3746, dev = 0] device name = [NVIDIA RTX A5000]
creating CUDA Ctx
making it the current CUDA Ctx
DMA-BUF is not supported on this GPU
Couldn't create IB resources
Two systems are P2P connected via ConnectX-4 NICs using RoCE. My questions are:
What is "--use_cuda_dmabuf" actually doing? What is the benefit compared to using just "--use_cuda"?
Why is our GPU not supported and which GPUs are supported at all?
We saw in the perftest source that the following line checks a required attribute: cuDeviceGetAttribute(&is_supported, CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED, cuDevice)
Can you please help and explain this CUDA attribute? I couldn't find any more specific information about it online. Thanks
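For reference, a minimal sketch of how that attribute can be queried standalone via the CUDA driver API (device index 0 is an assumption; requires a CUDA-capable GPU and driver, so it cannot run on a machine without one):

```c
#include <cuda.h>
#include <stdio.h>

int main(void) {
    /* Initialize the driver API and pick the first device. */
    if (cuInit(0) != CUDA_SUCCESS) {
        fprintf(stderr, "cuInit failed (no CUDA driver?)\n");
        return 1;
    }
    CUdevice dev;
    cuDeviceGet(&dev, 0);

    /* CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED (available since CUDA 11.7)
     * reports whether this device/driver combination can export CUDA
     * allocations as Linux dma-buf file descriptors. */
    int is_supported = 0;
    cuDeviceGetAttribute(&is_supported,
                         CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED, dev);
    printf("DMA-BUF supported: %s\n", is_supported ? "yes" : "no");
    return 0;
}
```

Note that the attribute reflects the whole device/driver combination, not just the GPU silicon, which is why perftest's check can fail even on otherwise capable hardware.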
So far, I think I have been able to work out for myself what DMA-BUF actually is.
DMA-BUF allows memory regions that live directly on peer devices to be registered from the HCA's perspective, e.g. the VRAM of a GPU for GPUDirect RDMA, instead of going through system memory. GPUDirect RDMA is a closed-source solution, so one only has access to this feature if all prerequisites (GPU, NIC, software) are fulfilled.
From my understanding, this is identical to Device Memory Programming, which should be supported on ConnectX-5 and above. If one wants to utilize GPUDirect RDMA (which is somehow based on DMA-BUF), one can even use older ConnectX generations, e.g. ConnectX-4.
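To make the dmabuf path concrete, here is a sketch of the two halves involved: exporting a CUDA allocation as a dma-buf file descriptor and registering it with the HCA. It assumes CUDA >= 11.7 (for cuMemGetHandleForAddressRange) and an rdma-core that provides ibv_reg_dmabuf_mr; the buffer size and device indices are placeholders, error handling is abbreviated, and it needs real GPU and RDMA hardware to run:

```c
#include <cuda.h>
#include <infiniband/verbs.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    const size_t size = 1 << 20;  /* 1 MiB, arbitrary */

    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    CUdeviceptr gpu_buf;
    cuMemAlloc(&gpu_buf, size);

    /* Export the GPU allocation as a Linux dma-buf file descriptor. */
    int dmabuf_fd = -1;
    cuMemGetHandleForAddressRange(&dmabuf_fd, gpu_buf, size,
                                  CU_MEM_RANGE_HANDLE_TYPE_DMA_BUF_FD, 0);

    /* Open the first RDMA device and register the dma-buf with it.
     * The kernel maps the GPU pages through the generic dma-buf API,
     * rather than through the GPUDirect RDMA peer-memory path. */
    int n = 0;
    struct ibv_device **list = ibv_get_device_list(&n);
    struct ibv_context *verbs = ibv_open_device(list[0]);
    struct ibv_pd *pd = ibv_alloc_pd(verbs);

    struct ibv_mr *mr = ibv_reg_dmabuf_mr(pd, 0 /* offset */, size,
                                          (uint64_t)gpu_buf /* iova */,
                                          dmabuf_fd,
                                          IBV_ACCESS_LOCAL_WRITE);
    printf("dmabuf MR %s\n", mr ? "registered" : "failed");
    return 0;
}
```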
What I still don't understand is the line cuDeviceGetAttribute(&is_supported, CU_DEVICE_ATTRIBUTE_DMA_BUF_SUPPORTED, cuDevice)
which checks an attribute of the GPU. We are using ConnectX-4, so I can accept that DMA-BUF may not work on the NIC side. But why does CUDA tell me that my GPU is not supported, even though it is an RTX A5000, which from my point of view seems appropriate for all of these features?