FPGA cannot communicate with A100 through XDMA Using RDMA

NVIDIA_USER_X1 · May 27, 2024, 7:35am

Hi, all:

I am developing a project that applies GPUDirect RDMA technology.
My system is:
FPGA: XCKCU15P
GPU: NVIDIA A100
Server: Lenovo SR658
System: CentOS
CUDA: 11.2

The FPGA and the A100 are attached to the same PCIe bridge.

When the FPGA sends a read request to the GPU through XDMA, the PCIe bridge (c9:02.0) immediately replies with an error, which is displayed as UR, CA, CSR.

The RDMA driver is based on “jetson-rdma-picoevb-master” and was built with the script “build-for-pc-native.sh”. The requested address is 0xCC000600000, which lies within the BAR1 space of the GPU.

The bridge's address window also covers this range.
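
Both range claims above (the address is inside GPU BAR1 and inside the bridge window) can be cross-checked against the `resource` files that Linux exposes in sysfs, where each line holds a start address, an end address, and flags in hex. The following Python sketch is only an illustration of that check: the function names are invented here, and the sample table is hypothetical, not taken from the system described in this thread. Only the address 0xCC000600000 comes from the post.

```python
def parse_resource_table(text):
    """Parse sysfs 'resource' text: one 'start end flags' hex triple per line."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # unused resource slot
        regions.append((index, start, end))
    return regions


def regions_containing(regions, addr, length=1):
    """Return the indices of regions that fully contain [addr, addr + length)."""
    last = addr + length - 1
    return [index for index, start, end in regions if start <= addr and last <= end]


def read_regions(bdf):
    """Read the resource table of one PCI function, e.g. bdf='0000:c9:02.0'."""
    with open(f"/sys/bus/pci/devices/{bdf}/resource") as handle:
        return parse_resource_table(handle.read())


# Hypothetical sample in the sysfs format; NOT values from the system above.
SAMPLE = "\n".join([
    "0x00000000f0000000 0x00000000f0ffffff 0x0000000000040200",
    "0x00000cc000000000 0x00000ccfffffffff 0x000000000014220c",
    "0x0000000000000000 0x0000000000000000 0x0000000000000000",
])

if __name__ == "__main__":
    # Check a 4 KiB read at the address from the post against the sample table.
    hits = regions_containing(parse_resource_table(SAMPLE), 0xCC000600000, 4096)
    print("address falls in resource index:", hits)
```

On a real machine, `read_regions` would be called once with the GPU's address and once with the bridge's, and the same address and length checked against both tables. An empty result for either one would point to an address-range problem rather than a driver problem.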

Why does this happen? Is it a chipset issue or a driver issue? I believe the driver correctly obtains the GPU address 0xCC000600000 in GPU BAR space by calling nvidia_p2p_get_pages and nvidia_p2p_dma_map_pages.
Can anyone help me? I would greatly appreciate it and look forward to your reply.

Thanks,
Yours Yang

NVIDIA_USER_X1 · May 27, 2024, 7:54am

The RDMA doc shows:

How can I check whether GPUDirect RDMA can be performed between two devices?
With nvidia-smi topo -p2p w, or some other way?
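
As far as I know, nvidia-smi topo -p2p reports capabilities between pairs of GPUs, so for a GPU and a third-party device such as an FPGA the PCIe topology usually has to be inspected directly (for example with lspci -tv). The Python sketch below shows one way to do the same thing from sysfs paths: it finds the lowest bridge that both devices sit behind. The helper names and the example paths are hypothetical.

```python
import os
import re

# Matches a PCI function name such as 0000:c9:02.0 (domain:bus:device.function).
BDF_RE = re.compile(r"^[0-9a-f]{4,}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def sysfs_path(bdf):
    """Resolve a PCI function to its full sysfs device path."""
    return os.path.realpath(f"/sys/bus/pci/devices/{bdf}")


def upstream_chain(real_path):
    """List the PCI functions on the path from the root port down to the device."""
    return [part for part in real_path.split("/") if BDF_RE.match(part)]


def lowest_common_bridge(path_a, path_b):
    """Return the deepest bridge shared by both devices, or None if there is none."""
    chain_a = upstream_chain(path_a)[:-1]  # drop the endpoint itself
    chain_b = upstream_chain(path_b)[:-1]
    common = None
    for a, b in zip(chain_a, chain_b):
        if a != b:
            break
        common = a
    return common


# Hypothetical paths for illustration only; real ones come from sysfs_path().
GPU_PATH = "/sys/devices/pci0000:c0/0000:c0:01.1/0000:c8:00.0/0000:c9:01.0/0000:ca:00.0"
FPGA_PATH = "/sys/devices/pci0000:c0/0000:c0:01.1/0000:c8:00.0/0000:c9:02.0/0000:cb:00.0"

if __name__ == "__main__":
    print("lowest common bridge:", lowest_common_bridge(GPU_PATH, FPGA_PATH))
```

A result of None means the two devices share no bridge below the host bridge, so peer-to-peer traffic would have to cross the root complex or even two different host bridges. Whether that works depends on the platform.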

NVIDIA_USER_X1 · May 28, 2024, 1:17am

Please help, experts. The project is somewhat urgent. If the chipset is the problem, we will consider replacing the server. Looking forward to your reply.

xiaofengl · May 29, 2024, 5:39am

I don’t think this forum can give you an answer from such a brief description. If you want to build a working P2P RDMA driver for the GPU, you need to work with an NVIDIA GPU expert.

There is also another option:

The NVIDIA open-source driver now supports the Linux kernel DMA-BUF framework, so you can use kernel DMA-BUF to access GPU memory.

https://www.kernel.org/doc/html/latest/driver-api/dma-buf.html

NVIDIA_USER_X1 · May 29, 2024, 7:13am

Thanks for your reply.
My project needs to implement GPUDirect RDMA.
What additional information do I need to provide to diagnose this issue? Which forum should I post this in?
I am a newbie, please help me.
Thank you!

system · June 12, 2024, 7:14am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
