I am responsible for a project that requires real-time processing of large volumes of data on a PC running Ubuntu 22.04. The data is collected by an FPGA acquisition card and must then be transferred to a GeForce RTX 4090 graphics card for processing. Our initial approach was to read the data from the FPGA acquisition card into CPU (host) memory and then copy it to GPU memory. However, the extra copy through host memory inevitably adds latency.
While researching solutions, I learned about NVIDIA’s GPUDirect RDMA technology, which allows third-party PCIe devices to write data directly into the GPU’s RAM over the PCIe bus without passing through CPU memory. I have several questions about this technology:
- Based on what I found online, GPUDirect RDMA appears to be supported only on Tesla and Quadro series GPUs. Does this mean my RTX 4090 cannot use this technology?
- I also came across the Resizable BAR feature, which, with support from the BIOS, GPU, and driver, allows the CPU to access the entire GPU RAM space. Many GPUs already support this feature. I am not very familiar with Resizable BAR, but if my motherboard supports it and it is enabled, would the CPU be able to access the full GPU RAM space of the RTX 4090? If so, could I use DMA to transfer data directly from the FPGA acquisition card to the RTX 4090’s RAM without going through PC memory?
- If the second method above is feasible, how does it differ from GPUDirect RDMA? If it is not feasible, what methods are recommended for transferring data directly from the FPGA acquisition card to the RTX 4090’s memory without going through PC memory?
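
Regarding the Resizable BAR question, one way I believe I can at least check how much GPU memory is exposed over PCIe is to read the BAR sizes from sysfs. The sketch below is only that, a minimal sketch: it assumes the standard Linux `resource` file format (one `start end flags` line per region), and the PCI address `0000:01:00.0` is a hypothetical placeholder for wherever the RTX 4090 sits on a given machine. The function names are my own. It says nothing about whether an FPGA can actually DMA into that region.

```python
from pathlib import Path

# Flag bits as defined in the Linux kernel's include/linux/ioport.h.
IORESOURCE_MEM = 0x00000200
IORESOURCE_PREFETCH = 0x00002000


def parse_pci_resources(text):
    """Parse a sysfs PCI 'resource' file into (index, size_bytes, flags) tuples."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # unused resource slot
        regions.append((index, end - start + 1, flags))
    return regions


def largest_prefetchable_region(text):
    """Return (index, size_bytes) of the largest prefetchable memory region, or None."""
    wanted = IORESOURCE_MEM | IORESOURCE_PREFETCH
    candidates = [
        (size, index)
        for index, size, flags in parse_pci_resources(text)
        if flags & wanted == wanted
    ]
    if not candidates:
        return None
    size, index = max(candidates)
    return index, size


if __name__ == "__main__":
    # Hypothetical PCI address; the real one can be found with `lspci`.
    resource_file = Path("/sys/bus/pci/devices/0000:01:00.0/resource")
    if resource_file.exists():
        result = largest_prefetchable_region(resource_file.read_text())
        if result is not None:
            index, size = result
            print(f"resource {index}: {size / 2**20:.0f} MiB prefetchable")
```

My understanding is that if the largest prefetchable region is much smaller than the card's 24 GB of memory, Resizable BAR is probably not in effect, but I would appreciate confirmation of that as well.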