Xilinx UltraScale --> RTX 4000 RDMA sample code

Hi, is there any sample code (Git, open source, etc.) available for enabling RDMA from a Xilinx UltraScale FPGA (e.g. XCKU085) to an RTX 4000? Initially we are transferring 2.5 GB/s (gigabytes per second), and we will eventually need to reach 10 GB/s. Has anybody tested transfers at these rates, and what issues does one need to watch out for? Our current implementation transfers the data to a circular buffer on the server and then reads it back into the GPU. The obvious issues we are facing are CPU interrupts and packet drops. This was done purely to get the application working; the next step is to optimize the flow. Thanks.
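Before picking a transfer scheme, it may help to check the target rates against the PCIe link ceiling. The sketch below assumes PCIe Gen3 (8 GT/s per lane, 128b/130b encoding); the actual generation and lane width of the FPGA and GPU slots are assumptions to verify for your hardware, and the function names are hypothetical helpers for illustration only. The figures are theoretical per-direction line rates before TLP header and DLLP overhead, so sustained payload throughput will be lower.

```python
# Theoretical per-direction PCIe bandwidth, before TLP/DLLP protocol overhead.
# Assumption: PCIe Gen3 (8 GT/s per lane, 128b/130b encoding); the real link
# generation and width of the FPGA and GPU slots may differ.
GEN3_GT_PER_S = 8.0          # giga-transfers per second per lane
GEN3_ENCODING = 128 / 130    # 128b/130b line-code efficiency


def pcie_gen3_gb_per_s(lanes: int) -> float:
    """Line rate after encoding, in GB/s (1 GB = 1e9 bytes), for a Gen3 link."""
    return GEN3_GT_PER_S * GEN3_ENCODING / 8 * lanes  # bits -> bytes


def fits(target_gb_per_s: float, lanes: int) -> bool:
    """True if the target rate is below the link's theoretical ceiling."""
    return target_gb_per_s < pcie_gen3_gb_per_s(lanes)


if __name__ == "__main__":
    for lanes in (4, 8, 16):
        bw = pcie_gen3_gb_per_s(lanes)
        verdict = ", ".join(
            f"{t} GB/s {'fits' if fits(t, lanes) else 'exceeds'}" for t in (2.5, 10.0)
        )
        print(f"Gen3 x{lanes:<2}: {bw:5.2f} GB/s  ({verdict})")
```

Under these assumptions a Gen3 x8 link tops out near 7.88 GB/s per direction, so the 10 GB/s goal would need a wider or faster link rather than only a more efficient software path.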