Python CuPy and GPUDirect RDMA

Dear All,
I would like to ask for some information regarding an application we are developing. I hope I am posting in the correct forum, as the topic covers several areas.
We have a Xilinx FPGA that acquires a set of images, and we would like to transfer these images directly into GPU memory with GPUDirect RDMA for subsequent processing with the Python CuPy library, which is based on CUDA.
We found the following application principle for the FPGA-to-GPU-memory transfer using GPUDirect RDMA:

I found that CuPy allocates GPU arrays with cudaMalloc and then exposes the (virtual) device pointer. In addition, it seems possible to use cp.cuda.UnownedMemory(ptr, size, owner=None) to access a memory region not allocated by CuPy. However, since a virtual memory pointer is required, my main concern is whether RDMA works with virtual or physical memory, and whether it will be possible to obtain the virtual memory pointer so the data can be managed with Python CuPy.
Our idea is to transfer the data as a single block; does RDMA expect a single contiguous block, or does it work in chunks?
How does RDMA work?

Do you have any suggestions for our application?
Thank you very much in advance for your time and help.
Kind regards
Alessandro

Have you read NVIDIA’s documentation?

Hi njuffa,
thank you very much for your reply.
I am reading the documentation, but I still have some questions.

It seems to be possible to use cudaMalloc, cuPointerSetAttribute, and the kernel-mode nvidia_p2p_get_pages function to obtain the physical addresses to write the data to, and then pass the cudaMalloc pointer to CuPy.

However, as far as I know, the pages returned by nvidia_p2p_get_pages are not guaranteed to be physically contiguous. Is it possible to allocate contiguous pages?
Also, I don’t understand how the GPU memory is exposed on the PCI BAR.

From the documentation, as far as I understand, the page size is 64 KB. Is it possible to change the page size?
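Regarding the 64 KiB figure: nvidia_p2p_get_pages pins the region in 64 KiB units, so the pinned range is the requested virtual range rounded out to 64 KiB boundaries. A pure-Python sketch of that arithmetic (the page size constant is taken from the GPUDirect RDMA documentation; the helper name is mine):

```python
GPU_PAGE_SHIFT = 16                  # 64 KiB GPU BAR pages
GPU_PAGE_SIZE = 1 << GPU_PAGE_SHIFT
GPU_PAGE_MASK = ~(GPU_PAGE_SIZE - 1)

def pinned_extent(vaddr, nbytes):
    """Return (aligned_start, total_size, n_pages): the region that
    would actually be pinned for a request of `nbytes` at `vaddr`,
    rounded out to 64 KiB page boundaries."""
    start = vaddr & GPU_PAGE_MASK                                 # round down
    end = (vaddr + nbytes + GPU_PAGE_SIZE - 1) & GPU_PAGE_MASK    # round up
    return start, end - start, (end - start) >> GPU_PAGE_SHIFT

# Example: a 1 MiB buffer starting 0x100 bytes into a page straddles
# an extra page, so 17 pages get pinned instead of 16.
start, size, n_pages = pinned_extent(0x10100, 1 << 20)
```

This also illustrates why each pinned page may land at an unrelated physical address: the DMA engine on the FPGA side has to walk the returned page table rather than assume one contiguous span.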

Thank you very much!