[SOLVED] GPU Direct RDMA - nvidia_p2p_get_pages returns -EINVAL

Hello,

I am modifying the Linux driver of an acquisition board so that the board transfers its data directly to my GTX 1080. For that, I use the GPUDirect RDMA technology described in the CUDA Toolkit v8.0 documentation.

Unfortunately, when the driver calls nvidia_p2p_get_pages in order to retrieve physical addresses for the buffer the application allocated in GTX 1080 memory, the function returns -EINVAL (-22).

I really don’t understand why this function returns -EINVAL. Here is how I call it (a short sketch of the call follows the list below):

  • The first two parameters (p2p_token and va_space_token) are 0. I don’t want to use the deprecated tokens, and the same process both allocates the buffer and calls into the driver that invokes nvidia_p2p_get_pages.
  • The virtual_address parameter is the virtual address the application received from cudaMalloc. This address is 64 KiB aligned.
  • The length parameter is the size of the allocated buffer (2 * 1024 * 1024 bytes, i.e. 2 MiB).
  • The page_table parameter is the address of a (struct nvidia_p2p_page_table *) variable, initially NULL, as in the sample I found.
  • The free_callback parameter is the address of a callback function in my driver.
  • The data parameter is the address of my context structure.

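For reference, this is roughly what the call looks like in my driver. Only the nvidia_p2p_* interface comes from the nv-p2p.h header shipped with the NVIDIA driver sources; the context structure and helper names below are simplified placeholders for my real code:

    #include <linux/kernel.h>
    #include <linux/types.h>
    #include "nv-p2p.h"   /* from the NVIDIA driver sources */

    /* GPUDirect RDMA works on 64 KiB GPU pages; address and length must be aligned. */
    #define GPU_PAGE_SIZE (64 * 1024)

    struct my_ctx {
        u64 gpu_vaddr;                           /* virtual address from cudaMalloc */
        u64 size;                                /* 2 MiB buffer */
        struct nvidia_p2p_page_table *page_table;
    };

    /* Called by the NVIDIA driver if the mapping is revoked (e.g. on cudaFree). */
    static void my_free_callback(void *data)
    {
        struct my_ctx *ctx = data;

        nvidia_p2p_free_page_table(ctx->page_table);
        ctx->page_table = NULL;
    }

    static int my_pin_gpu_buffer(struct my_ctx *ctx)
    {
        int ret;

        /* Both the address and the length must be 64 KiB aligned. */
        if ((ctx->gpu_vaddr & (GPU_PAGE_SIZE - 1)) || (ctx->size & (GPU_PAGE_SIZE - 1)))
            return -EINVAL;

        ctx->page_table = NULL;
        ret = nvidia_p2p_get_pages(0, 0,          /* tokens: 0 for the non-deprecated path */
                                   ctx->gpu_vaddr, ctx->size,
                                   &ctx->page_table,
                                   my_free_callback, ctx);
        if (ret)
            return ret;

        pr_info("pinned %u GPU pages, first PA = 0x%llx\n",
                ctx->page_table->entries,
                (unsigned long long)ctx->page_table->pages[0]->physical_address);
        return 0;
    }
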
Do I have to configure the NVIDIA driver in some “mode” in order to use the GPUDirect RDMA technology?

Do you know of anything that could make nvidia_p2p_get_pages fail and return -EINVAL?

Any hint will be greatly appreciated.

Thanks in advance,
Martin

From the documentation, it seems like this feature is usable on Tesla or Quadro GPUs only:

http://docs.nvidia.com/cuda/gpudirect-rdma/index.html

See also:
https://devtalk.nvidia.com/default/topic/716091/geforce-gtx-780-and-gpudirect-rdma/
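If it helps for future reference: much newer toolkits (CUDA 11.3 and later, so not the 8.0 release used above) expose a device attribute for exactly this question. A rough sketch, assuming the cudaDevAttrGPUDirectRDMASupported attribute from those later releases:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int dev = 0, rdma = 0;
        struct cudaDeviceProp prop;

        /* Query the device name and whether it reports GPUDirect RDMA support
         * (attribute available only in CUDA 11.3+). */
        cudaGetDeviceProperties(&prop, dev);
        cudaDeviceGetAttribute(&rdma, cudaDevAttrGPUDirectRDMASupported, dev);

        printf("%s: GPUDirect RDMA %ssupported\n", prop.name, rdma ? "" : "not ");
        return 0;
    }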

Thanks for the quick answer. I will try to get a Tesla board.