The GPU kernel talks to the device driver directly to initiate RDMA.
The GPU kernel talks to a CPU thread on the host through shared memory. The host thread in turn uses UCX or Verbs directly to initiate RDMA on behalf of the kernel.
There is a GPU device-side RDMA library for initiating RDMA, and NVSHMEM relies on it? Does that library talk to the device driver directly?
I wonder which one is true. Is this information even publicly available?
Thank you for responding. Do you by any chance have a link to that? I can't seem to find it, maybe because I am new to the forum. Also, even if I can see the code of the device driver, it doesn't tell me how NVSHMEM interacts with it, right? Does NVSHMEM interact with the device driver directly?
I think I figured out the answer to my question. According to this paper
NVSHMEM, at least back in 2020, had GPU threads talk to CPU progress threads, which process the GPU-initiated SHMEM requests by invoking Verbs. The SHMEM runtime is largely managed by the CPU, which acts as an intermediary. I suppose the GPU threads and host threads communicate through memory shared between the host and the device.
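To make the proxy idea concrete for anyone reading later, here is a minimal host-only sketch of how I picture it. It is not NVSHMEM code. A plain `std::thread` stands in for the GPU kernel, a ring buffer in ordinary memory stands in for the queue visible to both sides, and a `memcpy` stands in for the Verbs or UCX write. All names (`PutRequest`, `ProxyChannel`, `post_put`, `proxy_loop`, `run_demo`) are hypothetical and made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical request descriptor; not NVSHMEM's actual layout.
struct PutRequest {
    std::size_t dst_offset;  // offset into the "remote" buffer
    std::size_t len;         // number of bytes to write
    const char* src;         // source data
};

// Single-producer/single-consumer ring. In a real system this would live in
// memory that both the GPU and the host thread can access.
struct ProxyChannel {
    static constexpr std::size_t kSlots = 8;
    PutRequest slots[kSlots];
    std::atomic<std::uint64_t> head{0};  // count of requests posted
    std::atomic<std::uint64_t> tail{0};  // count of requests completed
};

// Stand-in for the GPU side: enqueue a request for the proxy to execute.
void post_put(ProxyChannel& ch, const PutRequest& req) {
    std::uint64_t h = ch.head.load(std::memory_order_relaxed);
    // Wait while the ring is full.
    while (h - ch.tail.load(std::memory_order_acquire) >= ProxyChannel::kSlots) {
        std::this_thread::yield();
    }
    ch.slots[h % ProxyChannel::kSlots] = req;
    ch.head.store(h + 1, std::memory_order_release);  // publish the slot
}

// Stand-in for the host progress thread: poll the ring and carry out each
// request on the requester's behalf.
void proxy_loop(ProxyChannel& ch, char* remote, std::size_t remote_size,
                std::uint64_t expected) {
    std::uint64_t t = 0;
    while (t < expected) {
        if (ch.head.load(std::memory_order_acquire) == t) {
            std::this_thread::yield();  // nothing new yet
            continue;
        }
        const PutRequest& r = ch.slots[t % ProxyChannel::kSlots];
        assert(r.dst_offset + r.len <= remote_size);
        // Assumption: a real proxy would post an RDMA write here instead.
        std::memcpy(remote + r.dst_offset, r.src, r.len);
        ++t;
        ch.tail.store(t, std::memory_order_release);  // free the slot
    }
}

// Posts two puts from the "GPU" thread and returns the "remote" buffer.
std::string run_demo() {
    ProxyChannel ch;
    std::vector<char> remote(8, '.');
    const char a[] = "abcd";
    const char b[] = "wxyz";
    std::thread proxy(proxy_loop, std::ref(ch), remote.data(), remote.size(), 2);
    std::thread gpu([&] {
        post_put(ch, {0, 4, a});
        post_put(ch, {4, 4, b});
    });
    gpu.join();
    proxy.join();
    return std::string(remote.begin(), remote.end());
}
```

Calling `run_demo()` from a `main` returns `"abcdwxyz"`. The point of the sketch is only the division of labor: the requester never touches the network itself, it just publishes a descriptor, and the host thread does the actual transfer.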