How to memcpy directly to GPU memory from a Linux driver?


I have a third-party device that fills a Linux kernel buffer with data. I would like to copy that data directly to GPU memory from within the driver.

I think this is similar to gdrcopy, except that I want to do the memcpy in the driver.

My first attempt was to:

  1. User Space: Allocate GPU memory using cudaMalloc and set the CU_POINTER_ATTRIBUTE_SYNC_MEMOPS attribute
  2. Linux Kernel Space: Pin that pointer memory using nvidia_p2p_get_pages
  3. Linux Kernel Space: Use the page physical addresses returned in step 2 to call memcpy(gpu_pinned_page, kernel_cpu_pinned_page, size)
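
For reference, here is a minimal sketch of the range arithmetic behind step 2. It assumes the 64 KiB GPU page granularity described in NVIDIA's GPUDirect RDMA documentation. The names `pin_range` and `compute_pin_range` are hypothetical helpers for illustration, not part of any NVIDIA API, and the sketch only computes the aligned range that would be passed to nvidia_p2p_get_pages; it does not perform the pinning or the copy.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GPU pages are pinned at 64 KiB granularity. */
#define GPU_BOUND_SHIFT  16
#define GPU_BOUND_SIZE   ((uint64_t)1 << GPU_BOUND_SHIFT)
#define GPU_BOUND_OFFSET (GPU_BOUND_SIZE - 1)
#define GPU_BOUND_MASK   (~GPU_BOUND_OFFSET)

/* Hypothetical helper type: the aligned range to pin for one buffer. */
struct pin_range {
    uint64_t virt_start; /* buffer address rounded down to a GPU page */
    uint64_t offset;     /* offset of the buffer inside the first GPU page */
    uint64_t pin_size;   /* bytes from virt_start to the end of the buffer */
    uint64_t num_pages;  /* number of 64 KiB GPU pages covering the buffer */
};

/* Compute the GPU-page-aligned range that covers [addr, addr + size). */
static struct pin_range compute_pin_range(uint64_t addr, uint64_t size)
{
    struct pin_range r;

    assert(size > 0);
    r.virt_start = addr & GPU_BOUND_MASK;
    r.offset     = addr & GPU_BOUND_OFFSET;
    r.pin_size   = addr + size - r.virt_start;
    r.num_pages  = (r.pin_size + GPU_BOUND_OFFSET) >> GPU_BOUND_SHIFT;
    return r;
}
```

With this layout, a 128 KiB buffer that starts 0x2345 bytes into a GPU page spans three 64 KiB pages, so the copy in step 3 has to walk the page table one GPU page at a time rather than treat the buffer as a single contiguous region.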

This memcpy fails spectacularly: the entire server requires a hard power cycle.

What step did I miss? I could add the mmap from gdrcopy, but is it only for user space? Or does the mmap call create pointers that I could also use to do the memcpy in the driver?