How to access dma_alloc_coherent memory from CUDA?

I have cameras DMAing data into memory pre-allocated by usb_alloc_coherent() in drivers/usb/core/devio.c. I want to get that data into the GPU for postprocessing in a zero-copy way.

I tried cudaHostRegister() with the cudaHostRegisterIoMemory flag to get a GPU device pointer for this memory, but it failed. It reported either “not supported” or “out of memory” when I used the cudaHostRegisterDefault flag, even though cudaHostRegister works for regular malloc memory.

There was a similar earlier post that referred to 12_camera_v4l2_cuda, but that sample doesn’t really show how to map dma_alloc_coherent memory to the EGL interface. (Or, if by dma_alloc_coherent memory you meant that the dmabuf_fd created from NvBufferCreateEx is DMA coherent, then the USB DMA memory isn’t created from the same source.) Can you elaborate? Or is this something that isn’t working yet?

You can test it with this code:

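// g++ -lusb-1.0 -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcuda -lcudart
#include <libusb-1.0/libusb.h>
#include <cstdio>
#include <cuda.h>
#include <cuda_runtime_api.h>

int main() {
  // The default libusb context must be initialized before opening a device.
  if (libusb_init(nullptr) != 0) return 1;
  auto* handle = libusb_open_device_with_vid_pid(nullptr, 0x1d6b, 0x0003);
  if (!handle) { printf("failed to open USB device\n"); return 1; }
  size_t size = 4096;
  // Allocates DMA memory for the device through the kernel USB driver.
  auto* mem = libusb_dev_mem_alloc(handle, size);
  if (!mem) { printf("libusb_dev_mem_alloc failed\n"); return 1; }
  printf("%p\n", static_cast<void*>(mem));
  // Try to register the DMA buffer with CUDA; this is the call that fails.
  cudaError_t err = cudaHostRegister(mem, size, cudaHostRegisterIoMemory);
  if (err != cudaSuccess) {
    printf("cudaHostRegister: %s\n", cudaGetErrorString(err));
  }
  libusb_dev_mem_free(handle, mem, size);
  libusb_close(handle);
  libusb_exit(nullptr);
  return 0;
}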


cudaHostRegister cannot be used for the DMA buffer.
As you said, the sample is for the DMA buffer created from NvBufferCreateEx.

Let us check this with our internal team.
We will update you with more information later.



Sorry for keeping you waiting.

Could you help check whether the memory allocated with dma_alloc_coherent is cacheable or non-cacheable?
If the buffer exposed to CUDA is cacheable, the mapping should work fine.