cuMemAllocHost, how to use?

Hi all,

I'm just trying to use the cuMemAllocHost function to allocate some memory on the host and then

access it from the device / a running kernel. Something like this:

   int memSizeOnHost = (5 * 1024 * 1024);

    int * devPtrOnHost = NULL;

    // pass the address of the pointer, not a cast of the (NULL) pointer itself
    CUDA_SAFE_CALL(cuMemAllocHost((void**)&devPtrOnHost, memSizeOnHost));

   // call the kernel, passing a pointer to the allocated memory as an argument

    ping_ping_kernel<<<NUM_BLOCKS, NUM_THREADS>>>(..,..,devPtrOnHost);

   // kernel

    __global__ static void ping_ping_kernel(..,.., int * devPtrOnHost) {

     // here I try to access host memory (not working)

     devPtrOnHost[xx] = some_int_value;

    }


But it doesn't seem to work. I'm a bit confused, because I did not find any example that uses the

cuMemAllocHost function. On the other hand, the programming guide explicitly says "... since the memory can be accessed directly by the device"...

So it should work...

Any ideas ?



The part of the document you quote is a bit misleading. The allocated memory is page-locked (pinned), so it can be transferred to the device with a DMA (direct memory access) transfer. That is where the "direct" comes from. The allocated memory is not device memory and cannot be read by a kernel. You still need to memcpy it over to the device. However, you should find that the memcpy is faster than from "normal" pageable host memory.
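To make that concrete, here is a minimal sketch of the full flow: allocate a pinned host buffer, copy it to a device buffer, and have the kernel work on the device buffer. I've used the runtime API (cudaMallocHost) to match the <<<>>> launch syntax in your post; the kernel, buffer names, and sizes are just placeholders:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale_kernel(int *devPtr, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        devPtr[i] *= 2;        // the kernel reads/writes *device* memory
}

int main() {
    const int n = (5 * 1024 * 1024) / sizeof(int);
    const size_t bytes = n * sizeof(int);

    int *hostPtr = NULL;       // pinned (page-locked) host buffer
    int *devPtr  = NULL;       // device buffer the kernel actually uses
    cudaMallocHost((void **)&hostPtr, bytes);
    cudaMalloc((void **)&devPtr, bytes);

    for (int i = 0; i < n; ++i) hostPtr[i] = i;

    // Pinning speeds up this copy (DMA), but the copy is still required
    cudaMemcpy(devPtr, hostPtr, bytes, cudaMemcpyHostToDevice);
    scale_kernel<<<(n + 255) / 256, 256>>>(devPtr, n);
    cudaMemcpy(hostPtr, devPtr, bytes, cudaMemcpyDeviceToHost);

    printf("hostPtr[1] = %d\n", hostPtr[1]);

    cudaFree(devPtr);
    cudaFreeHost(hostPtr);
    return 0;
}
```

Note the kernel only ever sees devPtr; handing it hostPtr directly is exactly what fails in your snippet.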

The "bandwidthTest" sample in the SDK includes code demonstrating the use of pinned memory and cudaMallocHost.
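If you want a rough feel for what bandwidthTest measures without running the whole sample, here is a hedged sketch that times a pageable vs a pinned host-to-device copy with CUDA events (the actual numbers will vary a lot by system; error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Time one host-to-device copy of `bytes` from `src` using CUDA events
static float time_h2d_ms(void *dst, const void *src, size_t bytes) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const size_t bytes = 64 * 1024 * 1024;
    void *devBuf = NULL, *pinned = NULL;
    void *pageable = malloc(bytes);      // ordinary (pageable) allocation
    cudaMalloc(&devBuf, bytes);
    cudaMallocHost(&pinned, bytes);      // page-locked allocation

    printf("pageable: %.2f ms\n", time_h2d_ms(devBuf, pageable, bytes));
    printf("pinned:   %.2f ms\n", time_h2d_ms(devBuf, pinned, bytes));

    cudaFreeHost(pinned);
    cudaFree(devBuf);
    free(pageable);
    return 0;
}
```

The pinned copy should come out faster because the driver can DMA straight from the locked pages instead of staging through an internal pinned buffer first.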

thanks guys…