Accessing GPU heap


Is it correct that the host cannot access memory that a kernel allocated on the GPU heap by calling malloc? The CUDA Programming Guide doesn’t say anything about this, but given all the other restrictions, it wouldn’t surprise me.

The following code fails at the cudaMemcpy call (the last assert) on a K80:

```cuda
#include <cstdio>
#include <cassert>

__device__ int *ptr = NULL;

__global__ void kernel()
{
    if ((threadIdx.x==0 && threadIdx.y == 0 && threadIdx.z == 0) &&
        (blockIdx.x==0 && blockIdx.y == 0 && blockIdx.z == 0)      ) {
        ptr = (int*)malloc(sizeof(int));   // allocate from the device heap
        *ptr = 42;
    }
}

int main(int argc, char *argv[])
{
    kernel<<<1,1>>>();                     // one thread does the allocation
    cudaDeviceSynchronize();
    assert( cudaSuccess == cudaGetLastError() );

    int *hptr = NULL;                      // copy the pointer value itself (8 bytes on 64-bit)
    assert( cudaSuccess == cudaMemcpyFromSymbol(&hptr, ptr, sizeof(hptr), 0) );

    printf("ptr = %p\n", hptr);

    int val;
    assert( cudaSuccess == cudaMemcpy(&val, hptr, sizeof(val), cudaMemcpyDeviceToHost) );

    printf("*ptr = %d\n", val);

    return 0;
}
```

Yes, that is correct. Pointers returned by in-kernel malloc or new cannot be used directly with any of the host-side cudaMemcpy-type API functions. The “device heap” (the space that in-kernel malloc allocates from) is logically separate from the space that cudaMalloc allocates from.
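One way to get such data back to the host is to let device code do the bridging. An in-kernel malloc allocation persists across kernel launches and can be dereferenced by later kernels, so a second kernel can copy the value into a cudaMalloc’d staging buffer, which cudaMemcpy is allowed to read. The following is a minimal sketch of that approach, assuming a single int and the default device heap size; the names `allocOnHeap`, `copyHeapToBuffer`, `freeHeap`, `readHeapValue` and `staging` are hypothetical and not part of any CUDA API.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Device-heap pointer, as in the question.
__device__ int *ptr = NULL;

// Hypothetical helper: allocates one int from the device heap and stores 42.
__global__ void allocOnHeap()
{
    ptr = (int *)malloc(sizeof(int));
    if (ptr != NULL) *ptr = 42;
}

// Hypothetical helper: device code may dereference a device-heap pointer,
// so it copies the value into dst, which was allocated with cudaMalloc.
__global__ void copyHeapToBuffer(int *dst)
{
    if (ptr != NULL) *dst = *ptr;
}

// Hypothetical helper: device-heap memory is released with in-kernel free(),
// not with cudaFree.
__global__ void freeHeap()
{
    if (ptr != NULL) free(ptr);
    ptr = NULL;
}

// Hypothetical helper: returns the int stored on the device heap, or -1 on error.
int readHeapValue()
{
    int *staging = NULL;  // cudaMalloc'd, so cudaMemcpy may read it
    int val = -1;

    if (cudaMalloc((void **)&staging, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(staging, 0xFF, sizeof(int));  // all bytes 0xFF read back as -1

    allocOnHeap<<<1, 1>>>();
    copyHeapToBuffer<<<1, 1>>>(staging);

    // cudaMemcpy on the default stream waits for both kernels to finish.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(&val, staging, sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) val = -1;

    freeHeap<<<1, 1>>>();
    cudaDeviceSynchronize();
    cudaFree(staging);
    return val;
}
```

For example, `printf("%d\n", readHeapValue());` in main should print 42 on a device that supports in-kernel malloc. Separate kernels keep the sketch easy to follow; in real code the copy into the cudaMalloc’d buffer could instead be done at the end of the kernel that produced the data.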

The relevant programming guide section (Dynamic Global Memory Allocation and Operations) gives some indication that these spaces are logically separate, along with a partial caution about interoperability with the host memory API, but I agree the description could be clearer.