Problem with cudaHostAlloc: Problem with Memcpy

Hi there,

I have a small problem using cudaHostAlloc.

Here is my code:

__device__ int addDevice( int a, int b ) {
    return a + b;
}

__global__ void add( int a, int b, int *c ) {
    *c = addDevice( a, b );
}

int main( void ) {
    int c;
    int *dev_c;

    HANDLE_ERROR( cudaHostAlloc( (void**)&dev_c, sizeof(int), cudaHostAllocDefault ) );

    add<<<1,1>>>( 1, 9, dev_c );

    HANDLE_ERROR( cudaMemcpy( &c, dev_c, sizeof(int),
                              cudaMemcpyDeviceToHost ) );

    printf( "1 + 9 = %d\n", c );

    HANDLE_ERROR( cudaFreeHost( dev_c ) );

    return 0;
}

The problem seems to come from the cudaMemcpy call, which fails with "invalid argument".

Does anyone have an idea where the problem comes from?



dev_c is already on the host, so DeviceToHost doesn’t seem to make sense.

But dev_c represents the value of c in the kernel.

By writing cudaMemcpyDeviceToHost, I am updating the value of dev_c with the value of c I obtained in the kernel. Or am I wrong?

You have allocated memory on the host instead of on the device?
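
If that is the cause, the usual fix is to allocate the result with cudaMalloc so that dev_c really is a device pointer. Below is a minimal sketch of that idea, not a tested drop-in replacement. The CHECK macro is a hypothetical stand-in for the HANDLE_ERROR macro in the original post, whose definition is not shown in the thread.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the poster's HANDLE_ERROR macro (assumption):
// print the CUDA error string and abort on any failure.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s at %s:%d\n",                         \
                    cudaGetErrorString(err_), __FILE__, __LINE__);   \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

__device__ int addDevice(int a, int b) {
    return a + b;
}

__global__ void add(int a, int b, int *c) {
    *c = addDevice(a, b);
}

int main(void) {
    int c = 0;
    int *dev_c = NULL;

    // Allocate the result in device memory instead of host memory.
    CHECK(cudaMalloc((void**)&dev_c, sizeof(int)));

    add<<<1, 1>>>(1, 9, dev_c);
    CHECK(cudaGetLastError());  // surfaces kernel launch errors

    // dev_c is a device pointer here, so a DeviceToHost copy is valid.
    CHECK(cudaMemcpy(&c, dev_c, sizeof(int), cudaMemcpyDeviceToHost));

    printf("1 + 9 = %d\n", c);

    CHECK(cudaFree(dev_c));
    return 0;
}
```

Note that the buffer is released with cudaFree rather than cudaFreeHost, since it was allocated with cudaMalloc.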

But cudaHostAlloc is supposed to return a device pointer to memory that is pinned on the host side, so essentially any cudaMemcpy call involving it executes faster.

I can’t spot any mistake in the original poster’s code.

cudaHostAlloc() returns a host pointer, even if the cudaHostAllocMapped flag were specified (which it isn’t in the example above). You still need to call cudaHostGetDevicePointer() to obtain the corresponding device pointer for mapped memory. Only under certain conditions (unified virtual addressing, UVA) will these pointers be the same.
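
To illustrate the mapped-memory route, here is a rough sketch that assumes the device supports mapping page-locked host memory. The kernel writes through the device pointer returned by cudaHostGetDevicePointer(), and the host reads the result directly, so no cudaMemcpy is needed. As before, CHECK is a hypothetical helper, not the HANDLE_ERROR macro from the original post, and the names host_c and dev_c are only illustrative.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error-checking helper (assumption, not from the thread).
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s at %s:%d\n",                         \
                    cudaGetErrorString(err_), __FILE__, __LINE__);   \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

__global__ void add(int a, int b, int *c) {
    *c = a + b;
}

int main(void) {
    int *host_c = NULL;  // host pointer returned by cudaHostAlloc
    int *dev_c = NULL;   // device-side alias of the same memory

    // Request support for mapped pinned allocations before any other
    // CUDA call initializes the device.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    // Pinned host memory that is also mapped into the device address space.
    CHECK(cudaHostAlloc((void**)&host_c, sizeof(int), cudaHostAllocMapped));

    // Obtain the device pointer that corresponds to host_c.
    CHECK(cudaHostGetDevicePointer((void**)&dev_c, host_c, 0));

    add<<<1, 1>>>(1, 9, dev_c);
    CHECK(cudaGetLastError());

    // Wait for the kernel to finish before reading the result on the host.
    CHECK(cudaDeviceSynchronize());

    printf("1 + 9 = %d\n", *host_c);

    CHECK(cudaFreeHost(host_c));
    return 0;
}
```

The cudaDeviceSynchronize() call matters in this variant: without a cudaMemcpy there is nothing else that makes the host wait for the kernel before it reads *host_c.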

Ah, thanks for clarifying. It’s been a while since I last used this function.