cudaMemcpy error "invalid argument" from in-kernel malloc'ed device mem buffer on cuda 4

fji · February 10, 2012, 5:02am

Hi,

I use to have a piece of code work well on CUDA 4.0. But after using CUDA 4.1 official release, I found cudaMemcpy from a device memory buffer, which is allocated within a kernel using dynamic memory, back to host memory buffer returns an “invalid argument” error.

I wrote a simple example program to show this problem as in the following: (also attached)

#define SIZE 1024

#define BYTE_C 87

#include <stdlib.h>

#include <stdio.h>

__global__ void alloc_kernel(char **buf_ptr, size_t size, char content) {

    if(threadIdx.x == 0 && blockIdx.x == 0) {

        *buf_ptr = (char *) malloc(size);

        for(int i = 0; i < size; i++) {

            (*buf_ptr)[i] = content;

        }   

    }   

}

#define CUDA_CALL(cmd) do { \

    cudaError_t err; \

    err = cmd;       \

    if( err != cudaSuccess ) {       \

        printf("%s line %d: CUDA Error: %s\n", __FILE__, __LINE__, cudaGetErrorString(err)); \

    }               \

}while(0)

int main() {

    char **d_bufptr, *h_bufptr, *h_buf;

CUDA_CALL( cudaSetDevice(0) );

    CUDA_CALL( cudaMalloc((void **) &d_bufptr, sizeof(char *) ) );

alloc_kernel<<<1, 1>>>(d_bufptr, SIZE, BYTE_C);

CUDA_CALL( cudaDeviceSynchronize() );

CUDA_CALL( cudaMemcpy(&h_bufptr, d_bufptr, sizeof(char *), cudaMemcpyDeviceToHost) );

    h_buf = (char *)malloc(SIZE);

    CUDA_CALL( cudaMemcpy(h_buf, h_bufptr, SIZE, cudaMemcpyDeviceToHost) );

return 0;

}

In the line where I do cudaMemcpy near the end, that is the place this error is returned. The same error returns for page-locked host buffer.

Can anyone give me a hint on where I did anything wrong? Or is this a bug in the latest driver/CUDA release? Thanks!

My system: Intel Xeon E5507, GTX 480. CUDA 4.1 toolkit. Ubuntu 11.04. Kernel version 2.6.38-13-generic. Driver NVIDIA-Linux-x86_64-285.05.33. This problem is also seen in another CentOS 5.5 system (Intel Xeon E5560, M2070. CUDA 4.1, kernel 2.6.18-194.el5.perfctr).
simple_memcpy_test.cu (972 Bytes)

fji · February 17, 2012, 4:41pm

Hi, does someone verify my code’s problem?

To give more information on this, this code works on cuda 4.0 (runtime + driver), but not with cuda 4.1 (runtime + driver).
Even if I use cuda 4.0 runtime with a 4.1’s driver, it still returns the same error. Does this mean there is driver bug somewhere in there?

Thanks!

L_F · February 17, 2012, 8:30pm

Check syntax of cudaMemcpy, you are using h_bufptr in this code the different ways - as an address of a pointer and as a pointer.

fji · February 19, 2012, 6:10pm

Hi, Thanks, but I don’t get what you mean.

h_bufptr, defined as a char pointer, is a host variable, whose value is a device memory address.

Therefore, in

CUDA_CALL( cudaMemcpy(&h_bufptr, d_bufptr, sizeof(char *), cudaMemcpyDeviceToHost) );

The device buffer address stored in d_bufptr is transferred into h_bufptr. After that, this device buffer address can be used on the host.

Then, in

CUDA_CALL( cudaMemcpy(h_buf, h_bufptr, SIZE, cudaMemcpyDeviceToHost) );

The device buffer address stored in h_bufptr is used to transfer the in-kernel malloc’ed device buffer to the host side into h_buf.

Could you explain how this contradicts the syntax of cudaMalloc? Thanks!

L_F · February 22, 2012, 1:12am

Sorry, didn’t check your code thoroughly. It can be a mix of host/device pointers along with explicitly defined direction for cudaMemcpy. Have you tried to allocate h_buf via cudaHostAlloc and copy with default direction?

fji · February 23, 2012, 7:11pm

Nope.
It returns the same error message whether the host buffer is allocated through malloc, cudaMalloc, or cudaHostAlloc (with all different flags).
And it is the same case for cudaMemcpy with cudaMemcpyDefault parameter. (I guess the error message will be a different one, perhaps Invalid Direction if this direction parameter is the problem).

Do you have a CUDA 4.1 environment to verify this bug? Thanks!

L_F · February 24, 2012, 9:18pm

Yes, probably there is some bug in CUDA driver, I checked with run-time API (“Invalid argument”) and with driver API (“global function call is not configured”). But as a workaround you may use cudaHostAlloc for h_buf and then “memcpy” inside a kernel to copy from *d_bufptr to h_buf, it works correctly.

fji · February 28, 2012, 2:07am

I never know there is a “memcpy” function that can be called in a kernel.

Could you point to the relevant sections in the CUDA programming guide about this “memcpy” function?

L_F · February 28, 2012, 3:20pm

There is no mention in “CUDA C Programming Guide” about memcpy (probably because it’s not an extention), but compiler handles it as well as other usual C-functions (e.g. memset).

Topic		Replies	Views
cudaMemcpy returns "invalid argument" for in-kernel malloc-ed memory CUDA Programming and Performance	0	1417	February 13, 2013
Invalid Argument after calling cudaMalloc on device but not host CUDA Programming and Performance	1	1839	March 30, 2017
cudaMemcpy: invalid argument issue CUDA Programming and Performance	2	17346	September 5, 2011
strange problem accessing device memory cudaMalloc and cudaMemcpy CUDA Programming and Performance	0	2320	April 2, 2010
cudaMemcpy fails CUDA Programming and Performance	3	12923	July 2, 2012
Memcpy gives cudaErrorInvalidValue on pinning memory CUDA Programming and Performance	0	427	November 20, 2017
cudaMemcpy failed with invalid argument CUDA Programming and Performance	16	205	August 27, 2025
Problem with cudaHostAlloc Problem with Memcpy CUDA Programming and Performance	6	2978	July 2, 2012
Problem with memcpy and vbo registering CUDA Programming and Performance	2	869	July 31, 2010
cudaMemcpy returns invalid value CUDA Programming and Performance	2	10317	March 12, 2015

cudaMemcpy error "invalid argument" from in-kernel malloc'ed device mem buffer on cuda 4

Related topics