Memory transfer error

Hi all-

When I run my code under cuda-memcheck, I get the following:

rho device to host memory copy failure: unspecified launch failure
========= Invalid global write of size 8
========= at 0x000058c0 in
========= by thread (0,0,0) in block (0,0,0)
========= Address 0x01001000 is out of bounds

========= ERROR SUMMARY: 1 error

Not sure what to make of this. I have employed this type of memory transfer successfully before. The relevant portion of the code is:

int L = 100;
int length = 2*(L+2);
double R[length], rho[length], *rho_dev;

/* memory allocation */
stat = cudaMalloc((void **)&rho_dev, (length)*sizeof(double));
if( stat != cudaSuccess ){
    printf("device memory allocation failure: rho\n");
}

/* write to device */
stat = cudaMemcpy(rho_dev, rho, (length)*sizeof(double), cudaMemcpyHostToDevice);
if( stat != cudaSuccess ){
    printf("rho host to device memory copy failure\n");
}

/* Calculation */

/* Get memory from device... this is where I get the error from cuda-memcheck */
stat = cudaMemcpy(R, rho_dev, (length)*sizeof(double), cudaMemcpyDeviceToHost);
if( stat != cudaSuccess ){
    printf("rho device to host memory copy failure: %s\n", cudaGetErrorString(cudaGetLastError()));
}


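Since the part of the code you showed looks fine, I suspect the error comes from the kernel itself. As explained in chapter 3.2.8 of the CUDA C Programming Guide, a kernel launch is asynchronous, so an error that occurs while the kernel is running is only reported by a later runtime call.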

The rest of the chapter explains how to check for pre-launch errors and kernel errors for a kernel launch. I would encourage you to do so.
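A minimal sketch of that kind of checking, for illustration only. The helper `cuda_ok`, the kernel `scale` and the wrapper `launch_checked` are hypothetical names that do not come from the original code; the kernel merely stands in for the "Calculation" step. The idea is to call cudaGetLastError() right after the launch to catch pre-launch errors, then cudaDeviceSynchronize() to surface errors raised while the kernel runs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): print a message if a
// CUDA call failed and report whether it succeeded.
static bool cuda_ok(cudaError_t stat, const char *what)
{
    if (stat != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(stat));
        return false;
    }
    return true;
}

// Hypothetical kernel standing in for the "Calculation" step.
__global__ void scale(double *rho, int length)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < length) {
        rho[i] *= 2.0;
    }
}

// Launch the kernel and check both classes of error.
static bool launch_checked(double *rho_dev, int length)
{
    int threads = 128;
    int blocks = (length + threads - 1) / threads;

    scale<<<blocks, threads>>>(rho_dev, length);

    // Pre-launch errors (e.g. an invalid launch configuration).
    if (!cuda_ok(cudaGetLastError(), "kernel launch")) {
        return false;
    }
    // Errors raised while the kernel executes only show up once the
    // host waits for the device.
    return cuda_ok(cudaDeviceSynchronize(), "kernel execution");
}
```

With a check like this placed directly after the launch, an out-of-bounds access would be attributed to the kernel rather than to the cudaMemcpy that follows it.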

Here, most likely, you have a memory violation that occurs inside your kernel, and which is only reported once you check the error status of cudaMemcpy. Moreover, cuda-memcheck’s message gives you a code line and a reason for the error. Did you check the corresponding line (line 1523 in file

Hi and Thanks,

I found the error. It was an array referencing an out of bounds index in the kernel. But this brings me to another question. There is no line 1523 in my code. It’s about 600 lines. How do I interpret this error location reference?
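For anyone hitting the same problem, here is a rough sketch of the kind of guard that avoids an out-of-bounds index inside a kernel. The actual kernel was not posted, so `in_bounds` and `shift_left` are made-up names and the neighbor access is only an assumed example of how such an index can run past the end of the array.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: true when index i is valid for an array of n elements.
__host__ __device__ inline bool in_bounds(int i, int n)
{
    return i >= 0 && i < n;
}

// Hypothetical kernel (the real one was not shown): each thread reads its
// right-hand neighbor, so both i and i + 1 must be valid indices.
__global__ void shift_left(const double *in, double *out, int length)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (in_bounds(i, length) && in_bounds(i + 1, length)) {
        out[i] = in[i + 1];
    }
}
```

With length = 2*(L+2) = 204 as in the code above, valid indices run from 0 to 203, so a thread with i = 203 must skip the i + 1 access.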

TBH, and AFAIK, cuda-memcheck only gives (hopefully) accurate line numbers and file indications on codes compiled in debug mode. Were you using “-G -g” while compiling? If not, what does cuda-memcheck give you in this case for the erroneous code?

I compiled with nvcc -g -G -arch sm_20

The CUDA toolchain processes source code through multiple stages of intermediate files, and uses #line directives in generated intermediate C++ code and .loc directives in PTX to point back to the lines in the original source code. It seems like there may be an issue with that mechanism in your case, where the line number stated in the error message could refer to the line number in one of the intermediate files, rather than in the original source.
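To illustrate how a #line directive changes the line number the compiler reports, here is a small host-only sketch. The file name in the directive is made up, and this does not attempt to reproduce the actual toolchain issue; it only shows the mechanism by which a reported line number can differ from the physical line in the file being compiled.

```cuda
#include <cstdio>

// Returns the line number the compiler believes this statement is on.
static int reported_line()
{
// The directive below tells the compiler that the next line is line 1523
// of a (hypothetical) file, regardless of where it physically sits.
#line 1523 "intermediate_file.cpp"
    return __LINE__;
}
```

Any diagnostic or debug information generated for that return statement would refer to line 1523, even though this snippet is only a few lines long.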

Since you have a repro case in hand, it would be a good idea to file a bug should this erroneous line number occur with either the CUDA 4.1 toolchain or the CUDA 4.2RC toolchain.