WEIRD cudaMemcpy error

Please help me, I don’t know what I’m doing wrong.

I’m using the HANDLE_ERROR macro to find where my program fails, and it seems that some of the cudaMemcpy calls are failing, because I’m getting this error:

“unspecified launch failure in X.cu at line 198”

and that line is:

HANDLE_ERROR( cudaMemcpy( id, dev_id, sizeof(int), cudaMemcpyDeviceToHost ) );

The weird part is that I’ve already reserved memory for both id and dev_id before that cudaMemcpy call. Here is part of my code:


int *id;
id = (int *)malloc(sizeof(int));

HANDLE_ERROR( cudaMalloc( (void**)&dev_id, sizeof(int) ) );

recognize<<<blocks,threads>>>(dev_red, dev_redVec, dev_cmp, dev_con, dev_ptr, dev_id);

HANDLE_ERROR( cudaMemcpy( id, dev_id, sizeof(int), cudaMemcpyDeviceToHost ) );
HANDLE_ERROR( cudaMemcpy( MRED, dev_red, *tamRed * sizeof(int), cudaMemcpyDeviceToHost ) );
HANDLE_ERROR( cudaMemcpy( MREDvec, dev_redVec, *tamRedV * sizeof(int), cudaMemcpyDeviceToHost ) );

I’m getting the same error in the other two calls of cudaMemcpy.


“unspecified launch failure” means the recognize() kernel failed at run time, most likely because of an out-of-bounds memory access. The error shows up when cudaMemcpy() is called because kernel launches are asynchronous, so runtime errors are reported at the next host/device synchronization point. Try putting a cudaThreadSynchronize() call (cudaDeviceSynchronize() in CUDA 4.0 and later, where cudaThreadSynchronize() is deprecated) right after the kernel launch and checking its return value, so that kernel errors are reported next to the kernel invocation. This makes error reporting less confusing.
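A minimal sketch of that pattern, for illustration only. The poster's HANDLE_ERROR macro and recognize() kernel are not shown in the thread, so the macro definition, the writeId kernel, and the fetchId helper below are assumed stand-ins, and cudaDeviceSynchronize() is used in place of the deprecated cudaThreadSynchronize():

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed stand-in for the poster's HANDLE_ERROR macro (the real one is not shown).
static void handleError(cudaError_t err, const char *file, int line) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s in %s at line %d\n", cudaGetErrorString(err), file, line);
        exit(EXIT_FAILURE);
    }
}
#define HANDLE_ERROR(err) (handleError((err), __FILE__, __LINE__))

// Hypothetical kernel standing in for recognize(): one thread writes one int.
__global__ void writeId(int *out) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *out = 42;
    }
}

// Launches the kernel, checks for errors immediately, then copies the result back.
int fetchId(void) {
    int id = 0;
    int *dev_id = NULL;
    HANDLE_ERROR(cudaMalloc((void **)&dev_id, sizeof(int)));

    writeId<<<1, 32>>>(dev_id);
    HANDLE_ERROR(cudaGetLastError());       // errors from the launch itself (bad configuration, etc.)
    HANDLE_ERROR(cudaDeviceSynchronize());  // errors raised while the kernel was running

    // If this line is reached, the kernel finished without error, so a failure
    // reported here would belong to the copy and not to the kernel.
    HANDLE_ERROR(cudaMemcpy(&id, dev_id, sizeof(int), cudaMemcpyDeviceToHost));
    HANDLE_ERROR(cudaFree(dev_id));
    return id;
}

int main(void) {
    printf("id = %d\n", fetchId());
    return 0;
}
```

With the two checks placed directly after the launch, a failing kernel is reported at the line of the synchronize call and not at whichever cudaMemcpy happens to come next. The extra synchronization costs some overlap between host and device work, so it is probably best kept to debug builds.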

To locate the source of the out-of-bounds memory access in the kernel, try running a debug build of your program with cuda-memcheck (or compute-sanitizer, its replacement in recent CUDA toolkits).

Thanks, and sorry, I’m new to the forum. I solved it: in my kernel I was reading and writing dev_red beyond its *tamRed * sizeof(int) bytes.

Thanks again.
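For anyone hitting the same error, a common way to avoid this kind of overrun is to pass the element count to the kernel and guard every index against it. The sketch below is only an illustration of that idea, not the poster's actual fix: the incrementAll kernel, the countMismatches helper, and the sizes are all hypothetical.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: n is the number of ints in data, not the number of bytes.
__global__ void incrementAll(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // threads past the end of the array do nothing
        data[i] += 1;
    }
}

// Runs the kernel on n zero-initialized ints and returns how many elements
// differ from the expected value 1, or -1 if allocation or a CUDA call failed.
int countMismatches(int n) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // rounds up, so extra threads may exist
    const size_t bytes = (size_t)n * sizeof(int);

    int *host = (int *)calloc(n, sizeof(int));
    int *dev = NULL;
    if (host == NULL) {
        return -1;
    }
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) {
        free(host);
        return -1;
    }

    cudaError_t err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        incrementAll<<<blocks, threads>>>(dev, n);
        err = cudaDeviceSynchronize();  // surfaces kernel errors here
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    }

    int mismatches = 0;
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        mismatches = -1;
    } else {
        for (int i = 0; i < n; ++i) {
            if (host[i] != 1) {
                ++mismatches;
            }
        }
    }

    cudaFree(dev);
    free(host);
    return mismatches;
}

int main(void) {
    printf("mismatches = %d\n", countMismatches(1000));
    return 0;
}
```

With n = 1000 and 256 threads per block, the launch creates 1024 threads, so the last 24 would write past the buffer without the i < n guard. Removing the guard and running under cuda-memcheck is a quick way to see what the tool reports for an out-of-bounds write.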