Unknown error from cudaMemcpy: getting "unknown error" for no apparent reason

Hello community.
I’m completely out of ideas about what the cause could be; maybe you’ll be able to help.
In my app I do the following:

int* d_finalPoints;
cudaMalloc( (void**)&d_finalPoints , MEMORYFORRESULTINGARRAY );
int* points = (int*)malloc( MEMORYFORRESULTINGARRAY );

// kernel call (not shown)

cudaMemcpy( (void*)points, d_finalPoints , MEMORYFORRESULTINGARRAY, cudaMemcpyDeviceToHost );

This makes the driver crash for less than a second, and then the last call (cudaMemcpy) returns "unknown error".
I’ve searched this forum, and it has been said that this can be caused by bad pointers. However, all my pointers look fine, and MEMORYFORRESULTINGARRAY is rather small and allocates without problems.
If anyone knows what is going on, please answer. Thanks in advance.

Do error checking on all CUDA-related calls, and on the GPU kernel as well.
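A minimal sketch of such a check, assuming an nvcc C++ toolchain and CUDA 7.5 or later (for cudaGetErrorName). The names checkCuda, CUDA_CHECK and fillKernel are hypothetical; the thread's own CHECK_ERROR macro is not shown. Runtime API calls return a cudaError_t that can be tested directly, while a kernel launch returns nothing, so its status has to be fetched with cudaGetLastError().

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report a failed CUDA runtime call. Returns true on success so the caller
// decides whether to abort. Name and behaviour are illustrative only.
static bool checkCuda(cudaError_t err, const char* expr,
                      const char* file, int line) {
    if (err == cudaSuccess) return true;
    std::fprintf(stderr, "%s:%d: %s -> %s (%s)\n", file, line, expr,
                 cudaGetErrorName(err), cudaGetErrorString(err));
    return false;
}

// Wrap every runtime call, e.g. CUDA_CHECK(cudaMalloc(...));
#define CUDA_CHECK(call) checkCuda((call), #call, __FILE__, __LINE__)

__global__ void fillKernel(int* data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;  // guard: the grid may be larger than n
}

int main() {
    const int n = 256;
    int* d_data = nullptr;
    if (!CUDA_CHECK(cudaMalloc((void**)&d_data, n * sizeof(int)))) return 1;

    fillKernel<<<(n + 127) / 128, 128>>>(d_data, n, 1);
    // The launch itself has no return value; cudaGetLastError() reports
    // launch failures such as an invalid grid/block configuration.
    bool ok = CUDA_CHECK(cudaGetLastError());

    ok = CUDA_CHECK(cudaFree(d_data)) && ok;
    return ok ? 0 : 1;
}
```

Note that this only catches errors the runtime already knows about when the check runs; problems inside a kernel that is still executing can surface later.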

All the CUDA calls are wrapped in a CHECK_ERROR macro, and there’s an error check after each kernel call.
Up to the line where I copy the array back to the CPU everything is fine, with no error messages. In the debugger I can see that memory is allocated normally on both the CPU and the GPU.
I’ve also tried reducing my app to a single kernel call with fixed parameters. Still the same. Any ideas?

Insert cudaThreadSynchronize(); before you check for errors after the kernel launch, so that you also catch errors that occur while your kernel is running.
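The suggestion can be sketched as follows, assuming CUDA 4.0 or later, where cudaDeviceSynchronize() replaces the now-deprecated cudaThreadSynchronize(). Calling cudaGetLastError() right after the launch only reports problems detected when the launch is issued (such as an invalid block size); faults that happen while the kernel runs are reported once the host waits for it. The helpers kernelStatus and launchFillOnes are illustrative names, not library functions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fillOnes(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 1;
}

// Check both phases of a kernel launch:
//  1. cudaGetLastError(): errors detected when the launch was queued.
//  2. cudaDeviceSynchronize(): errors raised while the kernel executed.
static cudaError_t kernelStatus() {
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}

// Launch fillOnes with the given block size and return the combined status.
static cudaError_t launchFillOnes(int* d_data, int n, int threadsPerBlock) {
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    fillOnes<<<blocks, threadsPerBlock>>>(d_data, n);
    return kernelStatus();
}

int main() {
    const int n = 256;
    int* d_data = nullptr;
    if (cudaMalloc((void**)&d_data, n * sizeof(int)) != cudaSuccess) return 1;

    cudaError_t good = launchFillOnes(d_data, n, 128);
    // More than 1024 threads per block is rejected at launch time.
    cudaError_t bad = launchFillOnes(d_data, n, 2048);
    std::printf("128 threads/block:  %s\n", cudaGetErrorString(good));
    std::printf("2048 threads/block: %s\n", cudaGetErrorString(bad));

    cudaFree(d_data);
    return good == cudaSuccess ? 0 : 1;
}
```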

Thanks, that worked and helped me find the real error. It’s


during the kernel launch.

My kernel function accepts

(int* points, int x, int y, int z, int offset)

I’m pretty sure the single integer values are not the problem; I guess the real problem is the int* points array.

So I reduced the points array to just 1024 bytes, to be sure there would be enough space to allocate it.

Then I call

int* d_points;

CUDA_CHECK_ERROR( cudaMalloc( (void**)&d_points , MEMORYFORRESULTINGARRAY ) );

and it doesn’t give any error messages.

Then I run an extremely simple kernel that should fill this array with ones, and it also crashes with the same error.

What can be the reason?
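For reference, a fill-with-ones test of the kind described might look like the sketch below (hypothetical names; the poster’s actual kernel is not shown). It assumes the 1024-byte buffer from the post, i.e. 256 ints, launches enough blocks to cover them with a bounds check, passes the kernel the device pointer d_points, and checks both the launch and the execution before copying the result back.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed stand-in for MEMORYFORRESULTINGARRAY: the 1024 bytes from the post.
static const size_t kBytes = 1024;

__global__ void fillOnes(int* points, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) points[i] = 1;  // the last block may extend past n
}

// Allocate a device buffer, fill it with ones, copy it into host_points.
// Returns the first CUDA error encountered, or cudaSuccess.
static cudaError_t fillOnesRoundTrip(int* host_points, size_t bytes) {
    int n = (int)(bytes / sizeof(int));
    int* d_points = nullptr;
    cudaError_t err = cudaMalloc((void**)&d_points, bytes);
    if (err != cudaSuccess) return err;

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    fillOnes<<<blocks, threads>>>(d_points, n);  // device pointer, never host_points
    err = cudaGetLastError();                               // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err == cudaSuccess)
        err = cudaMemcpy(host_points, d_points, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_points);
    return err;
}

int main() {
    int* points = (int*)std::malloc(kBytes);
    if (points == nullptr) return 1;

    cudaError_t err = fillOnesRoundTrip(points, kBytes);
    if (err == cudaSuccess) {
        int n = (int)(kBytes / sizeof(int));
        int ones = 0;
        for (int i = 0; i < n; ++i) ones += (points[i] == 1);
        std::printf("%d of %d elements are 1\n", ones, n);
    } else {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
    }
    std::free(points);
    return err == cudaSuccess ? 0 : 1;
}
```

If a kernel this simple still fails, the fault most likely comes from an earlier call, which is where the replies below point.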

Can you post the extremely simple kernel?

Are you sure that error isn’t coming from a cudaMemcpy BEFORE the kernel call?

Shouldn’t you be passing the kernel d_finalPoints and not points? You do a memcpy from d_finalPoints back to points, but the kernel itself doesn’t seem to use d_finalPoints at all.

I don’t think we’re seeing enough of the code.
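One way to catch a host pointer being passed where a device pointer is expected is a debug check based on cudaPointerGetAttributes. The sketch assumes CUDA 10 or later (for the `type` field of cudaPointerAttributes); isDevicePointer is a hypothetical helper, not a library function. From CUDA 11 on, a plain host pointer is reported as unregistered memory, while older runtimes return an error instead, which the helper clears.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical debug helper: true only for memory allocated on the device
// (e.g. with cudaMalloc). Requires CUDA 10+ for cudaPointerAttributes::type.
static bool isDevicePointer(const void* ptr) {
    cudaPointerAttributes attr;
    cudaError_t err = cudaPointerGetAttributes(&attr, ptr);
    if (err != cudaSuccess) {
        cudaGetLastError();  // runtimes before CUDA 11 reject host pointers; clear it
        return false;
    }
    return attr.type == cudaMemoryTypeDevice;
}

int main() {
    const size_t bytes = 1024;
    int* points = (int*)std::malloc(bytes);
    int* d_finalPoints = nullptr;
    if (points == nullptr ||
        cudaMalloc((void**)&d_finalPoints, bytes) != cudaSuccess) {
        std::free(points);
        return 1;
    }

    // Only d_finalPoints is valid as a kernel argument here.
    std::printf("points         -> %s\n",
                isDevicePointer(points) ? "device" : "not device");
    std::printf("d_finalPoints  -> %s\n",
                isDevicePointer(d_finalPoints) ? "device" : "not device");
    std::printf("&d_finalPoints -> %s\n",
                isDevicePointer(&d_finalPoints) ? "device" : "not device");

    cudaFree(d_finalPoints);
    std::free(points);
    return 0;
}
```

Placing something like assert(isDevicePointer(d_finalPoints)) just before a launch turns a silent wrong-pointer bug into an immediate, local failure in debug builds.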

Hello again, thank you all for the replies. I eventually found the error.

So here it is:

Earlier in the same code I copied some input arrays to the device, with something like

cudaMemcpy( (void**)&d_initArray, initArray, allocationLength, cudaMemcpyHostToDevice )

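In (void**)&d_initArray I had somehow added the &, and that was the real problem: it passes the address of the host variable d_initArray instead of the device pointer stored in it. The memory allocation itself reported no error, and because this is C, the call compiled and ran without complaint.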

This call was wrapped in the CUDA_CHECK_ERROR macro, which reported no error, and the crash happened much later. What helped was putting cudaThreadSynchronize() right after this call and only then checking the last error. Then it reported the failure.

Thank you all for the help :)
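The fix, together with the debugging trick that exposed the bug, might look like the sketch below. It assumes CUDA 4.0 or later (cudaDeviceSynchronize() instead of the deprecated cudaThreadSynchronize()); the macro CUDA_CHECK_SYNC, the helper checkedStatus and the CUDA_SYNC_CHECKS build flag are hypothetical. With the flag defined, every checked call also waits for the device, so an error is reported at the call that caused it rather than much later. Setting the environment variable CUDA_LAUNCH_BLOCKING=1, or running under compute-sanitizer (cuda-memcheck in older toolkits), serves a similar purpose without code changes.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical compile-time switch: build with -DCUDA_SYNC_CHECKS to make
// every checked call also wait for the device.
#ifdef CUDA_SYNC_CHECKS
static const bool kSyncChecks = true;
#else
static const bool kSyncChecks = false;
#endif

// Combine the call's own status with errors the device reports afterwards.
static cudaError_t checkedStatus(cudaError_t err) {
    if (err != cudaSuccess) return err;
    if (kSyncChecks) {
        err = cudaDeviceSynchronize();
        if (err != cudaSuccess) return err;
    }
    return cudaGetLastError();
}

#define CUDA_CHECK_SYNC(call)                                            \
    do {                                                                 \
        cudaError_t e_ = checkedStatus(call);                            \
        if (e_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s:%d: %s: %s\n", __FILE__, __LINE__,  \
                         #call, cudaGetErrorString(e_));                 \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

int main() {
    int initArray[16], result[16];
    for (int i = 0; i < 16; ++i) initArray[i] = i * i;
    const size_t allocationLength = sizeof(initArray);

    int* d_initArray = nullptr;
    // cudaMalloc takes the address of the pointer (void**) so it can set it...
    CUDA_CHECK_SYNC(cudaMalloc((void**)&d_initArray, allocationLength));
    // ...but cudaMemcpy takes the device pointer itself. The buggy call,
    // (void**)&d_initArray, passed the address of the host variable instead.
    CUDA_CHECK_SYNC(cudaMemcpy(d_initArray, initArray, allocationLength,
                               cudaMemcpyHostToDevice));
    CUDA_CHECK_SYNC(cudaMemcpy(result, d_initArray, allocationLength,
                               cudaMemcpyDeviceToHost));
    CUDA_CHECK_SYNC(cudaFree(d_initArray));

    bool same = std::memcmp(initArray, result, allocationLength) == 0;
    std::printf("round trip %s\n", same ? "matches" : "differs");
    return same ? 0 : 1;
}
```

A typical workflow would be to build normally for release and add -DCUDA_SYNC_CHECKS to the nvcc command line while hunting an error like the one in this thread.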