unspecified launch failure: This error is in cudaMemcpy

Hello everyone

I think this problem may be obvious, but I cannot work out the cause.

In my code, I have a kernel launch followed by cudaThreadSynchronize() and then cudaMemcpy().

When I try to run the program I get the following error:

Cuda error in file ‘CudaPoissonEquation1DMultipleBlock01SingleKernel.cu’ in line 352 : unspecified launch failure.

That line contains the cudaMemcpy() call.

There is no problem with the arguments of this call.

When I searched the net for the cause, someone said that this error is the equivalent of a segmentation fault on the CPU, and recommended digging into the kernel.

I have been digging into the kernel, but I could not find any cause (all memory accesses are well within bounds).

If someone can tell me the possible causes of this error, that would be a great help.

Thanks a lot.

Hi Praveen,

Just a guess: are your allocations too big for your card’s memory?
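One way to test that guess is to ask the runtime how much device memory is free before allocating. The following is only a sketch: the helper name fitsOnDevice is made up for illustration, and cudaMemGetInfo may not be available in very old toolkits.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports free/total device memory and returns true
// if a request of `bytes` is no larger than the currently free amount.
bool fitsOnDevice(size_t bytes) {
    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) != cudaSuccess)
        return false;  // no usable device, or the query itself failed
    printf("free: %zu bytes, total: %zu bytes, requested: %zu bytes\n",
           freeBytes, totalBytes, bytes);
    return bytes <= freeBytes;
}
```

Even when this returns true, an allocation can still fail (for example because of fragmentation), so the return value of every cudaMalloc() should be checked as well.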


Google this newsgroup: you’ll find that your kernel has probably crashed because of an invalid memory access, and the reason the failure is reported at cudaMemcpy is that you didn’t check the error code after your kernel ran.
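A minimal sketch of checking for errors right after a launch, so that a kernel crash is reported at the kernel and not at the next cudaMemcpy. The kernel badKernel and the helpers checkLaunch and runLaunchCheckDemo are hypothetical names for illustration. The sketch uses cudaDeviceSynchronize(), which replaced cudaThreadSynchronize() in later toolkits; on the older toolkit used in this thread, cudaThreadSynchronize() plays the same role.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that deliberately writes through a null pointer.
__global__ void badKernel(int *p) { p[threadIdx.x] = 1; }

// Returns the first error seen for the most recent launch and prints it.
cudaError_t checkLaunch(const char *label) {
    cudaError_t err = cudaGetLastError();   // errors detected at launch time
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();      // errors raised while the kernel runs
    if (err != cudaSuccess)
        fprintf(stderr, "%s: %s\n", label, cudaGetErrorString(err));
    return err;
}

// Launches the faulty kernel; returns 1 if the check caught an error, else 0.
int runLaunchCheckDemo() {
    badKernel<<<1, 32>>>(nullptr);
    return checkLaunch("badKernel") == cudaSuccess ? 0 : 1;
}
```

With a check like this after every launch, the error message names the kernel that failed. Without it, the error stays pending and is returned by whichever runtime call comes next, which here was cudaMemcpy().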


Hi eyal,
Which newsgroup are you talking about? I’d very much like to have access to it, if possible.

Matt, just google those forums, nVidia’s :)

both the “Cuda programming and development” and the “General CUDA GPU Computing Discussion”

there are tons of questions like the one you’ve posted

Hello everyone

Thanks Jatukam and eyal for your replies.
I have been looking into my kernel for the last few days to locate the problem, but without success.
My allocations are not too big for my card; I am using a Tesla C1060.
All my memory accesses are well within bounds.

I have attached the file to this post. Please look at it and tell me what the problem is.
I really cannot figure it out.

It is a 3-point stencil computation to solve the Poisson equation.
UOld and UNew are two arrays; UNew updates its values using UOld.
After each update, it checks whether another iteration is required.
The norm is defined as the maximum difference between UOld and UNew.

Please tell me what the problem is, and please do not mind if it is too silly.

Thanks a lot

With regards
sample.cu (12 KB)

To compile this file, set the flag -arch sm_13 (atomic operations are used):

nvcc -arch sm_13 sample.cu

Compiling gives some warnings, which can be ignored.

When I run it (it takes two inputs: the size of the array, then the maximum number of iterations), it gives the following error:

Cuda error in file ‘sample.cu’ in line 265 : unspecified launch failure.


If somebody finds the fault in the kernel, please tell me. It will be a great help to me.


Your code tries to write to two uninitialized pointers in a number of places. Neither BlockCount nor BlockWait is allocated when you use them in the device code.
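A rough sketch of what the fix could look like. It assumes both are int pointers passed to the kernel as arguments; the attached file may declare them differently. The kernel touchCounters and the function runCounterDemo are hypothetical stand-ins, not the code from sample.cu.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: it only touches the two counters.
__global__ void touchCounters(int *BlockCount, int *BlockWait) {
    atomicAdd(BlockCount, 1);                        // every thread adds one
    if (threadIdx.x == 0) atomicExch(BlockWait, 1);  // one thread per block sets the flag
}

// Allocates and zeroes both counters before the launch; returns 0 on success.
int runCounterDemo() {
    int *BlockCount = nullptr;
    int *BlockWait = nullptr;
    if (cudaMalloc(&BlockCount, sizeof(int)) != cudaSuccess) return 1;
    if (cudaMalloc(&BlockWait, sizeof(int)) != cudaSuccess) {
        cudaFree(BlockCount);
        return 1;
    }
    cudaMemset(BlockCount, 0, sizeof(int));
    cudaMemset(BlockWait, 0, sizeof(int));

    touchCounters<<<4, 64>>>(BlockCount, BlockWait);
    cudaError_t err = cudaDeviceSynchronize();

    int count = -1;
    if (err == cudaSuccess)
        err = cudaMemcpy(&count, BlockCount, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(BlockCount);
    cudaFree(BlockWait);
    return (err == cudaSuccess && count == 4 * 64) ? 0 : 1;
}
```

The point is only that each pointer must refer to device memory obtained from cudaMalloc() (and be initialized) before any kernel dereferences it.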

Hello avidday

Thanks a lot. Initially BlockCount and BlockWait were int variables; at some point I made them pointers and forgot to allocate memory for them.

Thanking you very much once again.

As a hint for the future - check out Ocelot. It is excellent for detecting that sort of problem. I compiled your code and linked it with ocelot, and in 15 seconds had the place in the ptx where the errors were occurring. Another minute to look at the source, and problem solved.