Unspecified launch failure only when breaking out of a for loop

My kernel has a for loop:

[codebox]for (int i = 0; i < rows; i++)
{
    ...
    if (x == 0) break;
    ...
}[/codebox]



Although CUDA discourages divergent code, the break statement gives a 2x speedup in my case.

The program worked fine on many data sets for many days. Recently, however, it reported an “unspecified launch failure” error for some data sets.

The strangest thing is that if I comment out the “if(x==0) break” line, the program runs correctly.

So what could the reason be?


Just a guess… could x evaluate to NaN for some specific data set and cause this behaviour? I have never really investigated how CUDA behaves with NaNs.

Thanks for the reply.

x is read from an array in global memory whose elements are all positive integers or 0, so NaN is impossible.

Rather than using break, try adding the condition to your for loop, like this:

[codebox]for(int i = 0; i < rows && x != 0; i++)[/codebox]

I had the same problem and this seemed to fix it.

Thanks for the suggestion. I tried it, but unfortunately the problem is still there.

Are you using a synchronization barrier in the code? If so, having just one thread exit the loop early can keep the barrier from completing.
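To illustrate the barrier concern, here is a minimal sketch, not the original poster's kernel. The names scanRows, runScanRows, data, out, rows and cols are all hypothetical, and the __syncthreads() call stands in for whatever block-wide step a real loop might contain. Instead of breaking, each thread sets a flag and keeps iterating, so every thread in the block still reaches the barrier on every iteration.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread walks one column of a row-major
// rows x blockDim.x array and stops accumulating at the first zero.
// It does not break, so all threads keep reaching __syncthreads().
__global__ void scanRows(const int *data, int *out, int rows)
{
    int col = threadIdx.x;
    int sum = 0;
    bool done = false;
    for (int i = 0; i < rows; i++) {
        int x = data[i * blockDim.x + col];
        if (x == 0) done = true;   // instead of: break;
        if (!done) sum += x;
        __syncthreads();           // reached by every thread, every iteration
    }
    out[col] = sum;
}

// Hypothetical host helper: runs one block of `cols` threads and
// returns the first CUDA error encountered (cudaSuccess if none).
cudaError_t runScanRows(const int *hData, int *hOut, int rows, int cols)
{
    int *dData = nullptr, *dOut = nullptr;
    size_t inBytes = (size_t)rows * cols * sizeof(int);
    size_t outBytes = (size_t)cols * sizeof(int);

    cudaError_t err = cudaMalloc(&dData, inBytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&dOut, outBytes);
    if (err != cudaSuccess) { cudaFree(dData); return err; }

    err = cudaMemcpy(dData, hData, inBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scanRows<<<1, cols>>>(dData, dOut, rows);
        err = cudaDeviceSynchronize();   // surfaces faults from the kernel
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hOut, dOut, outBytes, cudaMemcpyDeviceToHost);

    cudaFree(dData);
    cudaFree(dOut);
    return err;
}
```

The flag version gives up the early exit, so it trades some of the speedup for a barrier that is always safe. If the loop really has no barrier, this pattern is not needed.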


Try using cuda-gdb or running in device emulation mode to identify where the problem arises, or to check the values of x.
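Along the same lines, it can help to check for errors immediately after the suspect launch, since kernel launches are asynchronous and a fault may otherwise only show up on a later, unrelated API call. The following is a rough sketch under assumptions: countUntilZero, checkKernel and runCountUntilZero are hypothetical names, the kernel is only a stand-in for the real one, and cudaDeviceSynchronize assumes a toolkit recent enough to provide it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel: thread 0 counts the leading nonzero
// entries of `in` and stops at the first zero.
__global__ void countUntilZero(const int *in, int rows, int *count)
{
    if (blockIdx.x != 0 || threadIdx.x != 0) return;
    int n = 0;
    for (int i = 0; i < rows; i++) {
        if (in[i] == 0) break;
        n++;
    }
    *count = n;
}

// Checks the launch itself first, then waits for the kernel so that
// faults raised during execution are reported here.
cudaError_t checkKernel(const char *label)
{
    cudaError_t err = cudaGetLastError();    // e.g. invalid launch configuration
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();       // e.g. a fault while the kernel ran
    if (err != cudaSuccess)
        fprintf(stderr, "%s: %s\n", label, cudaGetErrorString(err));
    return err;
}

// Hypothetical host wrapper: returns the count, or -1 on any CUDA error.
int runCountUntilZero(const int *hIn, int rows)
{
    int *dIn = nullptr, *dCount = nullptr;
    int hCount = -1;
    size_t bytes = (size_t)rows * sizeof(int);

    if (cudaMalloc(&dIn, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&dCount, sizeof(int)) != cudaSuccess) { cudaFree(dIn); return -1; }

    if (cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        countUntilZero<<<1, 32>>>(dIn, rows, dCount);
        if (checkKernel("countUntilZero") == cudaSuccess &&
            cudaMemcpy(&hCount, dCount, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
            hCount = -1;
    }

    cudaFree(dIn);
    cudaFree(dCount);
    return hCount;
}
```

If the error is reported right after this launch, the fault is in this kernel. If it only appears later, an earlier or later kernel may be the real culprit.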

No, I don’t use __syncthreads() in the loop.

The key point is that if I comment out the if…break line, the program runs correctly and gives the right results.

x is also used in other lines of code without any problem.

So the data should be OK.

I think the problem comes from either the program logic or the compiler.