Hi,
Sorry for this newbie question.
Why doesn't my kernel function work?
threads/block = 32, blocks/grid = 12
__global__ void kernel_function(/* … */)
{
    int gIndex = blockDim.x * blockIdx.x + threadIdx.x;  // global thread index, 0..383
    if (gIndex > 10)
        algorithm_1 …   // threads 11..383
    else
        algorithm_2 …   // threads 0..10
}
The error message is: "the launch timed out and was terminated".
But if I change the condition to (gIndex >= 32), the code works normally.
Does that mean all threads in one block must run the same algorithm?
Most likely, you have a race condition or other interdependence of threads that is causing the problem.
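Generally speaking, what you are trying to do should "work" in the sense that it should produce correct results. But if any warp (i.e. 32 consecutively numbered threads within a block, regardless of block size) diverges, meaning some threads of the warp take one branch while the rest take the other, then the hardware executes the two branches one after the other, with the threads that are not on the current branch masked off. Note that (gIndex >= 32) splits your threads exactly on a warp boundary: threads 0 to 31 of block 0 form a single warp, so every warp takes only one branch. With (gIndex > 10), the first warp is split into threads 0 to 10 and threads 11 to 31, so it diverges.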
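To see which warps diverge with the launch configuration from the question (12 blocks of 32 threads), here is a minimal sketch, assuming CUDA 9 or later for __ballot_sync. The names record_branch_masks and count_divergent_warps are hypothetical. Each warp records a bit mask of the lanes that would take algorithm_1; a mask that is neither all zeros nor all ones means that warp is divergent.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Launch configuration taken from the question: 12 blocks of 32 threads.
constexpr int kThreadsPerBlock = 32;
constexpr int kBlocks = 12;
constexpr int kWarpSize = 32;
constexpr int kWarps = kBlocks * kThreadsPerBlock / kWarpSize;

// For every warp, record which lanes would take algorithm_1.
// warpAlignedSplit == false -> condition (gIndex > 10)
// warpAlignedSplit == true  -> condition (gIndex >= 32)
__global__ void record_branch_masks(unsigned int *masks, bool warpAlignedSplit)
{
    int gIndex = blockDim.x * blockIdx.x + threadIdx.x;
    bool takesAlgorithm1 = warpAlignedSplit ? (gIndex >= 32) : (gIndex > 10);
    // Bit i of the result is set if lane i's predicate is true.
    unsigned int mask = __ballot_sync(0xffffffffu, takesAlgorithm1);
    if (threadIdx.x % kWarpSize == 0)   // lane 0 of each warp stores the mask
        masks[gIndex / kWarpSize] = mask;
}

// Returns the number of warps whose lanes disagree on the branch, or -1 on a CUDA error.
int count_divergent_warps(bool warpAlignedSplit)
{
    unsigned int hMasks[kWarps];
    unsigned int *dMasks = nullptr;
    if (cudaMalloc(&dMasks, sizeof(hMasks)) != cudaSuccess) return -1;
    record_branch_masks<<<kBlocks, kThreadsPerBlock>>>(dMasks, warpAlignedSplit);
    // cudaMemcpy waits for the kernel and also reports errors from it.
    cudaError_t err = cudaMemcpy(hMasks, dMasks, sizeof(hMasks), cudaMemcpyDeviceToHost);
    cudaFree(dMasks);
    if (err != cudaSuccess) return -1;

    int divergent = 0;
    for (int w = 0; w < kWarps; ++w)
        if (hMasks[w] != 0u && hMasks[w] != 0xffffffffu)  // mixed lanes: divergent warp
            ++divergent;
    return divergent;
}

int main()
{
    printf("gIndex > 10 : %d divergent warp(s)\n", count_divergent_warps(false));
    printf("gIndex >= 32: %d divergent warp(s)\n", count_divergent_warps(true));
    return 0;
}
```

On a GPU this should report one divergent warp (warp 0 of block 0) for (gIndex > 10) and none for (gIndex >= 32). Divergence alone does not cause a hang; it only matters once the two branches depend on each other.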
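If the threads do not depend on each other, then the fact that divergent threads run sequentially instead of in parallel should have no effect besides performance. But if they do depend on each other, it will create a problem. For example, if one branch calls __syncthreads() (a barrier that every thread of the block must reach, so calling it inside divergent code is undefined behavior), or if threads in one branch wait for data in shared memory that threads in the other branch are supposed to write, the kernel can hang or produce wrong results. A hang like that would also be consistent with the launch timeout you are seeing.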
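A common way out is to keep barriers and shared-memory exchanges outside the divergent branches, so that every thread of the block reaches each __syncthreads(), and to branch only for the per-thread work. The sketch below is illustrative only: algorithm_1 and algorithm_2 are hypothetical stand-ins (the real ones are elided in the question), and the neighbor exchange through shared memory is invented just to give the barrier something to protect. The broken pattern appears only as a comment.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreadsPerBlock = 32;
constexpr int kBlocks = 12;
constexpr int kN = kThreadsPerBlock * kBlocks;

// Hypothetical stand-ins for the question's algorithm_1 / algorithm_2.
__device__ int algorithm_1(int self, int neighbor) { return self + neighbor; }
__device__ int algorithm_2(int self, int neighbor) { return self - neighbor; }

// BROKEN pattern (do not use): the barrier sits inside one branch, so threads
// with gIndex <= 10 never reach it; behavior is undefined and may hang.
//   if (gIndex > 10) { tile[threadIdx.x] = ...; __syncthreads(); ... }
//   else             { ... }

// Fixed pattern: every thread writes shared memory and reaches the barrier;
// only the work after the barrier branches on gIndex.
__global__ void kernel_function(int *out)
{
    __shared__ int tile[kThreadsPerBlock];   // assumes blockDim.x == kThreadsPerBlock
    int gIndex = blockDim.x * blockIdx.x + threadIdx.x;

    tile[threadIdx.x] = gIndex;   // each thread publishes its global index
    __syncthreads();              // reached by all threads of the block

    int neighbor = tile[(threadIdx.x + 1) % blockDim.x];
    if (gIndex > 10)
        out[gIndex] = algorithm_1(gIndex, neighbor);
    else
        out[gIndex] = algorithm_2(gIndex, neighbor);
}

int gHostOut[kN];  // host copy of the results

// Launches the kernel and copies the results into gHostOut; returns false on any CUDA error.
bool run_kernel_function()
{
    int *dOut = nullptr;
    if (cudaMalloc(&dOut, kN * sizeof(int)) != cudaSuccess) return false;
    kernel_function<<<kBlocks, kThreadsPerBlock>>>(dOut);
    cudaError_t err = cudaGetLastError();                   // launch-configuration errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors during execution, e.g. a timeout
    if (err == cudaSuccess)
        err = cudaMemcpy(gHostOut, dOut, kN * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return err == cudaSuccess;
}

int main()
{
    if (!run_kernel_function()) { printf("CUDA error\n"); return 1; }
    printf("out[0] = %d, out[11] = %d\n", gHostOut[0], gHostOut[11]);
    return 0;
}
```

With these stand-ins, thread 0 computes 0 - 1 = -1 and thread 11 computes 11 + 12 = 23. Checking the result of cudaDeviceSynchronize() is also where an error such as the launch timeout from the question would be reported.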