Hello!
I am trying to run a reduction kernel inside a for-loop that iterates several thousand times.
If the loop runs only 100 times, everything works fine. With 3000 iterations I get NaN values as output (although in emulation mode it works and shows correct results).
Is it possible to launch too many kernels?
I don’t get any CUDA errors when calling cudaGetLastError().
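To make the setup concrete, here is a minimal sketch of the pattern described above. It is not the actual code: the kernel `reduceKernel`, the size `N`, the `CHECK_CUDA` macro and the function `runReductionLoop` are all hypothetical placeholders. One thing the sketch does beyond calling cudaGetLastError() is synchronize after each launch. Kernel launches are asynchronous, so an error raised while the kernel is running may only be reported by a later call such as cudaDeviceSynchronize() (cudaThreadSynchronize() in older toolkits).

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call failed.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder problem size: one block of N threads sums N floats.
constexpr int N = 256;

// Placeholder reduction kernel (assumed, not the kernel from this post).
__global__ void reduceKernel(const float* in, float* out)
{
    __shared__ float buf[N];
    int tid = threadIdx.x;
    buf[tid] = in[tid];
    __syncthreads();

    // Tree reduction in shared memory; every thread reaches each barrier.
    for (int stride = N / 2; stride > 0; stride /= 2) {
        if (tid < stride) buf[tid] += buf[tid + stride];
        __syncthreads();
    }
    if (tid == 0) *out = buf[0];
}

// Launches the kernel `iterations` times and returns the last result.
float runReductionLoop(int iterations)
{
    float host[N];
    for (int i = 0; i < N; ++i) host[i] = 1.0f;

    float* dIn = nullptr;
    float* dOut = nullptr;
    CHECK_CUDA(cudaMalloc(&dIn, N * sizeof(float)));
    CHECK_CUDA(cudaMalloc(&dOut, sizeof(float)));
    CHECK_CUDA(cudaMemcpy(dIn, host, N * sizeof(float), cudaMemcpyHostToDevice));

    for (int i = 0; i < iterations; ++i) {
        reduceKernel<<<1, N>>>(dIn, dOut);
        CHECK_CUDA(cudaGetLastError());       // launch/configuration errors
        CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel ran
    }

    float result = 0.0f;
    CHECK_CUDA(cudaMemcpy(&result, dOut, sizeof(float), cudaMemcpyDeviceToHost));
    CHECK_CUDA(cudaFree(dIn));
    CHECK_CUDA(cudaFree(dOut));
    return result;
}

int main()
{
    printf("sum = %f\n", runReductionLoop(3000));
    return 0;
}
```

Synchronizing on every iteration slows the loop down, so this is meant only as a debugging aid to find out whether a launch actually fails.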
Best regards!
Edit: I wrote a new kernel that isn’t called in a loop, but the problem remains. So this topic can be closed, since the problem is not caused by “too many kernels”.