There’s no limit on the number of nested for loops. You probably have an indexing-out-of-bounds problem (perhaps in your texture accesses). Use debugging techniques like the method described in the answer here:
to prove or disprove this theory, isolate the out-of-bounds access to a specific line of code, and then begin investigating your indexing in more detail.
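As a rough illustration of that kind of investigation (not necessarily the method from the linked answer), here is a minimal sketch that checks every runtime call and guards each computed index inside a nested loop. The `CUDA_CHECK` macro, the `nestedLoops` kernel, and the 64x64 buffer size are all made up for the example; your kernel and indexing will differ.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: report file/line and abort when a runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);        \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Hypothetical kernel standing in for the nested-loop kernel under suspicion.
__global__ void nestedLoops(const float *in, float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;  // guard the thread's own index

    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int nx = x + dx;
            int ny = y + dy;
            // Guard every computed neighbour index before touching memory.
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            sum += in[ny * width + nx];
        }
    }
    out[y * width + x] = sum;
}

int main()
{
    const int width = 64, height = 64;  // assumed sizes for the sketch
    const size_t bytes = static_cast<size_t>(width) * height * sizeof(float);

    float *in = nullptr, *out = nullptr;
    CUDA_CHECK(cudaMalloc(&in, bytes));
    CUDA_CHECK(cudaMalloc(&out, bytes));
    CUDA_CHECK(cudaMemset(in, 0, bytes));

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    nestedLoops<<<grid, block>>>(in, out, width, height);
    CUDA_CHECK(cudaGetLastError());       // catches launch errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors raised while the kernel ran

    CUDA_CHECK(cudaFree(in));
    CUDA_CHECK(cudaFree(out));
    printf("kernel finished without errors\n");
    return 0;
}
```

If the checks report an error after the kernel, building with `nvcc -lineinfo` and running the program under `compute-sanitizer` (`cuda-memcheck` on older toolkits) should narrow an out-of-bounds global memory access down to a source line.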
My guess is that it’s something like a timeout.
Are there any timeout settings in CUDA for a single device function?
It looks as though only the number of iterations plays a role.
Yes, there are timeout settings on both Windows and Linux.
There is a sticky thread right at the top of this forum that discusses the Windows timeout settings:
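Separately from that thread, one way to see whether a run-time limit applies to kernels on your GPU is to query the device properties. This is only a small sketch: it reads the `kernelExecTimeoutEnabled` field of `cudaDeviceProp` for each device and prints it, and it does not change any timeout setting.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                    cudaGetErrorString(err));
            return 1;
        }
        // Non-zero means there is a run-time limit on kernels for this device,
        // which is typically the case when the GPU also drives a display.
        printf("Device %d (%s): kernel run-time limit %s\n", dev, prop.name,
               prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    }
    return 0;
}
```

If the limit is enabled and a kernel runs too long, the next synchronizing call usually returns `cudaErrorLaunchTimeout`, which would fit a failure that depends only on the number of iterations.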