Nested "for" in a device function


There’s no limit on the number of nested for loops. You probably have an indexing-out-of-bounds problem (perhaps in your texture accesses). Use debugging techniques like the method described in the answer here:

to prove or disprove this theory, isolate the out-of-bounds access to a specific line of code, and then begin investigating your indexing in more detail.
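As a rough illustration of the first step (proving or disproving that the kernel fails at all), here is a minimal sketch that checks for errors after a launch. The kernel `sumNeighbors`, the helper `runChecked`, and the buffer sizes are hypothetical names made up for this example. It also uses a plain global-memory array rather than textures, so it is not the original poster's code. The nested loops carry an explicit bounds check, which is the kind of guard to look for when indexing is suspect.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: two nested for loops over a (2*radius+1)^2 window.
// Nesting depth is not a problem; unguarded indices are.
__global__ void sumNeighbors(const float* in, float* out,
                             int width, int height, int radius)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;   // threads outside the image do nothing

    float acc = 0.0f;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            int nx = x + dx;
            int ny = y + dy;
            // Skip neighbors that fall outside the buffer.
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            acc += in[ny * width + nx];
        }
    }
    out[y * width + x] = acc;
}

// Hypothetical helper: launches the kernel and returns the first error seen.
cudaError_t runChecked(int width, int height, int radius)
{
    size_t bytes = size_t(width) * height * sizeof(float);
    float* in = nullptr;
    float* out = nullptr;

    cudaError_t err = cudaMalloc(&in, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&out, bytes);
    if (err != cudaSuccess) { cudaFree(in); return err; }
    cudaMemset(in, 0, bytes);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    sumNeighbors<<<grid, block>>>(in, out, width, height, radius);

    err = cudaGetLastError();              // errors in the launch itself
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();     // errors raised while the kernel ran

    cudaFree(in);
    cudaFree(out);
    return err;
}

int main()
{
    cudaError_t err = runChecked(256, 256, 2);
    printf("kernel status: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

If the status is anything other than success, running the same binary under `compute-sanitizer` (built with `nvcc -lineinfo`) is one way to narrow an invalid access down to a source line.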

I guess it’s something like a timeout.

Are there any timeout settings in CUDA for a single device function?
It just looks like only the number of iterations plays a role.

Yes, there are timeout settings on both Windows and Linux.
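To see whether a run-time limit applies to a particular GPU, one option is to query the device properties. The sketch below reads the `kernelExecTimeoutEnabled` field of `cudaDeviceProp`; the helper name `watchdogEnabled` is made up for this example. The limit usually applies to GPUs that also drive a display.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: 1 if kernels on this device are subject to a
// run-time limit, 0 if not, -1 if the properties could not be read.
int watchdogEnabled(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    return prop.kernelExecTimeoutEnabled ? 1 : 0;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA device found\n");
        return 1;
    }
    for (int d = 0; d < count; ++d) {
        int w = watchdogEnabled(d);
        printf("device %d: run-time limit on kernels: %s\n", d,
               w == 1 ? "yes" : (w == 0 ? "no" : "unknown"));
    }
    return 0;
}
```

When the limit is hit, the error returned by the next synchronizing call is typically `cudaErrorLaunchTimeout`, which helps tell a timeout apart from an out-of-bounds access.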

There is a sticky thread right at the top of this forum that discusses the Windows timeout settings: