Nested "for" in a device function

There’s no limit on the number of nested for loops. You probably have an out-of-bounds indexing problem (perhaps in your texture accesses). Use a debugging technique like the one described in the answer here:

http://stackoverflow.com/questions/27277365/unspecified-launch-failure-on-memcpy

to prove or disprove this theory, isolate the out-of-bounds access to a specific line of code, and then begin investigating your indexing in more detail.
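As a rough illustration of that kind of check, here is a minimal sketch that wraps every runtime call in an error check and synchronizes right after the launch, so a failure is reported at the kernel rather than at a later memcpy. The kernel `nestedLoops`, the `CUDA_CHECK` macro, and the sizes are all made up for the example (they are not from your code), and it uses plain global memory rather than textures. Building with `nvcc -lineinfo` and running under `cuda-memcheck` (or `compute-sanitizer` on newer toolkits) should then point at the offending source line if an access is out of bounds.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with file/line when a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel with nested loops; every index is checked against
// the buffer size before it is used.
__global__ void nestedLoops(const float *in, float *out, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x >= width) return;                  // guard the thread index

    float sum = 0.0f;
    for (int i = 0; i < height; ++i) {
        for (int j = 0; j < 4; ++j) {
            int idx = i * width + x;         // always < width * height
            sum += in[idx] * (j + 1);
        }
    }
    out[x] = sum;
}

int main()
{
    const int width = 256, height = 64;      // illustrative sizes
    float *in = nullptr, *out = nullptr;
    CUDA_CHECK(cudaMalloc(&in, width * height * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&out, width * sizeof(float)));
    CUDA_CHECK(cudaMemset(in, 0, width * height * sizeof(float)));

    nestedLoops<<<(width + 127) / 128, 128>>>(in, out, width, height);
    CUDA_CHECK(cudaGetLastError());          // launch/configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());     // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(in));
    CUDA_CHECK(cudaFree(out));
    printf("kernel completed without errors\n");
    return 0;
}
```

If the synchronize call right after the launch reports the error, the problem is in that kernel and not in whichever API call happened to surface it later.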

I suspect it’s something like a timeout.

Are there any timeout settings in CUDA for a single device function?
It looks like only the number of iterations plays any role.

Yes, there are timeout settings on both Windows and Linux.

There is a sticky thread right at the top of this forum that discusses the Windows timeout settings:

https://devtalk.nvidia.com/default/topic/459869/cuda-programming-and-performance/-quot-display-driver-stopped-responding-and-has-recovered-quot-wddm-timeout-detection-and-recovery-/
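To see whether the run-time limit applies to a particular GPU, you can query it from code. This is a small sketch, assuming the `cudaDevAttrKernelExecTimeout` device attribute of the runtime API; it only reports whether a limit is in effect for each device, not how long it is.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        int timeoutEnabled = 0;
        // Nonzero means kernels on this device are subject to a run-time limit.
        err = cudaDeviceGetAttribute(&timeoutEnabled,
                                     cudaDevAttrKernelExecTimeout, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        printf("device %d: kernel run-time limit %s\n", dev,
               timeoutEnabled ? "enabled" : "disabled");
    }
    return 0;
}
```

If the limit is enabled on the device you run on and the failure appears only when the iteration count grows, a timeout becomes a plausible explanation alongside the out-of-bounds theory.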