Asking for help with a weird "unspecified launch failure"

Hi guys, I am running a kinetic Monte Carlo simulation on a GTX 480.

In my code, I have two kernel functions that are called in a for loop.

If I set the for loop to run 1000 times, everything is OK,
but if I set it to run 10000 times, I get an "unspecified launch failure" after I launch the kernel.

Does anybody have a clue about this weird problem?

Thanks!

You have probably hit the watchdog timer, which is meant as a last line of defense against runaway GPU code locking up your computer.
The easiest way around it would be to distribute the work across multiple kernel invocations that are each short enough to finish before the timer expires. This gives the GPU a chance to update the screen between kernel invocations and resets the watchdog timer.
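
A minimal sketch of what splitting the work could look like, assuming the long-running part is a loop inside the kernel whose progress can be saved in global memory between launches. The names boundedSteps, runInSlices, remaining, state and anyLeft are made up for illustration, and the `i % 37` loop counts only stand in for the real (random) ones:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: thread i still owes remaining[i] loop iterations.
// One launch performs at most maxSteps of them and then returns.
__global__ void boundedSteps(int *remaining, float *state, int n,
                             int maxSteps, int *anyLeft)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int r = remaining[i];
    float s = state[i];
    for (int k = 0; k < maxSteps && r > 0; ++k) {
        s += 1.0f;  // placeholder for one iteration of the real work
        --r;
    }
    remaining[i] = r;  // save progress for the next launch
    state[i] = s;
    if (r > 0) *anyLeft = 1;  // every writer stores the same value
}

// Relaunches the kernel until no thread has work left.
// Returns the number of launches, or -1 on a CUDA error.
int runInSlices(int n, int maxSteps)
{
    std::vector<int> h_remaining(n);
    for (int i = 0; i < n; ++i) h_remaining[i] = i % 37;  // stand-in for random counts

    int *d_remaining = nullptr, *d_anyLeft = nullptr;
    float *d_state = nullptr;
    cudaMalloc(&d_remaining, n * sizeof(int));
    cudaMalloc(&d_state, n * sizeof(float));
    cudaMalloc(&d_anyLeft, sizeof(int));
    cudaMemcpy(d_remaining, h_remaining.data(), n * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemset(d_state, 0, n * sizeof(float));

    int launches = 0;
    int anyLeft = 1;
    cudaError_t err = cudaGetLastError();  // picks up an earlier setup error, if any
    while (err == cudaSuccess && anyLeft) {
        cudaMemset(d_anyLeft, 0, sizeof(int));
        boundedSteps<<<(n + 255) / 256, 256>>>(d_remaining, d_state, n,
                                               maxSteps, d_anyLeft);
        err = cudaGetLastError();  // launch/configuration errors
        if (err == cudaSuccess) {
            // The blocking copy waits for the kernel and reports its errors.
            err = cudaMemcpy(&anyLeft, d_anyLeft, sizeof(int),
                             cudaMemcpyDeviceToHost);
        }
        ++launches;
    }

    cudaFree(d_remaining);
    cudaFree(d_state);
    cudaFree(d_anyLeft);
    return err == cudaSuccess ? launches : -1;
}

int main()
{
    int launches = runInSlices(1024, 5);
    if (launches < 0) {
        fprintf(stderr, "CUDA error while running in slices\n");
        return 1;
    }
    printf("finished after %d launches\n", launches);
    return 0;
}
```

Each launch is bounded by maxSteps, so its run time no longer depends on the largest loop count. The price is one small device-to-host copy per launch to find out whether another round is needed.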

But why are the first 1000 iterations OK? I suppose the time for each iteration is similar, so the kernels should be short enough to run.

Sorry, I thought the loop was inside the kernel.

I have no good idea of what goes wrong then. Are there any data-dependent array accesses that might only go wrong in later iterations? Are any more complicated device functions called that might involve device-side memory allocations?
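
To narrow down which launch and which iteration actually fails, it can help to check for errors after every launch while debugging. Kernel launches are asynchronous, so without a synchronization point the error only shows up at some later CUDA call. A rough sketch, where kernelA, kernelB and launchOk are placeholder names and the kernel bodies are dummies standing in for the two real kernels:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernels standing in for the two real ones.
__global__ void kernelA(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

__global__ void kernelB(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 0.5f;
}

// Debug helper: returns true if the most recent launch started and
// finished without an error, and prints a message otherwise.
bool launchOk(const char *name, int iter)
{
    cudaError_t err = cudaGetLastError();        // launch/configuration errors
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();           // errors raised during execution
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed in iteration %d: %s\n",
                name, iter, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    const int n = 1 << 16;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d, 0, n * sizeof(float));

    for (int iter = 0; iter < 10000; ++iter) {
        kernelA<<<blocks, threads>>>(d, n);
        if (!launchOk("kernelA", iter)) return 1;

        kernelB<<<blocks, threads>>>(d, n);
        if (!launchOk("kernelB", iter)) return 1;
    }

    cudaFree(d);
    printf("all iterations completed\n");
    return 0;
}
```

Synchronizing after every launch slows the loop down, so this is meant for tracking the problem down, not for production runs.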

Actually, there is a loop in my kernel, but the number of iterations is random. If I limit the loop to something like 5 iterations, the error is gone. So I believe your suggestion is right. Is there any way to get rid of this limit on GPU kernel running time?
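
Whether the run time limit applies to a given card can be queried from the CUDA runtime, which may be a useful first check before trying to work around it. A small sketch using cudaDeviceGetAttribute with cudaDevAttrKernelExecTimeout; the helper name hasRunTimeLimit is made up for this example:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns 1 if kernels on this device are subject to a run time limit,
// 0 if they are not, and -1 if the query itself fails.
int hasRunTimeLimit(int device)
{
    int value = 0;
    cudaError_t err =
        cudaDeviceGetAttribute(&value, cudaDevAttrKernelExecTimeout, device);
    if (err != cudaSuccess) return -1;
    return value ? 1 : 0;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        fprintf(stderr, "could not query CUDA devices\n");
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;

        int limited = hasRunTimeLimit(dev);
        printf("device %d (%s): run time limit %s\n", dev, prop.name,
               limited == 1 ? "enabled" :
               limited == 0 ? "disabled" : "unknown");
    }
    return 0;
}
```

If the limit is reported as enabled for the card the simulation runs on, keeping each kernel launch short remains the portable option.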
