Crashes: display driver recovers after CUDA program causes card to give up

Hello all,
I recently discovered that I could run more than 65535 blocks at a time by using a second dimension, e.g.

dim3 grid(65535, 12);
TestKernel<<<grid, 1>>>(dev_testArray);

instead of
TestKernel<<<65535, 1>>>(dev_testArray);

This meant that I could run more blocks with less work for each thread (12x more blocks for 12x less work per thread).
When I originally developed this program, I noticed that the graphics card would crash (with the "driver successfully recovered" message) if I ran more than about 12 or 15 permutations per thread (512 threads per block), apparently because the thread would take too long; at least, that is what people told me.
This is why I was so happy to discover I could run more blocks, so that each thread would run for a shorter period of time. However, I now get the same crash when using 12x more blocks instead of running each thread 12x longer. (I get the feeling the overall program is actually much less efficient now, for various reasons.)
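
For anyone following along, here is a minimal sketch of how a kernel launched on a 2D grid like the one above can recover a single linear block index. The names TestKernelFlat, runFlat, and totalBlocks are made up for illustration and are not taken from the downloadable code.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each block writes its own linear index into out[].
__global__ void TestKernelFlat(int *out, unsigned int totalBlocks)
{
    // Flatten the 2D grid: each row holds gridDim.x blocks, and blockIdx.y
    // selects the row.
    unsigned int blockId = blockIdx.y * gridDim.x + blockIdx.x;

    // Guard in case the grid was rounded up past the amount of work.
    if (blockId < totalBlocks && threadIdx.x == 0) {
        out[blockId] = (int)blockId;  // placeholder for the real per-block work
    }
}

// Hypothetical host helper: launches a 65535 x rows grid with one thread per
// block and returns the first error it sees.
cudaError_t runFlat(int *dev_out, unsigned int rows)
{
    dim3 grid(65535, rows);
    TestKernelFlat<<<grid, 1>>>(dev_out, 65535u * rows);

    cudaError_t err = cudaGetLastError();  // catches invalid launch configurations
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();        // catches errors raised while running
}
```

With this layout, dev_out needs room for 65535 * rows ints. The whole grid is still one kernel launch, so it is timed as a single unit.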

For a look at the code, you can download it here:
I put it into two folders so that you can see the code that worked and the code that doesn't. There may be just a logical error that is causing problems (or many errors combined).
I am hoping that by being able to run many more blocks, I should be able to push past 12-node TSP to 13+.

Any help appreciated.


The watchdog timer kicks in if a kernel takes too long to run its full grid, not if an individual thread takes too long. So it does not matter how you divide the work between threads and blocks.

An easy way to run more blocks without triggering the watchdog is to divide them between multiple kernel launches (i.e., instead of launching one grid of 65535×3 blocks, launch three grids of 65535 blocks each).
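
A minimal sketch of that idea, assuming the kernel can take an extra offset argument. TestKernelChunk, runInChunks, and blockOffset are hypothetical names, not part of the original program.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: blockOffset shifts this launch's blocks into the
// global block numbering, so every launch works on a different slice.
__global__ void TestKernelChunk(int *out, unsigned int blockOffset)
{
    unsigned int blockId = blockOffset + blockIdx.x;
    if (threadIdx.x == 0) {
        out[blockId] = (int)blockId;  // placeholder for the real per-block work
    }
}

// Hypothetical host helper: covers totalBlocks blocks using launches of at
// most 65535 blocks each, and returns the first error it sees.
cudaError_t runInChunks(int *dev_out, unsigned int totalBlocks)
{
    const unsigned int maxBlocks = 65535;
    for (unsigned int offset = 0; offset < totalBlocks; offset += maxBlocks) {
        unsigned int remaining = totalBlocks - offset;
        unsigned int blocks = remaining < maxBlocks ? remaining : maxBlocks;

        TestKernelChunk<<<blocks, 1>>>(dev_out, offset);

        cudaError_t err = cudaGetLastError();  // invalid launch configuration
        if (err != cudaSuccess) return err;
        err = cudaDeviceSynchronize();         // wait so each launch finishes on its own
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```

The synchronize after each launch is a choice made for this sketch: it attributes any error to the launch that caused it and keeps the launches from being queued back to back. Each individual launch still has to finish within the watchdog limit.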

Yeah you could try disabling the watchdog ;) :)

Ooooh, I really like this idea.

I found this thread about the timeout issue, and they mentioned the 'no monitor plugged in' concept, which I have already tried. I have a crappy ATI card plus an NVIDIA GTX 260, so I can still see.

Can someone confirm whether I need to manually disable the Windows driver timeout, or whether having no monitors plugged in should be good enough?

Ideally I would like to be able to run a CUDA application or kernel for more than 15 seconds, probably more like 1+ minutes…

I will try splitting it up over several kernels for now, thank you.

Best wishes,


I have since put some more time into this and followed the instructions on this website:

It works! It turns out that it was indeed an issue with the watchdog. Unplugging the monitors from that card is not enough; you do have to change the Windows 7 registry no matter what.
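
For anyone wanting to check from code whether the run time limit applies to a card, here is a minimal sketch using the CUDA runtime's cudaDevAttrKernelExecTimeout attribute. The helper name watchdogEnabled is made up for illustration. Whether this flag reflects a registry change on a given driver version is an assumption worth testing rather than relying on.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if the driver reports a run time limit on
// kernels for this device, 0 if it reports none, and -1 if the query failed.
int watchdogEnabled(int device)
{
    int timeoutEnabled = 0;
    cudaError_t err = cudaDeviceGetAttribute(&timeoutEnabled,
                                             cudaDevAttrKernelExecTimeout,
                                             device);
    if (err != cudaSuccess) return -1;
    return timeoutEnabled ? 1 : 0;
}
```

Calling watchdogEnabled(0) before and after the registry change would show whether the runtime sees any difference for device 0.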

What I found odd was that the old program, which works with the watchdog on, is actually a fraction slower (0.45% according to the Compute Visual Profiler). In that program the threads ran 15x longer instead of there being 15x as many blocks. Since I am already maxing out the blocks/threads that can run on the graphics card, I had thought that running the threads for longer would be faster if anything, not slower.

Anyway, problem solved. Thanks, guys.