Too many threads crashes my computer: if a kernel takes a long time to complete, I get a blue screen

Hello everybody,

Here is a sample kernel that illustrates my problem. I wrote a kernel that takes a long time to complete, the run time depending on the number of threads. If this time becomes too long, the program freezes for about 15 seconds, then my computer crashes with a blue screen reporting an error, and all I can do is restart my computer.

On my computer, the code below crashes if I set dimGrid.y higher than about 60. With this value, the program takes about 15 seconds to complete, and sometimes crashes. However, I stay below the maximum grid size (65535 * 65535); I have an 8600GT…

// includes, system
#include <stdlib.h>
#include <stdio.h>
#include <cuda.h>

// Every thread increments the same global address
__global__ void kernell(long int* number)
{
	(*number)++;
}

int main()
{
	long int number = 0;
	long int* number_Device;

	cudaMalloc((void**)&number_Device, sizeof(long int));
	cudaMemcpy(number_Device, &number, sizeof(long int), cudaMemcpyHostToDevice);

	// Call of the kernel
	dim3 dimGrid(65000, 60);
	dim3 dimBlock(512);
	kernell<<<dimGrid, dimBlock>>>(number_Device);

	// Copy the result back and clean up
	cudaMemcpy(&number, number_Device, sizeof(long int), cudaMemcpyDeviceToHost);
	cudaFree(number_Device);
	return 0;
}




Is there any solution to prevent this kind of crash?

Is it normal?

Thanks in advance


If you are using this card for display as well, then you’re hitting the watchdog timer. Search the forum for “watchdog”.

In short, it’s an OS-level timer that will kill any kernel running on the GPU for more than a couple of seconds, for fear it’s stuck in an endless loop or deadlocked. A safety mechanism. It’s impossible to disable it for an active video adapter, and it exists in every OS.

So you can either split your kernel into smaller ones (that end within a couple of seconds) or use a dedicated card for CUDA that isn’t connected to a monitor.
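To illustrate the first option, here is a rough sketch of splitting one huge launch into many short ones (hypothetical names and sizes; assumes the work can be partitioned by an offset passed to the kernel). Each launch finishes quickly, so the watchdog never fires:

```cuda
// Sketch: many small launches instead of one long-running one.
// 'offset' tells each launch which slice of the data it owns.
__global__ void kernell_chunk(long int* data, int offset)
{
	int i = offset + blockIdx.x * blockDim.x + threadIdx.x;
	data[i] += 1;
}

// Host side (inside main, after allocating data_Device):
const int threads     = 512;
const int chunkBlocks = 1000;   // small grid per launch
const int numChunks   = 65;     // together they cover the same total work

for (int c = 0; c < numChunks; ++c)
{
	kernell_chunk<<<chunkBlocks, threads>>>(data_Device, c * chunkBlocks * threads);
	cudaThreadSynchronize();    // wait for this chunk, so each launch stays short
}
```

The synchronize between launches keeps each kernel's wall-clock time well under the watchdog limit, at the cost of some launch overhead.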

The bluescreen happens because the driver fails to recover after the watchdog kills its kernel; it’s supposed to be fixed in a new version of the drivers (yet unreleased).

This causes a severe race condition and competitive access to ONE address in memory. Maybe this is the reason why your kernel runs so long.



But even if I use atomicAdd(number, 1) instead of the (*number)++ line, the computer still crashes.

Does anybody know how many seconds the watchdog timer allows?



This will be even slower. Safer (no race) but slower, since now threads get serialized.

Depends. I don’t think it’s a hard limit, some people get away with 30 seconds, some much less. IIRC NVIDIA says 5 seconds is as long as you wanna go, after that you’ve crossed the redline and you’re like those guys in Fast and Furious, turbo’ing on their NOS - ready to explode at any given moment.

Thanks for the response.

I have only one graphics card plus an integrated graphics controller on my motherboard.

Do you know if it’s possible to use the motherboard controller for the display, and the nVidia card only for CUDA? This way, I’ll get rid of the watchdog :lol:


If I’m not mistaken, it depends on the motherboard. There should be a BIOS option, something like Integrated Graphics = Enabled/Auto/Disabled, Auto being “disabled if a proper card is detected, enabled otherwise”. Some boards (or perhaps some BIOSes) don’t have the “Enabled” setting, in which case you’re out of luck.

The problem is your program, however. It’s doing the worst thing possible: it either has a race condition or (if you enable atomics) runs all threads sequentially. You pretty much can’t get slower than that even if you tried. If you’re not aware of why this is so and don’t pay attention while writing the real code, the watchdog will be the least of your problems.
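For the record, the standard way to count things in parallel without either a race or full serialization is a per-block reduction: each block combines its threads’ contributions in shared memory and writes a single partial result, and the host sums the partials. This is not from the code above, just a common pattern, sketched here with a hypothetical predicate and a fixed block size of 512 (it uses no atomics, so it also works on compute capability 1.1 hardware like the 8600GT):

```cuda
// Sketch: count elements matching a predicate, one partial sum per block.
// partial[] must have gridDim.x elements.
__global__ void count_per_block(const int* data, int* partial)
{
	__shared__ int s[512];
	int i = blockIdx.x * blockDim.x + threadIdx.x;

	// Each thread contributes 0 or 1 (hypothetical predicate: data[i] > 0)
	s[threadIdx.x] = (data[i] > 0) ? 1 : 0;
	__syncthreads();

	// Tree reduction within the block: no atomics, no races
	for (int stride = blockDim.x / 2; stride > 0; stride /= 2)
	{
		if (threadIdx.x < stride)
			s[threadIdx.x] += s[threadIdx.x + stride];
		__syncthreads();
	}

	// One write per block instead of one write per thread
	if (threadIdx.x == 0)
		partial[blockIdx.x] = s[0];
}
// The host then sums partial[0 .. gridDim.x-1], a negligible amount of work.
```

All the contention on a single address disappears, so the kernel also finishes far faster, which helps with the watchdog too.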

OK, I will try this :thumbup:

I know this code is awful. It was just a sample. ;-)