Here is a sample kernel that illustrates my problem. I made a kernel that takes a long time to complete, the time depending on the number of threads. If this time becomes too long, the program freezes for about 15 seconds, then my computer crashes: I get a blue screen reporting an error, and all I can do is restart my computer.
On my computer, the code below crashes if I set a dimGrid.y value higher than about 60. With this value, the program takes about 15 seconds to complete, and sometimes crashes. However, I stay below the maximum number of blocks (65535 * 65535), and I have an 8600GT…
// includes, system
#include <cuda_runtime.h>

__global__ void kernell(long int* number)
{
    // long-running loop; every thread increments the same counter
    for (long int i = 0; i < 1000000; ++i)
        (*number)++;
}

int main()
{
    long int number = 0;
    long int* number_Device;
    cudaMalloc((void**)&number_Device, sizeof(long int));
    cudaMemcpy(number_Device, &number, sizeof(long int), cudaMemcpyHostToDevice);
    // Call of the kernel
    dim3 dimGrid(1, 60), dimBlock(256, 1);
    kernell<<<dimGrid, dimBlock>>>(number_Device);
}
Is there any solution to prevent this kind of crash?
If you are using this card for display as well, then you’re hitting the watchdog timer. Search the forum for “watchdog”.
In short, it’s an OS-level timer that will kill any kernel running on the GPU for more than a couple of seconds, for fear it’s in an endless loop or deadlocked. A safety mechanism. It’s impossible to disable it for an active video adapter, and it’s in every OS.
So you can either split your kernel into smaller ones (that end within a couple of seconds) or use a dedicated card for CUDA that isn’t connected to a monitor.
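To make the first option concrete, here is a rough host-side sketch of splitting one long launch into several short ones. It reuses the names from the post above (`kernell`, `number_Device`, `dimBlock`); the chunk size of 10 rows of blocks per launch is just an illustrative guess, not a recommendation:

```cuda
// Sketch: instead of one launch with dimGrid.y = 60, issue several launches
// that each cover a slice of the grid and finish well under the watchdog limit.
for (int chunk = 0; chunk < 60; chunk += 10) {
    dim3 dimGrid(1, 10);                         // one slice of the full grid
    kernell<<<dimGrid, dimBlock>>>(number_Device);
    cudaDeviceSynchronize();                     // wait before the next slice
}
```

Note that each slice sees the same grid coordinates, so the kernel would also need the chunk offset passed as an argument to know which part of the work it owns.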
The bluescreen happens because the driver fails to recover after the watchdog kills its kernel; it’s supposed to be fixed in a new version of the drivers (as yet unreleased).
This will be even slower. Safer (no race) but slower, since now threads get serialized.
Depends. I don’t think it’s a hard limit, some people get away with 30 seconds, some much less. IIRC NVIDIA says 5 seconds is as long as you wanna go, after that you’ve crossed the redline and you’re like those guys in Fast and Furious, turbo’ing on their NOS - ready to explode at any given moment.
If I’m not mistaken, it depends on the motherboard. There should be a BIOS option, something like Integrated Graphics = Enabled/Auto/Disabled, Auto being “disabled if a proper card is detected, enabled otherwise”. Some boards (or perhaps some BIOSes) don’t have the “Enabled” setting, in which case you’re out of luck.
The problem is your program, however. It’s doing the worst thing possible: it either has a race condition or (if you enable atomics) runs all threads sequentially. You pretty much can’t get slower than that even if you tried. If you’re not aware of why this is so and don’t pay attention while writing the real code, the watchdog will be the least of your problems.
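For what it’s worth, the usual fix for this pattern is to let each thread accumulate in a private register and issue a single atomic at the end, instead of contending on the shared counter every iteration. A sketch (illustrative, not the poster’s code; note it uses a 32-bit int, since global atomics on the 8600GT’s compute capability 1.1 only support 32-bit integers):

```cuda
// Each thread counts privately, then does ONE atomicAdd instead of one per
// iteration - the serialization cost drops from O(iterations) to O(1) per thread.
__global__ void count(int* total, int iterations)
{
    int local = 0;
    for (int i = 0; i < iterations; ++i)
        local++;                 // private register, no contention
    atomicAdd(total, local);     // single serialized access per thread
}
```

A shared-memory reduction per block before the atomic would cut the contention further, but even this version avoids the fully-sequential behaviour described above.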