Too many threads make the computer crash: if this kernel takes a long time to complete, I get a blue…

ex23 · April 23, 2009, 3:18pm

Hello everybody,

Here is a sample kernel that illustrates my problem. I wrote a kernel that takes a long time to complete, with the run time depending on the number of threads. If it runs for too long, the program freezes for about 15 seconds, then my computer crashes: I get a blue screen reporting an error, and all I can do is restart.

On my computer, the code below crashes if I set dimGrid.y higher than about 60. With that value, the program takes about 15 seconds to complete and sometimes crashes. Yet I stay well below the maximum number of blocks (65535 * 65535). I have an 8600GT…

// includes, system
#include <stdlib.h>
#include <stdio.h>
#include <cuda.h>

// Every thread of every block increments the same counter in global memory
__global__ void kernell(long int* number)
{
	(*number) ++;
}

int main()
{
	long int number = 0;
	long int* number_Device;

	// Allocate the counter on the device and initialize it to 0
	cudaMalloc((void**)&number_Device, sizeof(long int));
	cudaMemcpy(number_Device, &number, sizeof(long int), cudaMemcpyHostToDevice);

	// Launch the kernel: 65000 x 60 blocks of 512 threads each
	dim3 dimGrid(65000,60);
	dim3 dimBlock(512);
	kernell<<<dimGrid , dimBlock>>>(number_Device);

	// Wait for the kernel to finish
	cudaThreadSynchronize();

	cudaFree(number_Device);
	system("pause");
	return 0;
}

Is there any solution to prevent this kind of crash?

Is it normal?

Thanks in advance

++

_Big_Mac · April 23, 2009, 5:21pm

If you are using this card for display as well, then you’re hitting the watchdog timer. Search the forum for “watchdog”.

In short, it’s an OS-level timer that kills any kernel that has been running on the GPU for more than a couple of seconds, for fear that it’s stuck in an endless loop or deadlocked. It’s a safety mechanism. It’s impossible to disable for an active video adapter, and it’s in every OS.
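To see whether a given GPU is subject to this timer, the CUDA runtime exposes a per-device attribute. The following is a minimal sketch, assuming a CUDA toolkit recent enough to provide cudaDeviceGetAttribute (it is not code from this thread); it only reports the flag and does not change it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        // Non-zero means the device enforces a run-time limit on kernels.
        int timeoutEnabled = 0;
        err = cudaDeviceGetAttribute(&timeoutEnabled, cudaDevAttrKernelExecTimeout, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "Device %d: %s\n", dev, cudaGetErrorString(err));
            continue;
        }
        printf("Device %d: kernel run-time limit is %s\n",
               dev, timeoutEnabled ? "enabled" : "disabled");
    }
    return 0;
}
```

A device that reports "enabled" is the kind of card described above, typically one that is also driving a display.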

So you can either split your kernel into smaller ones (that end within a couple of seconds) or use a dedicated card for CUDA that isn’t connected to a monitor.

The bluescreen happens because the driver fails to recover after the watchdog kills its kernel. This is supposed to be fixed in a new version of the drivers (not yet released).
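The first option, splitting the work into several shorter launches, could look roughly like the sketch below. It is only an illustration under assumptions: kernelChunk, rowsDone and rowsPerLaunch are hypothetical names, the kernel body is a placeholder, and the chunk size of 4 rows is a guess that would have to be tuned so each launch finishes well within the watchdog limit. It uses cudaDeviceSynchronize, the current name for the deprecated cudaThreadSynchronize.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: processes one slice of the full grid.
// yOffset is the row of the full grid at which this launch starts.
__global__ void kernelChunk(unsigned int* rowsDone, unsigned int yOffset)
{
    unsigned int blockRow = blockIdx.y + yOffset;  // row in the full grid
    // Placeholder for the real work: one thread per row marks the row as done.
    if (blockIdx.x == 0 && threadIdx.x == 0)
        rowsDone[blockRow] = 1u;
}

int main()
{
    const unsigned int gridX = 65000;        // same grid width as the original launch
    const unsigned int totalRows = 60;       // the dimGrid.y of the original launch
    const unsigned int rowsPerLaunch = 4;    // assumed chunk size, tune per device

    unsigned int* rowsDone_Device = NULL;
    cudaMalloc((void**)&rowsDone_Device, totalRows * sizeof(unsigned int));
    cudaMemset(rowsDone_Device, 0, totalRows * sizeof(unsigned int));

    for (unsigned int y0 = 0; y0 < totalRows; y0 += rowsPerLaunch) {
        unsigned int remaining = totalRows - y0;
        unsigned int rows = remaining < rowsPerLaunch ? remaining : rowsPerLaunch;

        dim3 dimGrid(gridX, rows);
        dim3 dimBlock(512);
        kernelChunk<<<dimGrid, dimBlock>>>(rowsDone_Device, y0);

        // Check the launch, then wait for this chunk before starting the next.
        // A kernel killed by the watchdog is reported here as cudaErrorLaunchTimeout.
        cudaError_t err = cudaGetLastError();
        if (err == cudaSuccess)
            err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            fprintf(stderr, "Chunk starting at row %u failed: %s\n",
                    y0, cudaGetErrorString(err));
            cudaFree(rowsDone_Device);
            return 1;
        }
    }

    cudaFree(rowsDone_Device);
    return 0;
}
```

Passing the row offset as a kernel argument lets each launch compute the same logical block coordinates it would have had in the single large grid.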

navier-stokes · April 23, 2009, 5:51pm

This causes a severe race condition, with every thread competing for access to ONE memory address. Maybe this is the reason why your kernel runs so long.

Regards

Navier

ex23 · April 24, 2009, 6:46am

But even if I use an atomicAdd(number, 1) instead of the (*number) ++ line, the computer crashes.

Does anybody know how many seconds the watchdog timer allows?

Thanks

++

_Big_Mac · April 24, 2009, 1:33pm

Using atomicAdd will be even slower. It’s safer (no race) but slower, since the threads now get serialized.

As for the watchdog limit: it depends. I don’t think it’s a hard limit; some people get away with 30 seconds, some with much less. IIRC NVIDIA says 5 seconds is as long as you wanna go. After that you’ve crossed the redline and you’re like those guys in Fast and Furious, turbo’ing on their NOS, ready to explode at any given moment.

ex23 · April 24, 2009, 1:53pm

Thanks for the response.

I have only one graphics card, plus an integrated graphics controller on my motherboard.

Do you know if it’s possible to use the motherboard controller for the display, and the NVIDIA card only for CUDA? That way, I’ll get rid of the watchdog :lol:

++

_Big_Mac · April 24, 2009, 2:51pm

If I’m not mistaken, it depends on the motherboard. There should be a BIOS option, something like Integrated Graphics = Enabled/Auto/Disabled, Auto being “disabled if a proper card is detected, enabled otherwise”. Some boards (or perhaps some BIOSes) don’t have the “Enabled” setting, in which case you’re out of luck.

The problem is your program however. It’s doing the worst thing possible - it either has a race condition or (if you enable atomics) runs all threads sequentially. You pretty much can’t get slower than that even if you tried. If you’re not aware of why this is so and don’t pay attention while writing the real code, the watchdog will be the least of your problems.
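To make the contention point concrete, here is a small sketch (not from this thread) that counts threads in two ways: one atomicAdd per thread, and one atomicAdd per block after a shared-memory sum. The kernel names countPerThread and countPerBlock are hypothetical, the counter is an unsigned int because atomicAdd has no long int overload, and the grid is kept deliberately small so the example stays short-running.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread in the grid performs an atomic on the same address.
__global__ void countPerThread(unsigned int* counter)
{
    atomicAdd(counter, 1u);
}

// Threads first sum their contributions in shared memory,
// then a single thread per block touches the global counter.
__global__ void countPerBlock(unsigned int* counter)
{
    __shared__ unsigned int partial[512];  // assumes blockDim.x == 512 (a power of two)
    partial[threadIdx.x] = 1u;             // this thread's contribution
    __syncthreads();

    // Tree reduction within the block.
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        atomicAdd(counter, partial[0]);
}

int main()
{
    dim3 dimGrid(1024);   // small grid, chosen only for illustration
    dim3 dimBlock(512);
    unsigned int expected = dimGrid.x * dimBlock.x;

    unsigned int* counter_Device = NULL;
    unsigned int perThread = 0, perBlock = 0;
    cudaMalloc((void**)&counter_Device, sizeof(unsigned int));

    cudaMemset(counter_Device, 0, sizeof(unsigned int));
    countPerThread<<<dimGrid, dimBlock>>>(counter_Device);
    cudaMemcpy(&perThread, counter_Device, sizeof(unsigned int), cudaMemcpyDeviceToHost);

    cudaMemset(counter_Device, 0, sizeof(unsigned int));
    countPerBlock<<<dimGrid, dimBlock>>>(counter_Device);
    cudaMemcpy(&perBlock, counter_Device, sizeof(unsigned int), cudaMemcpyDeviceToHost);

    printf("per-thread atomics: %u, per-block atomics: %u, expected: %u\n",
           perThread, perBlock, expected);

    cudaFree(counter_Device);
    return 0;
}
```

Both kernels should produce the same count; the second issues one atomic per block instead of one per thread, so far fewer operations contend for the single address.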

ex23 · April 24, 2009, 3:15pm

OK, I will try this.

I know this code is awful. It was just a sample. ;-)
