Intermittent killer kernel: a kernel which causes CUDA to die, followed by launch failures

Cudesnick,

Please do post your updates. It will be very helpful!

Thanks!

Best Regards,

Sarnath

It was suggested to me to use cudaThreadExit() to “clean up” after the ULF. Sadly, this does not
seem to work. The CUDA environment is still trashed and all subsequent kernel calls fail.
If anybody has an alternative method to recover from a ULF caused
by “intermittent killer kernel” type failures that doesn’t require exiting the calling program,
I would love to hear about it. It would be great to be able to at least recover in an orderly manner.
Best regards

cellophane man

Hi cellophane man,

Are you also using “constant” memory? And is your ULF related to constant memory?

No, I am not using constant memory.

Please see the code fragments at the beginning of this thread.

cellophane man

So, here’s my update.

I still don’t know what the primary cause of the ULF is. As I mentioned earlier, sometimes the ULF changes into some kind of incorrect execution of the kernel. In that latter case the kernel code returns prematurely due to the extensive error checking that I have scattered all over the kernel. I know how to switch between the two (by adding/removing very innocent code, such as dumping some registers to device RAM), but I cannot interpret this change of behavior in any reasonable way. Let me reiterate that I observe both the ULF and the premature return very rarely; they occur once in tens of thousands of kernel launches, each launch involving several thousand blocks.
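To make it concrete, the error checking I have in mind follows roughly this pattern (a simplified sketch; the checked invariant and the errorFlag buffer are made-up placeholders, not my real kernel): each thread verifies an invariant and, if it fails, records a code in device memory and returns early.

/* Simplified sketch of in-kernel error checking; the invariant and the
   errorFlag buffer are placeholders, not the real kernel. */
__global__ void checkedKernel(const float *data, float *out, int n, int *errorFlag)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)
        return;

    /* Example invariant: the input value must be finite. */
    if (!isfinite(data[i]))
    {
        *errorFlag = 1;   /* remember that something went wrong */
        return;           /* this is the premature return mentioned above */
    }

    out[i] = data[i] * 2.0f;
}

On the host I copy errorFlag back after cudaThreadSynchronize() and treat a non-zero value as the premature-return case.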

So, when the crash is a ULF, the use of cudaThreadExit() didn’t help me resolve the issue. This is in line with what cellophane man reported in this thread a couple of posts ago.

However, when I observe the premature return from the kernel, invoking cudaThreadExit() and then fully restarting the same CUDA task (including the startup initialization), with exactly the same parameters that caused the premature return, resolves my issue: the restarted task completes successfully. If I remember correctly (NOT sure), relaunching the kernel without invoking cudaThreadExit() and fully reinitializing the device memory didn’t help in this case.
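In outline, my recovery path looks roughly like the sketch below; initTask() and runTask() are placeholders for my own initialization and launch code, not real functions:

#include <stdio.h>
#include <cuda_runtime.h>

/* Placeholders for my real code, shown only to illustrate the retry structure. */
cudaError_t initTask(void);   /* allocations, uploads, constants, etc.              */
cudaError_t runTask(void);    /* kernel launches plus the device-side error checks  */

int main()
{
    if (initTask() != cudaSuccess)
        return 1;

    if (runTask() != cudaSuccess)
    {
        /* Premature-return case: tear the context down and redo everything
           with exactly the same parameters. */
        cudaThreadExit();
        if (initTask() != cudaSuccess)
            return 1;
        if (runTask() != cudaSuccess)
            printf("retry failed as well\n");
    }

    cudaThreadExit();
    return 0;
}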

Hi,

One of my kernels takes more than 5 seconds to complete, and I get the CUDA error “the launch timed out and was terminated”. After that, subsequent calls to cudaMalloc() fail even for small memory requests.

I tried to use cudaThreadExit() to clean up the device before proceeding further, but it doesn’t work and cudaMalloc() still fails.
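As an aside, I believe one can confirm that the display watchdog is what terminates the kernel by querying the device properties (a small sketch; I have not verified whether the kernelExecTimeoutEnabled field exists in this toolkit version, so please double-check):

#include <stdio.h>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    /* kernelExecTimeoutEnabled is non-zero when the display watchdog
       applies to this GPU (long-running kernels get killed). */
    printf("%s: watchdog %s\n", prop.name,
           prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    return 0;
}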

The following is the sample kernel with which I reproduced the “launch timed out” problem, and the main function where I use cudaThreadExit():

#include <stdio.h>
#include <cuda_runtime.h>

/* Kernel that deliberately never returns, to trigger the display watchdog
   ("the launch timed out and was terminated"). */
__global__ void myKernel()
{
    float result = 2.0f;
    float A = 565.55f;

    while (1)
    {
        result = result * A;
    }
}

int main()
{
    dim3 grid, blk;
    cudaError_t err;
    float *gpuInput = NULL;

    grid.x = 1;
    blk.x = 32;

    myKernel<<<grid, blk>>>();

    err = cudaThreadSynchronize();
    if (err != cudaSuccess)
    {
        printf("cudaThreadSynchronize() failed\n");
    }
    //checkCUDAError("In Kernel:");

    cudaThreadExit();

    /* First attempt to allocate after the timed-out launch. */
    err = cudaMalloc((void **)&gpuInput, 10 * sizeof(float));
    if (err != cudaSuccess)
    {
        printf("main 1: cudaMalloc failed\n");
    }
    if (gpuInput)
    {
        cudaFree(gpuInput);
    }
    cudaThreadExit();

    /* Second attempt, after another cudaThreadExit(). */
    err = cudaMalloc((void **)&gpuInput, 10 * sizeof(float));
    if (err != cudaSuccess)
    {
        printf("main 2: cudaMalloc failed\n");
    }
    if (gpuInput)
    {
        cudaFree(gpuInput);
    }
    cudaThreadExit();

    return 0;
}

Any suggestions on how I can clean up the device?

I use a GeForce 8800 GTX and CUDA 2.1.

Thanks!

Sadhana

Just upgrading to CUDA 2.2 solved my problem with the subsequent calls to cudaMalloc().
cudaThreadExit() is now working fine, and it is necessary for the clean-up.

Thanks,
Sadhana

I tried out the CUDA 2.2 toolkit and SDK with the new driver 185.85 and found that cudaThreadExit() still does not permit recovery
from the subsequent ULFs in my example given at the beginning of this thread. There is still something that gets trashed that cudaThreadExit() does not clean up. Does anybody have any other ideas?

cellophane man

My observation in CUDA 2.2 is that if we don’t free the previously allocated memory (allocated before the timed-out kernel) before calling cudaThreadExit(), then cudaThreadExit() doesn’t clean up and subsequent cudaMalloc()s fail, no matter how small the memory request.

In the following modified sample (please look at the marked comment), if I comment out the cudaFree() I have marked, cudaThreadExit() doesn’t seem to work, as the subsequent cudaMalloc()s fail.

If I don’t comment it out, cudaThreadExit() works fine and the subsequent cudaMalloc()s succeed.

If anybody has any idea why this is so, please share.

/* Same includes and myKernel() as in the earlier sample; checkCUDAError()
   is the small error-reporting helper used earlier in this thread. */
int main()
{
    dim3 grid, blk;
    cudaError_t err;
    float *input = NULL;
    float *gpuInput = NULL;

    grid.x = 1;
    blk.x = 32;

    /* Allocation made BEFORE the timed-out kernel. */
    err = cudaMalloc((void **)&input, 10 * sizeof(float));
    if (input == NULL || err != cudaSuccess)
    {
        printf("first cudaMalloc failed\n");
    }
    else
    {
        printf("First cudaMalloc succeeded\n");
    }

    myKernel<<<grid, blk>>>();

    err = cudaThreadSynchronize();
    printf("cudaThreadSync err = %d\n", err);
    if (err != cudaSuccess)
    {
        printf("cudaThreadSynchronize() failed\n");
    }
    checkCUDAError("In Kernel:");

    if (input)    /* If I comment out this "cudaFree", the subsequent calls to "cudaMalloc" after "cudaThreadExit" fail */
    {
        cudaFree(input);
    }

    cudaThreadExit();

    err = cudaMalloc((void **)&gpuInput, 10 * sizeof(float));
    printf("cudaMalloc err = %d\n", err);
    if (err != cudaSuccess)
    {
        printf("2nd cudaMalloc failed\n");
    }
    else
    {
        printf("2nd cudaMalloc succeeded\n");
    }
    if (gpuInput)
    {
        cudaFree(gpuInput);
    }

    err = cudaMalloc((void **)&input, 100000 * sizeof(float));
    if (err != cudaSuccess)
    {
        printf("3rd cudaMalloc failed\n");
    }
    else
    {
        printf("3rd cudaMalloc succeeded\n");
    }
    if (input)
    {
        cudaFree(input);
    }

    cudaThreadExit();

    return 0;
}

Thanks,

Sadhana

Just adding information to my above post…

Whether I call cudaFree() before or after launching the kernel doesn’t matter, but I do have to call cudaFree() before calling cudaThreadExit() to make it work.
Please look at my sample code in the post above.
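Condensed from the sample in my previous post, the order that works for me is roughly:

cudaFree(input);       /* 1. free everything that was allocated before the timed-out kernel */
cudaThreadExit();      /* 2. only then tear down the context */
err = cudaMalloc((void **)&input, 10 * sizeof(float));   /* 3. later allocations now succeed */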

Thanks
Sadhana

Are there any updates / advice / workarounds on this issue?

It seems I am having a similar problem with a Tesla C1060 / CUDA 2.2 / 185.85 / XP64:

MisterAnderson42’s timeout_test3 exits after only 7k to 15k iterations.
Cellophane man’s test: ULF after 2k to 10k iterations (with the default “bad” 448 xSize).

I am a little worried because:

1. MisterAnderson42 seems to indicate this was only occurring on older hardware (9800GX2) … and working fine on the S1070 (is that right, or did I misread?).
2. I only recently ran into this ULF problem on this particular C1060 (via a failure from one of my kernels that I sometimes use for testing) … and I can’t recall it failing before (I have been using it for almost a year).
Could this be somehow hardware related? (I first thought the GPU was getting too hot and began monitoring the temperature with GPU-Z … but it seems OK, with a maximum of 74 Celsius.)

I have another C1060 (CUDA 2.2 / 182.50 / XP64) and both tests run fine to the end (the GPU did hit 80 degrees C, though).

I’ll check with older (resp. newer) drivers when the machine(s) are accessible again … but I’d definitely appreciate any feedback / hint / diagnosis tool (?) in the meantime.

Thanks.

You did not misread. I have been running rock-solid stable on Tesla S1070 and GTX 285. I don’t have a GTX 275 to test, but I suspect that it also does not have the problem.

Interesting that the problem shows up on one C1060 but not the other… Sorry, I don’t have any real suggestions for you. I have definitely seen temperature issues cause random ULFs, but you already checked that.

If he’s had it for almost a year, it’s a preproduction C1060 and therefore is not a reliable indicator of anything. (preproduction stuff is preproduction for a reason)

I had ULF trouble similar to that described by MisterAnderson42; my problem is described here.

Is there any way to check if a card is pre-production (from the hardware or from a software-accessible property)?

Did you get it from somebody at NVIDIA before GT200 officially came out? If you bought it, it’s a production card.

Thanks for replying, tmurray. I got them from a former lecturer who got them from NVIDIA… which is why I am not sure. I will try running my program on some Teslas that I am sure were bought and see if they show the same issues.

Ironically, my preproduction GT200 from last year finally failed today. :(