unspecified launch failure in prior launch

sideshow_bob · September 14, 2007, 10:58am

Hi,

I wrote a kernel and I’m currently testing it for performance. Therefore I put the kernel invocation into a loop (1000 times) and measure the time.
After every kernel launch I check for errors.
Occasionally I get an error message:
unspecified launch failure in prior launch
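
For anyone following along, the measure-and-check loop described above might look roughly like the sketch below. The original code was not posted, so `dummy_kernel`, `run_timed_launches`, and the buffer size are hypothetical stand-ins, and the calls are taken from the current CUDA runtime API rather than the 2007 toolkit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel under test.
__global__ void dummy_kernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Launches the kernel `iterations` times and checks for errors after each launch.
// Returns -1 if every launch succeeded, the index of the first failing launch
// otherwise, or -2 if the device buffer could not be allocated.
// On success, *avg_ms receives the mean time per launch in milliseconds.
int run_timed_launches(int iterations, float* avg_ms)
{
    const int n = 1 << 20;  // assumed problem size
    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return -2;
    cudaMemset(d_data, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int failed = -1;
    float total_ms = 0.0f;
    for (int it = 0; it < iterations; ++it) {
        cudaEventRecord(start);
        dummy_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
        cudaEventRecord(stop);

        // Errors detected at launch time (e.g. a bad configuration) show up here.
        cudaError_t err = cudaGetLastError();
        // Errors raised while the kernel executes are only reported after a sync.
        if (err == cudaSuccess) err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            fprintf(stderr, "launch %d: %s\n", it, cudaGetErrorString(err));
            failed = it;
            break;
        }

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        total_ms += ms;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);

    if (failed < 0 && iterations > 0) *avg_ms = total_ms / iterations;
    return failed;
}
```

Calling `run_timed_launches(1000, &avg_ms)` mirrors the 1000-iteration loop from the post. Synchronizing after every launch is what ties a failure to a specific iteration instead of having it reported on a later call.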

After that error the GPU seems to run at 40% of its normal speed. Every kernel whose performance I check runs at 0.4 of its normal execution rate.

To get back to normal performance I tried restarting X (along with the nvidia driver), which had no effect. Only a reboot helped.

I monitored the core temperature with the nvidia-settings tool and it never reached the slowdown threshold of 115 C. The maximum core temperature was below 70 C.

Hardware:
GeForce 8600 GT

Driver:
100.14.11

Only certain kernels show this behaviour, and only occasionally.

Any help is appreciated

MisterAnderson42 · September 14, 2007, 1:35pm

I have a very similar issue I’ve been tracking for months. It is very frustrating, and I have yet to be able to reproduce it in a small test case. In my situation, the kernel only executes for 10ms, but once every 10,000 calls on average (randomly, of course) it gives the “timeout” error for the 5s device watchdog timeout. If I run it on the console with no X, it eventually gets to a kernel call that just keeps running and running as if it is in an infinite loop.

I don’t notice any decrease in performance after the error occurs, though. The error just seems more likely to occur again. If I run ANY OpenGL program after the error occurs, it usually hard locks the system so that a physical power down is required to reboot. Sometimes, hitting Ctrl-C really fast will save it.

Try running glxgears after the error to see if you get the same behavior.

Is one of your test cases that reproduces the problem small enough to post here?

sideshow_bob · September 15, 2007, 6:09pm

I’m not sure anymore whether this is a compiler problem or a hardware issue.
I tested the kernel extensively on a GeForce 8800 and was not able to reproduce it.
Maybe the problem is related to the minor revision (the GeForce 8800 is revision 1.0 and the GeForce 8600 is revision 1.1), or maybe my GeForce 8600 has a defect?

Basically I pack a custom structure into a uint4 (using some bit shifts) to make use of texture fetches. My “stable” version of the kernel reads the aligned structure directly from device memory and is 10% slower. This “stable” version has never caused this problem on my GeForce 8600, and I have been using it for some weeks now.

Sadly I haven’t found a way to use texture fetches directly on custom structures; the compiler complains that there is no conversion.
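
To illustrate the packing idea, here is a minimal sketch of squeezing a 16-byte structure into a `uint4` with shifts and masks and getting it back out. The actual structure was not posted, so `Record`, its fields, `pack_record`, and `unpack_record` are all hypothetical; the texture fetch itself is left out, and the unpack step would presumably be applied to whatever `uint4` the fetch returns.

```cuda
#include <cuda_runtime.h>

// Hypothetical 16-byte record; the real structure from the post is not shown.
struct Record {
    unsigned int   key;
    unsigned int   value;
    unsigned short lo, hi;      // two 16-bit fields
    unsigned char  a, b, c, d;  // four 8-bit fields
};

// Pack the record into four 32-bit words.
__host__ __device__ inline uint4 pack_record(const Record& r)
{
    // Widen to unsigned int before shifting so the high bits are not lost
    // to signed integer promotion.
    unsigned int z = (unsigned int)r.lo | ((unsigned int)r.hi << 16);
    unsigned int w = (unsigned int)r.a
                   | ((unsigned int)r.b << 8)
                   | ((unsigned int)r.c << 16)
                   | ((unsigned int)r.d << 24);
    return make_uint4(r.key, r.value, z, w);
}

// Reverse the packing: mask and shift each field back out.
__host__ __device__ inline Record unpack_record(const uint4& v)
{
    Record r;
    r.key   = v.x;
    r.value = v.y;
    r.lo    = (unsigned short)(v.z & 0xFFFFu);
    r.hi    = (unsigned short)(v.z >> 16);
    r.a     = (unsigned char)(v.w & 0xFFu);
    r.b     = (unsigned char)((v.w >> 8) & 0xFFu);
    r.c     = (unsigned char)((v.w >> 16) & 0xFFu);
    r.d     = (unsigned char)(v.w >> 24);
    return r;
}
```

Both helpers are marked `__host__ __device__`, so the host can pack the data before upload and the kernel can unpack it after the fetch using the same code.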

However, since I was not able to reproduce it on the GeForce 8800 and it is no problem to reproduce it on the GeForce 8600 I wonder if anybody has an idea on how to track the problem down…

MisterAnderson42 · September 18, 2007, 2:29pm

I have finally been able to produce a simple test case that reproduces this problem, now on both Linux and Windows. What’s funny is that in my real app I get the “timeout” error, but in the small test case I just get “unspecified launch failure”.

It has now been submitted as an official bug into the NVIDIA system. Hopefully it can be solved (or has already been solved) for the next release.

quirin · September 24, 2007, 9:12am

I have the same problem here. It is really frustrating. It took me about a week to do all the engineering (sorting, merging, streaming; altogether I have about ten kernels) and, as you mentioned, occasionally it does not work. This is totally unacceptable since I have to do that computation every single frame.

The behavior is that after some kernel call, or even at startup somewhere between the cudaMallocs (after having launched the app a couple of times in a row), the app hangs and seems to clear all my data (after the aforementioned five seconds).

Would you mind posting the simple test case so that I can look for similarities to my problem? (I am writing my master’s thesis right now and I want to give reasons why I am not able to integrate it in my application…:-/)
