unspecified launch failure in prior launch

Hi,

I wrote a kernel and I’m currently testing its performance. To do that, I run the kernel invocation in a loop (1000 times) and measure the time.
After every kernel launch I check for errors.
Occasionally I get an error message:
unspecified launch failure in prior launch
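
For readers following along: the "in prior launch" wording suggests that the failure belongs to an earlier kernel launch and only surfaced at a later check, since kernel launches are asynchronous. Below is a minimal sketch of a timing loop that checks for errors after every launch. It is not the original poster's code: `dummyKernel` and `timedLoop` are hypothetical names, the kernel body is a placeholder, and `cudaDeviceSynchronize` is the current runtime call (older toolkits used `cudaThreadSynchronize`).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the kernel under test.
__global__ void dummyKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Launches the kernel `iterations` times and checks for errors after each
// launch. Returns the first error seen; *elapsedMs is only set on success.
cudaError_t timedLoop(float* d_data, int n, int iterations, float* elapsedMs) {
    *elapsedMs = 0.0f;
    cudaEvent_t start, stop;
    cudaError_t err = cudaEventCreate(&start);
    if (err != cudaSuccess) return err;
    err = cudaEventCreate(&stop);
    if (err != cudaSuccess) { cudaEventDestroy(start); return err; }

    err = cudaEventRecord(start);
    for (int it = 0; it < iterations && err == cudaSuccess; ++it) {
        dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
        err = cudaGetLastError();              // catches launch errors
        if (err == cudaSuccess)
            err = cudaDeviceSynchronize();     // catches execution errors
        if (err != cudaSuccess)
            fprintf(stderr, "iteration %d: %s\n", it, cudaGetErrorString(err));
    }
    if (err == cudaSuccess) err = cudaEventRecord(stop);
    if (err == cudaSuccess) err = cudaEventSynchronize(stop);
    if (err == cudaSuccess) err = cudaEventElapsedTime(elapsedMs, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return err;
}

int main() {
    const int n = 1 << 20;
    float* d_data = nullptr;
    cudaError_t err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    float ms = 0.0f;
    err = timedLoop(d_data, n, 1000, &ms);
    if (err == cudaSuccess)
        printf("1000 launches took %.3f ms\n", ms);

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

Synchronizing after every launch adds overhead to the measured time, so this variant is more useful for finding which iteration fails than for the final performance numbers.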

After that, the GPU seems to run at 40% of its normal speed. All kernels whose performance I check run at 0.4 of their normal execution rate.

To get back to normal performance, I tried restarting X (along with the nvidia driver), which had no effect. Only a reboot helped.

I monitored the core temperature with the nvidia-settings tool, and it never reached the slowdown threshold of 115 °C. The maximum core temperature stayed below 70 °C.

Hardware:
GeForce 8600 GT

Driver:
100.14.11

Only certain kernels show this behaviour, and only occasionally.

Any help is appreciated.

I have a very similar issue I’ve been tracking for months. It is very frustrating, and I have yet to be able to reproduce it in a small test case. In my situation, the kernel only executes for 10ms, but once every 10,000 calls on average (randomly, of course) it gives the “timeout” error for the 5s device watchdog timeout. If I run it on the console with no X, it eventually gets to a kernel call that just keeps running and running as if it is in an infinite loop.
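
As a side note for anyone comparing setups: whether the run-time watchdog applies to a given device can be queried from the CUDA runtime. The sketch below uses `cudaDeviceGetAttribute` with `cudaDevAttrKernelExecTimeout` from the current runtime API. It assumes a recent toolkit and only reports whether a limit is active; it does not report the length of the limit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stores 1 in *enabled if the device has a kernel run-time limit
// (watchdog), 0 otherwise. Returns the CUDA error code of the query.
cudaError_t watchdogEnabled(int device, int* enabled) {
    return cudaDeviceGetAttribute(enabled, cudaDevAttrKernelExecTimeout, device);
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        int enabled = 0;
        err = watchdogEnabled(dev, &enabled);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
            return 1;
        }
        printf("device %d: kernel run-time limit %s\n",
               dev, enabled ? "enabled" : "disabled");
    }
    return 0;
}
```

A device that also drives a display typically reports the limit as enabled, which would match the timeout seen under X. The hang on the console without X would then be the same kernel running without a limit.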

I don’t notice any decrease in performance after the error occurs, though. Just that the error seems more likely to occur again. If I run ANY openGL program after the error occurs, it usually hard locks the system so that a physical power down is required to reboot. Sometimes, hitting Ctrl-C really fast will save it.

Try running glxgears after the error to see if you get the same behavior.

Is one of your test cases that reproduces the problem small enough to post here?

I’m no longer sure whether this is a compiler problem or a hardware issue.
I tested the kernel extensively on a GeForce 8800 and was not able to reproduce it.
Maybe the problem is related to the minor revision (the GeForce 8800 is revision 1.0 and the GeForce 8600 is revision 1.1), or maybe my GeForce 8600 is defective?

Basically, I pack a custom structure into a uint4 (using some bit shifts) to make use of texture fetches. My “stable” version of the kernel reads the aligned structure directly from device memory and is 10% slower. This “stable” version has never caused the problem on my GeForce 8600, and I have been using it for some weeks now.
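
To illustrate the packing idea for other readers, here is a minimal sketch of putting a structure into a uint4 with bit shifts and unpacking it again in a kernel. The thread never shows the real structure, so `Item` and its fields, `packItem`, `unpackItem`, and `sumFields` are all hypothetical. The sketch also reads the packed data from plain device memory rather than through a texture fetch, to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical structure; the real layout is not given in the thread.
struct Item {
    unsigned int   id;
    unsigned short a, b;
    unsigned int   flags;
};

// Pack: id goes to .x, a and b share .y (a in the high 16 bits), flags to .z.
__host__ __device__ inline uint4 packItem(const Item& s) {
    return make_uint4(s.id, ((unsigned int)s.a << 16) | s.b, s.flags, 0u);
}

// Unpack: the inverse of packItem.
__host__ __device__ inline Item unpackItem(const uint4& p) {
    Item s;
    s.id    = p.x;
    s.a     = (unsigned short)(p.y >> 16);
    s.b     = (unsigned short)(p.y & 0xFFFFu);
    s.flags = p.z;
    return s;
}

// Each thread loads one packed element, unpacks it, and sums its fields.
__global__ void sumFields(const uint4* packed, unsigned int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        Item s = unpackItem(packed[i]);
        out[i] = s.id + s.a + s.b + s.flags;
    }
}

int main() {
    const int n = 4;
    Item h_items[n] = {{1u, 2, 3, 4u}, {10u, 20, 30, 40u},
                       {100u, 200, 300, 400u}, {7u, 65535, 0, 1u}};
    uint4 h_packed[n];
    for (int i = 0; i < n; ++i) h_packed[i] = packItem(h_items[i]);

    uint4* d_packed = nullptr;
    unsigned int* d_out = nullptr;
    if (cudaMalloc(&d_packed, sizeof(h_packed)) != cudaSuccess ||
        cudaMalloc(&d_out, n * sizeof(unsigned int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_packed, h_packed, sizeof(h_packed), cudaMemcpyHostToDevice);

    sumFields<<<1, n>>>(d_packed, d_out, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel: %s\n", cudaGetErrorString(err));
        return 1;
    }

    unsigned int h_out[n];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);

    // Compare the device result with the same sum computed on the host.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        unsigned int expected = h_items[i].id + h_items[i].a +
                                h_items[i].b + h_items[i].flags;
        if (h_out[i] != expected) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(d_packed);
    cudaFree(d_out);
    return mismatches == 0 ? 0 : 1;
}
```

Because the pack and unpack helpers are marked `__host__ __device__`, the same round trip can be checked on the CPU first, which helps separate a packing bug from a device-side problem.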

Sadly, I haven’t found a way to use texture fetches directly on custom structures: the compiler complains that no conversion exists.

However, since I was not able to reproduce it on the GeForce 8800 and it is easy to reproduce on the GeForce 8600, I wonder if anybody has an idea of how to track the problem down…

I have finally been able to produce a simple test case that reproduces this problem, now on both Linux and Windows. What’s funny is that in my real app I get the “timeout” error, but in the small test case I just get “unspecified launch failure”.

It has now been submitted as an official bug in the NVIDIA bug tracking system. Hopefully it can be solved (or has already been solved) for the next release.

I have the same problem here. It is really frustrating. It took me about a week to do all the engineering (sorting, merging, streaming; altogether I have about ten kernels) and, as you mentioned, occasionally it does not work. This is totally unacceptable, since I have to do that computation every single frame.

The behavior is that after some kernel call, or even at startup somewhere between the cudaMallocs (after having launched the app a couple of times in a row), the app hangs and seems to clear all my data (after the aforementioned five seconds).

Would you mind posting the simple test case so that I can look for similarities to my problem? (I am writing my master’s thesis right now, and I want to give reasons why I am not able to integrate it into my application… :-/)