Kernel start timeout with n-body SDK demo (2 Tesla C1060s, Windows 7 64-bit)

Unified_Research · October 6, 2009, 9:17pm

Hello everyone,

I’d love any fixes, help, pointers, tips, or tricks anyone has that would resolve the following key issue:

Running the suggested n-body simulation mentioned on http://www.nvidia.com/object/tesla_build_your_own.html (with the exact arguments specified on that page) on either of my Tesla C1060 cards causes the driver to crash and restart.

The specific error I’m getting from the nbody.exe program itself is:

cudaSafeCall() Runtime API error in file <./nbody.cpp>, line 291 : the launch timed out and was terminated.

As a quick sanity check, I tried running the same demo program with the number-of-bodies argument changed from --n=131072 to --n=13107 (roughly a tenth of the original), and that completes in around 1100 ms. I then tried --n=31072 (the original with its leading digit dropped, about 2.4x the reduced sanity-check value), and that run also timed out.

I’m hosting these two cards in an Intel i7 920 machine running Windows 7 Ultimate 64-bit, using the most current (at least, as of September 27, 2009) 64-bit NVIDIA WHQL CUDA drivers. My display card is a PNY GeForce 8400 (512 MB) installed in a PCI slot. I am not running the n-body demo on the PNY PCI card.

I’ve done some digging around here and there and the issue seems to be that the CUDA kernel is not starting up on the Teslas in time (i.e. within five seconds, or so I’ve read), so the Windows watchdog timer expires and the watchdog kills and restarts the driver. I’ve seen some information pointing to a registry key change that might work, but is ill-advised; I don’t know if that information is current for Windows 7. Additionally, I read something suggesting that cards that are not driving any displays should not be subject to the watchdog timer, but, conversely, since the system is using the same driver for all three cards, it would seem that the driver will get killed and restarted by Windows even if the kernels are only being set to run on the Tesla cards.

My apologies in advance for the lack of more detail in this initial post. I’ll post logs and excerpts from tests I’ve run, as well as more complete system information later on, if needed or helpful.

Thanks in advance.

tmurray · October 6, 2009, 9:26pm

No, the problem is that the kernel doesn’t complete within the TDR (Timeout Detection and Recovery) window, so TDR triggers and the driver is reset (killing the app).

Set TdrLevel to 0 as described in this article: http://www.microsoft.com/whdc/device/display/wddm_timeout.mspx
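
A rough way to see why the smaller run finishes while the larger ones hit the timeout is to scale the one timing reported earlier in the thread. This is only a sketch under stated assumptions: it assumes the n-body cost grows with the square of the body count (all-pairs interactions), it treats the reported 1100 ms for --n=13107 as if it were the time of a single kernel launch (the thread does not say this), and it uses the "five seconds" figure from the original post as the timeout window. The names estimate_ms and WINDOW_MS are made up for illustration.

```python
def estimate_ms(ref_ms, ref_n, n):
    """Scale a measured time to n bodies, assuming cost grows as n**2."""
    ratio = n / ref_n
    return ref_ms * ratio * ratio


# Reference point taken from the thread: --n=13107 ran in about 1100 ms.
REF_N = 13107
REF_MS = 1100.0

# Assumption: the "five seconds" window mentioned in the original post.
# The real window depends on the system's TDR settings.
WINDOW_MS = 5000.0

for n in (13107, 31072, 131072):
    ms = estimate_ms(REF_MS, REF_N, n)
    verdict = "exceeds window" if ms > WINDOW_MS else "fits"
    print(f"n={n:>6}  ~{ms:,.0f} ms  {verdict}")
```

Under these assumptions the estimate for --n=31072 comes out at a bit over 6 seconds and the one for --n=131072 at well over a minute, which is at least consistent with both of those runs timing out while --n=13107 completes.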

Unified_Research · October 6, 2009, 9:43pm

Thanks for the prompt reply! I’ll implement this in a few hours when I get back to the workstation and report back.

Is this already documented somewhere that I missed?

Thanks again!
