Bluescreen while running CUDA kernel

Hi,
I am trying to run a CUDA kernel with the following specifications:

dim3 dimGrid(8,4096,1);
dim3 dimBlock(512,1,1);

on Windows XP. Sorry, I can’t share the kernel details right now because this is commercial work in progress. The Windows kernel crashes, apparently inside the NVIDIA driver. I will post more details soon, but here is some initial analysis.

CUDA Hardware info:

D:\NVIDIA\NVC\bin\win32\Release>deviceQuery.exe
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA

Device 0: “GeForce 8500 GT”
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 268107776 bytes
Number of multiprocessors: 2
Number of cores: 16
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.92 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads
can use this device simultaneously)

Crash dump analysis (preliminary)

0: kd> !analyze -v

Bugcheck Analysis

THREAD_STUCK_IN_DEVICE_DRIVER_M (100000ea)
The device driver is spinning in an infinite loop, most likely waiting for
hardware to become idle. This usually indicates problem with the hardware
itself or with the device driver programming the hardware incorrectly.

Additional info

ADDITIONAL_DEBUG_TEXT:

FAULTING_MODULE: 804d7000 nt

DEBUG_FLR_IMAGE_TIMESTAMP: 49fa9709

FAULTING_THREAD: 86f31d10

DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_FAULT

CUSTOMER_CRASH_COUNT: 2

BUGCHECK_STR: 0xEA

LAST_CONTROL_TRANSFER: from f7898a67 to 80545946

STACK_TEXT:
WARNING: Stack unwind information not available. Following frames may be wrong.
b6e39b10 f7898a67 f798fb9c 00000000 00000000 nt+0x6e946
b6e39e04 804ff853 f798fb48 b6e39e50 b6e39e44 watchdog+0xa67
b6e39e54 806e6ef2 00000000 00000000 b6e39e6c nt+0x28853
b6e39e6c f5d517c6 badb0d00 b6e39f14 ffffffff hal+0x2ef2
00000000 00000000 00000000 00000000 00000000 nv4_mini+0x327c6

STACK_COMMAND: .thread 0xffffffff86f31d10 ; kb

FOLLOWUP_IP:
nv4_mini+327c6
f5d517c6 ?? ???

SYMBOL_STACK_INDEX: 4

SYMBOL_NAME: nv4_mini+327c6

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nv4_mini

IMAGE_NAME: nv4_mini.sys

BUCKET_ID: WRONG_SYMBOLS

Followup: MachineOwner

0: kd> .thread
Implicit thread is now 875c35b8
0: kd> kb
ChildEBP RetAddr Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
b6e39b10 f7898a67 f798fb9c 00000000 00000000 nt+0x6e946
b6e39e04 804ff853 f798fb48 b6e39e50 b6e39e44 watchdog+0xa67
b6e39e54 806e6ef2 00000000 00000000 b6e39e6c nt+0x28853
b6e39e6c f5d517c6 badb0d00 b6e39f14 ffffffff hal+0x2ef2
00000000 00000000 00000000 00000000 00000000 nv4_mini+0x327c6

NVIDIA driver folks, if you need any further information, please drop me an email. Meanwhile, I will look into how to provide more details if necessary.

Thanks,
-Romit

Does the kernel run longer than a couple of seconds and do you use this GPU as a display adapter?

Yes, that is true. It does run for some time, and the GPU is my display adapter!

You’re probably running into the watchdog then. Search the forum for “watchdog”.
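The “Run time limit on kernels: Yes” line in the deviceQuery output above corresponds to the `kernelExecTimeoutEnabled` field of `cudaDeviceProp`, so the same check can be done from your own program. A minimal sketch, where the helper name `watchdogActive` is made up for illustration:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): returns 1 if a run time
// limit applies to kernels on `device`, 0 if it does not, and -1 if the
// property query itself failed.
int watchdogActive(int device)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return prop.kernelExecTimeoutEnabled ? 1 : 0;
}
```

On the GeForce 8500 GT above, `watchdogActive(0)` should report that the limit is active, matching what deviceQuery prints.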

I searched through the forum and also found a few results on Google. But at the end of the day, I am not certain I can say that “if a kernel runs for more than 5 s, the watchdog will be hit”. I thought the NVIDIA hardware internally switched between CUDA and graphics modes fast enough. At the very least, I should not see a BSOD.
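One way to put a number on this is to time a much smaller grid with CUDA events and scale the result up to the full 8 x 4096 grid. The sketch below is only an illustration: `probeKernel` is a hypothetical placeholder for the real kernel, `timeLaunchMs` is a made-up helper name, and linear scaling with the number of blocks is an assumption that may not hold for every kernel.

```cuda
#include <cuda_runtime.h>

// Hypothetical placeholder; substitute the real kernel here.
// Each thread touches one float, so `data` must hold at least
// gridDim.x * gridDim.y * blockDim.x elements.
__global__ void probeKernel(float *data)
{
    unsigned int block = blockIdx.y * gridDim.x + blockIdx.x;
    data[block * blockDim.x + threadIdx.x] += 1.0f;
}

// Times a single launch with CUDA events.
// Returns the elapsed time in milliseconds, or -1.0f if any step failed.
float timeLaunchMs(float *d_data, dim3 grid, dim3 block)
{
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    cudaEventRecord(start, 0);
    probeKernel<<<grid, block>>>(d_data);
    cudaEventRecord(stop, 0);

    float ms = -1.0f;
    float elapsed = 0.0f;
    // Check the launch, wait for the stop event, then read the interval.
    if (cudaGetLastError() == cudaSuccess &&
        cudaEventSynchronize(stop) == cudaSuccess &&
        cudaEventElapsedTime(&elapsed, start, stop) == cudaSuccess) {
        ms = elapsed;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

For example, timing `dim3(8, 16, 1)` blocks of `dim3(512, 1, 1)` threads and multiplying by 4096 / 16 gives a rough estimate for the full grid, which can then be compared against the watchdog limit discussed in this thread.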

There’s little NVIDIA can do about it; it’s an OS safety measure.

As for changing modes, kernels aren’t interruptible (they are not time-sliced). Whole kernels can be interleaved with other kernels, shaders, or render requests, and this is done automatically, but an individual item in the queue cannot be subdivided: it must either complete or trigger the watchdog.
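Since each launch is a separate item in the queue, a common workaround is to split one long-running grid into several shorter launches. Here is a minimal sketch using the 8 x 4096 grid and 512-thread blocks from the first post. `workKernel`, `rowsInChunk`, and `launchInChunks` are hypothetical names, the kernel body is a placeholder, and it assumes the real kernel can take a row offset and that the slices are independent of each other. Toolkits from the compute capability 1.x era used `cudaThreadSynchronize()` where this sketch calls `cudaDeviceSynchronize()`.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel. `rowOffset` shifts blockIdx.y so
// that each launch covers a different slice of the full 8 x 4096 grid.
// `data` must hold one float per thread of the full grid.
__global__ void workKernel(float *data, unsigned int rowOffset)
{
    unsigned int row = blockIdx.y + rowOffset;   // row index in the full grid
    unsigned int idx = (row * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x;
    data[idx] += 1.0f;                           // placeholder work
}

// Number of grid rows covered by the launch that starts at `startRow`.
// Expects startRow < totalRows.
unsigned int rowsInChunk(unsigned int totalRows, unsigned int startRow,
                         unsigned int chunkRows)
{
    unsigned int remaining = totalRows - startRow;
    return remaining < chunkRows ? remaining : chunkRows;
}

// Launches the full grid as a series of smaller grids of at most `chunkRows`
// rows each, checking for errors after every launch.
cudaError_t launchInChunks(float *d_data, unsigned int totalRows,
                           unsigned int chunkRows)
{
    if (chunkRows == 0) return cudaErrorInvalidValue;

    dim3 dimBlock(512, 1, 1);
    for (unsigned int start = 0; start < totalRows; start += chunkRows) {
        dim3 dimGrid(8, rowsInChunk(totalRows, start, chunkRows), 1);
        workKernel<<<dimGrid, dimBlock>>>(d_data, start);

        cudaError_t err = cudaGetLastError();              // launch errors
        if (err == cudaSuccess) err = cudaDeviceSynchronize(); // execution errors
        if (err != cudaSuccess) return err;
    }
    return cudaSuccess;
}
```

With `launchInChunks(d_data, 4096, 256)` the work is issued as 16 launches of 8 x 256 blocks. The chunk size is a tuning knob: it should be small enough that a single launch finishes well inside the watchdog limit on the slowest GPU you target. Synchronizing after every launch is a conservative choice that mainly makes it clear which chunk failed.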