Big OptiX kernel launches on Windows (WDDM driver model)

wenzel.jakob · November 30, 2022, 10:50am

Dear OptiX team,

I have a naïve question about something I do not understand very well. Some time ago, I read that on Windows, the graphics card is somewhat restricted by the OS graphics stack (WDDM driver model) unless one has a pro-level card (Tesla, Quadro) switched into a special headless “TCC” mode.

The problem with WDDM was that a long-running kernel would freeze the OS GUI, eventually causing a watchdog timer to expire and restart the graphics driver (thereby crashing the long-running kernel). I can’t remember the article, but I saw it mentioned, for example, here: "Display driver stopped responding and has recovered" WDDM Timeout Detection and Recovery

The context of my question is that I am doing big launches (2^30 ≈ 1 billion threads) on a Windows machine without seeing any such issues. I can click on the start menu and launch programs while the kernel is running – no freezing, crashes, or other unexpected behavior. Is the advice about WDDM drivers outdated? Does OptiX have some internal countermeasures to prevent this failure mode?

The latest version of Mitsuba added a change to keep OptiX launches small on Windows (~2 million threads per launch) based on this old advice, but now I am wondering whether I should revert the change to keep things simple.

Thanks,
Wenzel

droettger · November 30, 2022, 2:12pm

Hi Wenzel,

It’s a matter of OS and GPU support for preemption methods. The newer the OS and GPU, the better this is supported.

This presentation explains it nicely:
https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9957-using-cuda-on-windows.pdf

The recommendation in that slide deck to check for compute preemption support refers to the device attribute cudaDevAttrComputePreemptionSupported (CUDA runtime API) or CU_DEVICE_ATTRIBUTE_COMPUTE_PREEMPTION_SUPPORTED (CUDA driver API).
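
As a rough illustration of how such a check could feed into an application, here is a minimal C++ sketch. It assumes the two flags were already queried with cudaDeviceGetAttribute() using cudaDevAttrKernelExecTimeout and cudaDevAttrComputePreemptionSupported. The DeviceCaps struct, the LaunchPolicy enum, and choose_launch_policy() are hypothetical names for this example, not part of CUDA or OptiX, and treating the preemption attribute alone as a sufficient signal is an assumption.

```cpp
// Capability flags as they would be returned by cudaDeviceGetAttribute().
// The struct itself is hypothetical; only the attribute names are CUDA's.
struct DeviceCaps {
    int kernel_exec_timeout;           // cudaDevAttrKernelExecTimeout: nonzero if a run-time limit applies
    int compute_preemption_supported;  // cudaDevAttrComputePreemptionSupported
};

enum class LaunchPolicy { Unrestricted, SplitIntoShortLaunches };

// Hypothetical policy: only split launches when a watchdog is active
// and the device cannot preempt a running kernel.
LaunchPolicy choose_launch_policy(const DeviceCaps& caps) {
    // No run-time limit reported: assume there is no watchdog to work around.
    if (caps.kernel_exec_timeout == 0) {
        return LaunchPolicy::Unrestricted;
    }
    // Watchdog active, but long kernels can be preempted (assumption:
    // the attribute is enough to rely on this).
    if (caps.compute_preemption_supported != 0) {
        return LaunchPolicy::Unrestricted;
    }
    // Watchdog active and no preemption: keep each launch short.
    return LaunchPolicy::SplitIntoShortLaunches;
}
```

An application would fill DeviceCaps once per device at startup and pick its launch strategy from the result.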

wenzel.jakob · November 30, 2022, 4:46pm

Excellent, this clears up many questions. Thanks @droettger!

There is one comment in the PDF that I don’t understand:

Just because you can doesn’t mean you should run kernels for an extended period

Preemption on WDDM comes with some internal scheduling policies that makes it hard to purposely take advantage of compute preemption. The easiest way is to simply design your application without worrying about TDR.

The headline and body of this item seem in conflict to me. The header suggests it’s STILL a bad idea to run long kernels even if preemption is supported. The bottom says not to worry about it. Am I missing something?

droettger · November 30, 2022, 5:13pm

I interpret that as a recommendation to run shorter kernels that do not need preemption in the first place, since preemption most likely adds OS overhead.
It just got easier to design applications without TDR in mind when running on sufficiently new OS and GPU configurations (Windows 10 RS4 or newer in WDDM2 mode and Pascal or newer GPU architectures). You cannot control if and when the OS preempts, though.
If you need to support older architectures (e.g. Maxwell), you’d still need to design the workload so that each launch stays below the 2-second timeout.
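
For the older-architecture case, splitting one big launch into several short ones could look like the following C++ sketch. launch_in_chunks() is a hypothetical helper, and the callback stands in for the real launch (for example an optixLaunch call whose launch parameters carry the thread offset). Note that a cap on the thread count only bounds the run time indirectly; a suitable chunk size depends on how expensive each thread is.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Splits a launch of `total` threads into consecutive chunks of at most
// `max_chunk` threads and calls `launch(offset, count)` once per chunk.
// Returns the number of launches issued. Hypothetical helper, not an
// OptiX or CUDA function.
std::size_t launch_in_chunks(
    std::uint64_t total, std::uint64_t max_chunk,
    const std::function<void(std::uint64_t, std::uint64_t)>& launch) {
    assert(max_chunk > 0);
    std::size_t launches = 0;
    for (std::uint64_t offset = 0; offset < total; offset += max_chunk) {
        // The last chunk may be smaller than max_chunk.
        const std::uint64_t count = std::min(max_chunk, total - offset);
        launch(offset, count);
        ++launches;
    }
    return launches;
}
```

With the numbers from this thread, a 2^30-thread workload capped at 2^21 threads (about 2 million) per launch results in 512 launches.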

wenzel.jakob · November 30, 2022, 8:32pm

Thanks, that makes sense!

