Degraded CUDA program performance in Windows WDDM mode after driver 500

I have a scientific data processing CUDA application running on Windows 10. When possible, I set the GPU to TCC driver mode, but on many customers' systems it stays in WDDM driver mode.

I have lately noticed a performance decrease of about 20% compared with what I used to get when the GPU is in WDDM driver mode. After researching the issue, I am almost certain that it comes from the driver.

Using a TITAN RTX as my test GPU, I have run the application on the same Windows 10 system (fully updated), testing different NVIDIA drivers from the latest, 527.56, all the way back to 471.11.
What I have found is that up to driver 472.84 the performance is the same whether the GPU is in TCC or WDDM driver mode, and equal to what I expect. Starting with the next driver version I have available, 511.09, performance in TCC mode stays where it was, but in WDDM mode it drops about 20%.

Coincidental or not, driver 472.84 is the last “standard” driver published on the NVIDIA driver download page, and 511.09 is a DCH driver.

Is there something in the DCH drivers that affects performance in WDDM driver mode?
If so, how can I fix this performance decrease, preferably programmatically?
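For the “programmatically” part of the question: an application cannot switch the driver model itself (that requires administrator rights and `nvidia-smi -i <gpu> -dm 1`, followed by a reboot), but it can at least detect which mode it is running under and warn the user. A minimal sketch using the CUDA runtime attribute `cudaDevAttrTccDriver`:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;
    int tcc = 0;
    // cudaDevAttrTccDriver is 1 when the device uses the TCC driver model,
    // 0 when it is in WDDM mode (it is always 0 on Linux).
    cudaError_t err = cudaDeviceGetAttribute(&tcc, cudaDevAttrTccDriver, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetAttribute failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("Device %d is in %s driver mode\n", device, tcc ? "TCC" : "WDDM");
    return 0;
}
```

This only reports the mode; whether anything can be done about a WDDM-specific regression inside the application depends on where the penalty actually comes from.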

Thank you for your help

I don’t know what a DCH driver is or how it differs from “standard” drivers. Is this an apples-to-apples comparison? Are both production WHQL drivers?

Not knowing anything about the application that is experiencing this slowdown or its performance characteristics (how exactly does it exercise the driver?), it seems impossible to even speculate.

Under the assumption that the driver version experiment is an apples-to-apples comparison: Consider preparing a minimal reproducer program and filing a bug with NVIDIA for a performance regression.
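As a starting point for such a reproducer: WDDM batches GPU commands in the driver, so kernel launch and synchronization overhead is a common place for WDDM-specific regressions to show up. A sketch of a minimal timing loop (the trivial kernel and the iteration count are placeholders, not taken from the original application) that can be run unchanged under each driver version in both modes:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummyKernel(float *data) {
    // Trivial kernel: the goal is to measure launch/sync overhead,
    // not compute throughput.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    const int iterations = 1000;
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    // Time many back-to-back launches with CUDA events so the
    // measurement method is identical under TCC and WDDM.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int i = 0; i < iterations; ++i)
        dummyKernel<<<n / 256, 256>>>(d_data);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("%d launches: %.3f ms total, %.3f us per launch\n",
           iterations, ms, 1000.0f * ms / iterations);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

If this reproduces the ~20% WDDM-only drop between 472.84 and 511.09, it makes a compact attachment for the bug report; if it does not, the regression lies elsewhere (e.g. memory transfers or allocation patterns) and the reproducer should be built around those instead.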

Yes, this is an apples-to-apples comparison. Other than the driver version and setting the driver mode to TCC or WDDM, nothing else is changed between the comparisons.

Both the standard and the DCH drivers are WHQL drivers. This is the difference between them according to the NVIDIA download page:

“DCH” refers to drivers developed according to Microsoft’s DCH driver design principles.
DCH drivers are built with requisite Declarative, Componentized, Hardware Support App elements. DCH drivers are installed on most new desktop and mobile workstation systems.

“Standard” refers to driver packages that predate the DCH driver design paradigm. Standard drivers are for those who have not yet transitioned to contemporary DCH drivers, or require these drivers to support older products.

I’ll investigate further in my application to try to pinpoint where the performance penalty occurs.

It appears the transition to DCH drivers at NVIDIA is complete after Microsoft has been prodding vendors for two years. All recent NVIDIA drivers for Windows 10 appear to be DCH (so says the NVIDIA Control Panel), including the ones I have installed on my systems.

In my understanding DCH is only about packaging, not programmatic driver interfaces, so any connection with a performance regression is likely just a coincidence (e.g. recent DCH drivers are cut from a branch that also includes changes that cause the observed loss in performance).

Filing a bug (that includes a reproducer) seems the way to go.