Other than the WDDM timeout, what are the CUDA Windows 7 issues for the GTX line?

The timeout issue can be fixed in the registry, and beyond that I have not yet noticed any other problems using the WDDM driver versus the TCC driver.

As far as performance goes, the GTX 780 Ti has actually been faster in Windows 7 than in Ubuntu (CUDA 5.5), so I am trying to figure out what other issues may arise.

Txbob has mentioned that DirectX can evict CUDA kernels (which would not happen with the TCC driver, since it hides the GPU from the OS), but what other issues may arise from using the WDDM driver in Windows?

Full disclosure: I don’t work with consumer GPUs.

Watchdog timers are actually used on all platforms (Windows, Linux, OS X) if a GPU is used both for compute and for driving a display (e.g. a GUI desktop). This is not specific to WDDM.
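For what it's worth, one can check whether the runtime thinks a watchdog applies to a particular device. A minimal sketch (my own illustration, not from the original post):

// Report whether a kernel execution timeout (watchdog) is active on device 0
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("%s: kernel execution timeout %s\n", prop.name,
           prop.kernelExecTimeoutEnabled ? "enabled (watchdog active)" : "disabled");
    return 0;
}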

From what I know, there used to be limits on the size of individual GPU memory allocations with the WDDM driver, driven by the fact that WDDM uses system memory as a backing store for GPU memory. I do not know the exact status of those limitations; I am vaguely aware that they have been relaxed in recent drivers.
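To see what a particular driver actually permits, a rough probe along these lines can be used (illustrative only; the exact behavior under WDDM depends on the driver version):

// Report free/total device memory, then attempt one large allocation
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    size_t freeB = 0, totalB = 0;
    cudaMemGetInfo(&freeB, &totalB);
    printf("free: %llu MB, total: %llu MB\n",
           (unsigned long long)(freeB >> 20), (unsigned long long)(totalB >> 20));

    size_t request = (size_t)(freeB * 0.9);   // try ~90% of reported free memory
    void *p = nullptr;
    cudaError_t err = cudaMalloc(&p, request);
    printf("single %llu MB allocation: %s\n", (unsigned long long)(request >> 20),
           err == cudaSuccess ? "OK" : cudaGetErrorString(err));
    if (err == cudaSuccess) cudaFree(p);
    return 0;
}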

The WDDM driver model has a lot of inherent overhead for launching work on the GPU. This overhead can become visible as a performance issue if an application issues a large stream of small (short-duration) kernels. The CUDA driver tries to mitigate this by batching kernel launches, amortizing the WDDM round-trip overhead across several launches. However, this batching can in turn introduce performance artifacts that may require manually flushing the queue at appropriate times. I do not recall the recommended technique for flushing the queue, but it has been discussed in these forums and should be documented.
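The launch overhead itself is easy to observe with a micro-benchmark along these lines (a sketch of my own, not from the original discussion): launch many kernels that do essentially no work and look at the average cost per launch.

// Measure the average host-side cost of many very short kernel launches
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

__global__ void tiny() { }          // does essentially nothing

int main()
{
    const int N = 10000;
    tiny<<<1, 32>>>();              // absorb one-time startup cost
    cudaDeviceSynchronize();

    auto t0 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < N; ++i)
        tiny<<<1, 32>>>();
    cudaDeviceSynchronize();        // wait for all batched work to complete
    auto t1 = std::chrono::high_resolution_clock::now();

    double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
    printf("average cost per tiny launch: %.1f us\n", us / N);
    return 0;
}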

If I understand correctly, you are seeing better application performance under Win7 than under Linux. I assume this is a controlled experiment, i.e. the exact same system with the same generation of drivers and CUDA software stack? There could be a variety of possible reasons; it is hard to say anything without knowing the details of the application. Are both platforms 64-bit? If the difference is significant and detrimental to your use case, you might want to consider filing a bug so the driver team can investigate.

Thanks, that was very informative.

The only difference between the Linux setup and the Win 7 setup is that the Linux setup is using CUDA 5.0, while the Win 7 setup is using CUDA 5.5.

The difference is not huge (about 10% in favor of Win 7 for the same operations), but I was just surprised that Win 7 was even competitive with Linux.

I do notice much more variation in the performance of the GTX line when compared to the Tesla line. The K20 and K40 are extremely consistent in their running times, while the GTX 780 Ti can vary by as much as 15%, probably due to the boost clock feature (it runs better after a few 'warm-ups').
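A simple way to reduce that variance when timing is to run a few untimed warm-up launches so the boost clocks settle before the measured run. A rough sketch with a placeholder kernel (not the actual code I am running):

// Warm up the GPU before a timed run so boost clocks have settled
#include <cstdio>
#include <cuda_runtime.h>

__global__ void work(float *x, int n)        // stand-in for the real kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 1.000001f + 0.5f;
}

int main()
{
    const int n = 1 << 22;
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    for (int i = 0; i < 5; ++i)              // untimed warm-up iterations
        work<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    work<<<(n + 255) / 256, 256>>>(d, n);    // timed run
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("timed run: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}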

If CUDA 5.0 is used on one machine, and CUDA 5.5 on the other, presumably not just the drivers are different but the toolchain as well. That could also play into the performance difference. To isolate the impact of OS driver models, one would have to equalize the CUDA software stack across machines.

Looking at the published specifications of the GTX 680 (base: 1006 MHz, boost: 1058 MHz), the GTX 780 (base: 863 MHz, boost: 900 MHz), and the GTX 780 Ti, the boost clock headroom seems to be only in the 4%-6% range, not 15%. So there must be other factors behind the observed performance fluctuations.

This particular GTX 780 Ti is an EVGA Superclocked ACX version, so maybe that is a factor:

http://www.guru3d.com/articles_pages/evga_geforce_gtx_780_ti_sc_acx_superclock_review,1.html

This dual-boot machine is a project I built myself from scratch, and it will serve a dual purpose (PC gaming and recreational development). At work I use the Tesla line, but it is interesting to run some of the same code on the consumer card.

The GTX 780 Ti as specified by NVIDIA has base=875, boost=928 for a 6.0% maximum boost. The specs for your particular product state base=1006, boost=1072 for a 6.6% maximum boost.

Greg@NV suggests in this link:

https://devtalk.nvidia.com/default/topic/548639/is-wddm-causing-this-/

to use:

cudaEventQuery(0);

to flush the WDDM queue. YMMV
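Put in context, the idea looks roughly like this (placeholder kernel; as I understand it, the call just nudges the driver to submit its batched launches without blocking the host):

// Flush the WDDM batched launch queue after queuing asynchronous work
#include <cuda_runtime.h>

__global__ void work() { }                   // placeholder kernel

int main()
{
    work<<<1, 32>>>();                       // queued asynchronously
    cudaEventQuery(0);                       // hint to flush the launch queue
    // ... independent host-side work can proceed here ...
    cudaDeviceSynchronize();
    return 0;
}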