Tesla Compute Cluster driver

Uncle_Joe · August 12, 2010, 6:02pm

I have some questions about the Tesla compute cluster driver:

1. “Reducing kernel launch overhead” - how much does this help? Does the overhead have anything to do with the ~10us I found here?
2. The notes say you have to use a non-NVIDIA display driver if you want a display, but why? I know Windows Display Driver Model 1.0 (pre-Windows 7) only supports one driver, but WDDM 1.1 supports more than one. This would be a major inconvenience, because I currently use a Quadro 290 for the display.

tmurray · August 15, 2010, 6:32pm

1. If you don’t have strict latency requirements, you might not notice a huge change (in large part because we batch when possible on WDDM, which amortizes a lot of the cost of submitting GPU work). However, for iterative algorithms with relatively short kernel invocations, this can make a major performance difference.
2. Basically, the TCC and WDDM drivers are built from different components, and Windows gets unhappy when you have two drivers with the same component names. I improved this significantly in CUDA 3.2, so the whole inconvenience thing is fixed.
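
If you want a rough number for your own machine, here is a minimal sketch (not from the driver notes, just an illustration) that times an empty kernel two ways: synchronizing after every launch, and queueing all launches before a single synchronize. The names `empty_kernel` and `us_per_launch` are made up for this example. It uses `cudaDeviceSynchronize`, which arrived in CUDA 4.0; on a 3.x toolkit the equivalent call is `cudaThreadSynchronize`.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Empty kernel: whatever time we measure is launch/submission overhead,
// not compute.
__global__ void empty_kernel() {}

// Average wall-clock microseconds per launch over n launches.
// sync_each = true  : wait after every launch, so nothing can be batched.
// sync_each = false : queue all launches, then wait once at the end.
double us_per_launch(int n, bool sync_each) {
    empty_kernel<<<1, 1>>>();          // warm-up: context creation, module load
    cudaDeviceSynchronize();

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        empty_kernel<<<1, 1>>>();
        if (sync_each) cudaDeviceSynchronize();
    }
    cudaDeviceSynchronize();           // make sure all queued work has finished
    auto t1 = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::micro>(t1 - t0).count() / n;
}

int main() {
    const int n = 10000;
    printf("sync after every launch: %.2f us per launch\n", us_per_launch(n, true));
    printf("sync once at the end:    %.2f us per launch\n", us_per_launch(n, false));
    return 0;
}
```

The gap between the two numbers gives an idea of how much of the per-launch cost is hidden when launches can be queued back to back, and how much you pay when every launch has to round-trip to the host.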

Uncle_Joe · August 16, 2010, 4:12pm

Good, that’s what I need for my median / SelectNth code, which does lots of global synchronization (kernel launches).

tmurray · August 16, 2010, 6:22pm

Yeah, if you’re doing a loop of kernel → memcpy → check whether a convergence condition is met → repeat, TCC is going to kill WDDM in terms of performance here.
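
For anyone reading along, that loop pattern looks roughly like the sketch below. This is a toy, not the median / SelectNth code from this thread: `relax_step` and `iterate_until_converged` are hypothetical names, and the "algorithm" just halves every element until all of them drop below a tolerance. The point is the shape of the host loop, where each iteration pays for one kernel launch plus one small blocking copy.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Toy iteration step (hypothetical): halve every element and raise a flag
// if any element is still above the tolerance.
__global__ void relax_step(float *x, int n, float tol, int *not_converged) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] *= 0.5f;
        if (fabsf(x[i]) > tol) *not_converged = 1;  // every writer stores the same value
    }
}

// Runs kernel -> memcpy -> check -> repeat.
// Returns the number of iterations executed, or -1 on a CUDA error.
int iterate_until_converged(int n, float tol, int max_iters) {
    std::vector<float> h_x(n, 1.0f);
    float *d_x = nullptr;
    int *d_flag = nullptr;
    if (cudaMalloc(&d_x, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_flag, sizeof(int)) != cudaSuccess) { cudaFree(d_x); return -1; }
    cudaMemcpy(d_x, h_x.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int iters = 0;
    int h_flag = 1;
    while (h_flag && iters < max_iters) {
        cudaMemset(d_flag, 0, sizeof(int));
        relax_step<<<blocks, threads>>>(d_x, n, tol, d_flag);
        // Blocking copy: the host waits here for the kernel to finish, so
        // launch and submission latency is paid on every single iteration.
        if (cudaMemcpy(&h_flag, d_flag, sizeof(int),
                       cudaMemcpyDeviceToHost) != cudaSuccess) {
            iters = -1;
            break;
        }
        ++iters;
    }
    cudaFree(d_x);
    cudaFree(d_flag);
    return iters;
}

int main() {
    int iters = iterate_until_converged(1 << 20, 1e-3f, 1000);
    printf("iterations: %d\n", iters);
    return 0;
}
```

Because the host has to see the flag before deciding whether to launch again, there is nothing for the driver to batch, which is why short kernels in a loop like this are the case where launch overhead shows up most.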
