Running multiple CUDA apps on the same GPU card: serious performance drop

Hello everyone, I have developed a large CUDA app, which is a hybrid app (kernels are invoked by the CPU and run on the GPU). My dev machine is a simple 64-bit WinXP PC with an Nvidia card of CUDA capability 1.1, while the machine where I carry out the actual testing and benchmark study is a 64-bit Win7 PC with 3 GPUs (an Nvidia Quadro with CUDA capability 1.1, and two Tesla C1060s with capability 1.3). I wanted to measure how performance deteriorates when the number of apps running on the same GPU is doubled from one to two. I obtained the results given in the table below:

OS    ==== GPU's CUDA capability ==== GPU time ratio (GPU time with 2 apps / GPU time with 1 app, same GPU used)
Win7  ==== 1.1 ==== 17.0
Win7  ==== 1.3 ==== 16.0
WinXP ==== 1.1 ==== 2.5

The only result that I can understand is the one obtained on the WinXP platform. Why do I get such poor performance under Win7?

Many thanks for your precious help! Please shed some light on this issue!!

I have noticed this degradation as well, and as far as I understand it has something to do with the Win7 WDDM driver model. The degradation is worse in CUDA programs that load many kernels and milder in programs that are memory intensive.

This issue can be partly solved by setting your GPU to TCC mode. [nvidia-smi -g (GPU ID) -dm (0 for WDDM, 1 for TCC)]

See more info here: http://us.download.nvidia.com/Windows/Quadro_Certified/263.06/263.06-Win7-WinVista-Tesla-Release-Notes.pdf
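
If you have several boards to switch, the nvidia-smi call above can be scripted. Below is a minimal Python sketch, not an official tool. The flag spelling (-g, -dm) and the 0/1 codes are taken from the command above; other driver versions may spell them differently, so check nvidia-smi -h first. The function names and the GPU IDs 1 and 2 are made up for illustration. As far as I know the switch needs an administrator prompt and only takes effect after a reboot.

```python
import subprocess

# Driver-model codes as given in the post above: 0 = WDDM, 1 = TCC.
DRIVER_MODELS = {"WDDM": 0, "TCC": 1}


def driver_model_command(gpu_id, model):
    """Build the nvidia-smi argument list for switching one GPU's driver model.

    Hypothetical helper: it only assembles the command quoted in the post,
    it does not verify that the installed nvidia-smi accepts these flags.
    """
    if not isinstance(gpu_id, int) or gpu_id < 0:
        raise ValueError("gpu_id must be a non-negative integer")
    if model not in DRIVER_MODELS:
        raise ValueError("model must be 'WDDM' or 'TCC'")
    return ["nvidia-smi", "-g", str(gpu_id), "-dm", str(DRIVER_MODELS[model])]


def set_driver_model(gpu_id, model, dry_run=True):
    """Return the command; run it only when dry_run is False."""
    cmd = driver_model_command(gpu_id, model)
    if not dry_run:
        # check=True raises CalledProcessError if nvidia-smi reports a failure.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # GPU IDs 1 and 2 are placeholders; list your own boards with nvidia-smi first.
    for gpu_id in (1, 2):
        print(" ".join(set_driver_model(gpu_id, "TCC", dry_run=True)))
```

With dry_run=True nothing is changed, the script only prints the commands it would run, so you can check them before calling set_driver_model(..., dry_run=False).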

eldad.