Performance degradation with multiple GPU cards

Hello,

The measured performance of a single kernel running on a single GPU drops if the system has more than one GPU board. Googling suggests the cause might be an automatic reduction in clock rate when a system has multiple boards. How can this be avoided? The OS is Linux; our system has enough power and cooling to support multiple GPU cards, and we want each and every card to run at full clock speed.
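One way I can think of to confirm whether the clock rate is actually being reduced is to query the current SM clock on each board through the NVML C library, roughly like the sketch below. This is only a sketch: whether NVML is available depends on the driver version, and the link flag is an assumption about how the library is installed on our system.

#include <stdio.h>
#include <nvml.h>

/* Print the current SM clock of every GPU in the system.
   Compile with something like: gcc check_clocks.c -lnvidia-ml (path may vary). */
int main(void)
{
    unsigned int i, count, clockMHz;
    nvmlReturn_t rc = nvmlInit();
    if (rc != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit failed: %s\n", nvmlErrorString(rc));
        return 1;
    }
    nvmlDeviceGetCount(&count);
    for (i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        nvmlDeviceGetHandleByIndex(i, &dev);
        if (nvmlDeviceGetClockInfo(dev, NVML_CLOCK_SM, &clockMHz) == NVML_SUCCESS)
            printf("GPU %u: current SM clock = %u MHz\n", i, clockMHz);
    }
    nvmlShutdown();
    return 0;
}

Running this with one board installed and again with two should show whether the SM clock really differs between the two configurations.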

Thanks,

Could this be the reason the same kernel function runs faster on my GT200 GPU than on my Fermi one? They are both on the same motherboard. I am trying to measure the performance of each GPU.

If what you say is true, then I will take out one GPU when measuring the performance.

Is your code PCIe-bound? If so, two GPU cards sharing the same PCIe bus would roughly halve your performance.
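If you are not sure, you can check with the bandwidthTest sample from the SDK, or with a small sketch like the one below that times a host-to-device copy on one card while the other card is busy. The buffer size and device index here are arbitrary examples, not values from your setup.

#include <stdio.h>
#include <cuda_runtime.h>

/* Rough host-to-device bandwidth check for one GPU.
   Run it once per device index while the other GPU is under load. */
int main(void)
{
    const size_t bytes = 64 * 1024 * 1024;   /* 64 MB, arbitrary */
    float ms = 0.0f;
    void *h_buf = NULL, *d_buf = NULL;
    cudaEvent_t start, stop;

    cudaSetDevice(0);                        /* change to the device under test */
    cudaMallocHost(&h_buf, bytes);           /* pinned memory for realistic numbers */
    cudaMalloc(&d_buf, bytes);
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);

    printf("H2D bandwidth: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}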

eyal

No; this is the performance of just one kernel running on just one GPU, and only one GPU is active during the measurement. The measurement does not include data movement to or from the host; what is measured is the kernel execution time only. The performance drops when the system has two GPU cards. There is only one user on the system, the GPU cards are used for computation only (not for display), and there is no other activity on the system during the measurements.
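For reference, the timing is done essentially like the sketch below: CUDA events bracket only the kernel launch, so no host transfers fall inside the measured interval. The kernel and launch configuration shown are placeholders, not our actual code.

#include <stdio.h>
#include <cuda_runtime.h>

__global__ void myKernel(float *data, int n)   /* placeholder for the real kernel */
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void)
{
    const int n = 1 << 20;
    float *d_data = NULL, ms = 0.0f;
    cudaEvent_t start, stop;

    cudaSetDevice(0);                            /* only the card under test is touched */
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);                   /* events bracket the kernel only */
    myKernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);

    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}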

I had systems with two S1070/S2070 units (i.e. 8 GPUs per machine) and didn't see this. Linux, and no display attached.

Did you try other applications? Samples from the SDK?

eyal

What was the power setting on your system? Some people say the power setting may be set to "adaptive" rather than "maximum performance"; how can one verify whether this is the cause? I don't know how to change the setting to "maximum performance" without using the driver GUI, and I don't have access to the GUI. What is the command-line or configuration-file method for ensuring the power setting is permanently set to maximum performance?
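For what it's worth, the only non-GUI check I have come across so far is reading the current performance state through NVML, along these lines. This is a sketch only: it reads the state rather than setting it, and I do not know whether it is supported on every driver version.

#include <stdio.h>
#include <nvml.h>

/* Read the current performance state of each GPU (P0 = maximum performance).
   This only verifies the state; it does not force "maximum performance". */
int main(void)
{
    unsigned int i, count;
    if (nvmlInit() != NVML_SUCCESS) return 1;
    nvmlDeviceGetCount(&count);
    for (i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        nvmlPstates_t pstate;
        nvmlDeviceGetHandleByIndex(i, &dev);
        if (nvmlDeviceGetPerformanceState(dev, &pstate) == NVML_SUCCESS)
            printf("GPU %u: performance state P%d\n", i, (int)pstate);
    }
    nvmlShutdown();
    return 0;
}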

Please use nvidia-smi -a to check the temperature of your boards while the kernel is running, and let us know.

If the temperature goes above 90 C then you are not cooling enough, and the boards will automatically clock down.
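If a single nvidia-smi snapshot is not enough, you could also poll the temperature while the kernel runs, for example with a small NVML loop like this. It is only a sketch; the one-second interval and device index 0 are arbitrary, and NVML availability depends on your driver.

#include <stdio.h>
#include <unistd.h>
#include <nvml.h>

/* Poll GPU 0's core temperature once per second; run this in another
   terminal while the kernel executes, and stop it with Ctrl-C. */
int main(void)
{
    unsigned int temp;
    nvmlDevice_t dev;
    if (nvmlInit() != NVML_SUCCESS) return 1;
    nvmlDeviceGetHandleByIndex(0, &dev);
    for (;;) {
        if (nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp) == NVML_SUCCESS)
            printf("GPU temperature: %u C\n", temp);
        sleep(1);
    }
}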

The reported temperature is 74 C; full details are in the attached nvidia_smi_a.txt file. I ran nvidia-smi just before and just after the kernel executed; the attached file is the output from the run just after the kernel finished.
nvidia_smi_a.txt (9.34 KB)