CPU cores vs GPUs

Hi,
I'm thinking of putting together a 4x GTX 295 machine. I've tested a quad core with 2 GTX 295s and it seems OK for now.
What I was told is that in order to have 4 PCIe slots to host 4 GTX 295s, I can only use one quad-core CPU, so
each core will have to handle two GPUs (or one dual-GPU GTX 295).
Has anyone tried this? What do you think? I guess it would have been better to have one core per GPU, but…

Also, has anyone tested the new GTX 285? How does it compare to the GTX 280 or the GTX 295?

Any comments/thoughts are more than welcome :)

thanks
eyal

I don't think the number of cores or CPUs has any direct relation to the number of GPUs.

Each GPU needs to be controlled by one thread of execution. Having 4 cores helps when programming 4 GPUs because all 4 threads can run simultaneously, so the whole thing can be faster… That's all.

I might be wrong though…
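
A minimal sketch of that one-thread-per-GPU pattern (my own illustration, not from any official sample; the kernel, grid sizes, and buffer size are placeholders). Each host thread calls cudaSetDevice() once, which binds that thread's CUDA context to one GPU:

    #include <cuda_runtime.h>
    #include <pthread.h>

    /* Placeholder kernel; real per-GPU work would go here. */
    __global__ void dummyKernel(float *data)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        data[i] = (float)i;
    }

    /* One worker thread per GPU; the device index arrives via arg. */
    static void *gpuWorker(void *arg)
    {
        int dev = (int)(size_t)arg;
        cudaSetDevice(dev);                 /* bind this thread to one GPU */

        float *d_buf;
        cudaMalloc((void **)&d_buf, 16384 * sizeof(float));
        dummyKernel<<<64, 256>>>(d_buf);    /* 64*256 = 16384 threads */
        cudaThreadSynchronize();            /* pre-CUDA-4.0 device sync */
        cudaFree(d_buf);
        return NULL;
    }

    int main(void)
    {
        int n = 0;
        cudaGetDeviceCount(&n);
        if (n > 8) n = 8;                   /* a 4x GTX 295 box shows 8 devices */

        pthread_t t[8];
        for (int i = 0; i < n; ++i)
            pthread_create(&t[i], NULL, gpuWorker, (void *)(size_t)i);
        for (int i = 0; i < n; ++i)
            pthread_join(t[i], NULL);
        return 0;
    }

Compile with something like nvcc -o multigpu multigpu.cu -lpthread. Error checking is omitted for brevity.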

There are a couple of issues here:

A. Is there an NVIDIA recommendation?

B. Has someone tested such a setup in production and noted a degradation in performance?

C. Obviously I can open 100 threads per core, but that probably won't perform well even when GPUs are not involved.

D. In my code there is a lot of CPU/GPU ping-pong, and therefore I'm a bit afraid that 8 CPU threads (one per GPU) on only 4 cores will degrade the overall performance.

In any case, once the machine arrives, I'll update the forum with my findings :)

eyal

http://www.nvidia.com/object/tesla_build_your_own.html

Yes. There are posts on the forums. There are also examples of background processes slowing the performance of a CUDA app significantly.

Very likely.

Here is a comparison of key parameters for the GTX 280 / GTX 285 / GTX 295, according to official specs:

GPU         Core freq (MHz)   Shader freq (MHz)   Memory freq (MHz)   Memory bandwidth (GB/s)
GTX 280     602               1296                1107 (512-bit)      141.7
GTX 295     576               1242                999  (448-bit)      2 x 111.9
GTX 285     648               1476                1242 (512-bit)      159
GTX 285 OC  702               —                   1323 (512-bit)      169.3

Sources: nvidia.com, evga.com
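
For reference, the bandwidth column follows directly from the other specs: these cards use GDDR3, which transfers on both clock edges, so bandwidth = memory freq x 2 x bus width in bytes. E.g. GTX 280: 1107 MHz x 2 x 64 bytes ≈ 141.7 GB/s, and GTX 295: 999 MHz x 2 x 56 bytes ≈ 111.9 GB/s per GPU.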

Actually, this depends significantly on the kernels. If you're firing hundreds of kernel invocations per second (i.e., each kernel takes only a few milliseconds), then high load from background processes is a problem. You are also likely to see performance degradation if #CPUs < #GPUs in this case.

If your kernels run for a longer time, i.e., a second or so, you can play with the CU_CTX_SCHED_YIELD flag, and it will likely help you avoid performance degradation.
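
A sketch of my own (driver API, device 0, error handling omitted) showing where that flag goes:

    #include <cuda.h>

    int main(void)
    {
        CUdevice  dev;
        CUcontext ctx;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        /* With CU_CTX_SCHED_YIELD the host thread yields its core while
           waiting on the GPU instead of spin-waiting, which matters when
           there are more GPU-driving threads than CPU cores. */
        cuCtxCreate(&ctx, CU_CTX_SCHED_YIELD, dev);

        /* ... launch kernels, synchronize ... */

        cuCtxDestroy(ctx);
        return 0;
    }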

Use blocking sync in 2.2 if you’re worried about CPU utilization. See explanation here.
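
For the runtime API, the equivalent (as I understand the 2.2 release; later releases renamed the flag cudaDeviceScheduleBlockingSync) looks like:

    #include <cuda_runtime.h>

    int main(void)
    {
        /* Must be called before the context is created, i.e. before any
           other runtime call that touches the device. With blocking sync
           the host thread sleeps on an OS primitive instead of spinning,
           freeing the core for other GPU-driving threads. */
        cudaSetDeviceFlags(cudaDeviceBlockingSync);
        cudaSetDevice(0);

        /* ... cudaMalloc / kernel launches / cudaThreadSynchronize ... */
        return 0;
    }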