Tesla C870 slower than GeForce 9600 GT?

Hello everybody,

I'm currently developing a CUDA application and testing it on two different machines: my laptop, equipped with an NVIDIA GeForce 9600 GT, and a machine using a Tesla C870 card.
As a reminder, the Tesla has many more cores and multiprocessors, and a much higher peak computation rate in GFLOPS.
The sample programs provided by the SDK (clock, reduction, …) confirm these specifications.
My program can be summarized as three kernels, all of them using blocks of dimension (16,16).

The result is not what I expected: the execution time is twice as long on the Tesla card (accurately twice).
The CUDA Profiler highlights one point worth attention: there are 0 divergent branches for all the kernels launched on the Tesla, whereas on the 9600 GT we can count from 600 to 7400 of them. The instruction throughput is also higher on the 9600 GT! The warp size is 32 on both cards.

My device code contains a number of branches.
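For context on what the profiler counts: a divergent branch occurs when threads of the same 32-thread warp take different paths of a conditional, forcing the warp to serialize both paths. A minimal sketch (a hypothetical kernel, not your actual code):

```cuda
// Hypothetical example of a branch that diverges within a warp:
// even- and odd-numbered threads of the same warp take different paths,
// so the hardware executes both paths serially (one divergent branch
// per warp in the profiler's counters).
__global__ void divergent_example(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)
        return;

    if (threadIdx.x % 2 == 0)
        data[i] = data[i] * 2.0f;   // path taken by even threads
    else
        data[i] = data[i] + 1.0f;   // path taken by odd threads
}
```

By contrast, a condition that is uniform across a whole warp (e.g. branching on `blockIdx.x`) costs nothing extra, which is why the same source code can report different divergence counts depending on how the data and grid are laid out.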

Are these results coherent? Could this be due to the devices' architectures?

I should also mention that I run on Linux; my processor is 64-bit, and I've tried both the 32-bit and 64-bit versions of Linux, which does not change the result.

Thanks in advance for your help !

Look at the number of coalesced accesses.
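On compute capability 1.0/1.1 parts like these two cards, a half-warp only gets a single memory transaction when consecutive threads access consecutive, aligned addresses; anything else falls apart into one transaction per thread. A minimal sketch of the two patterns (hypothetical kernels, assuming a simple float array):

```cuda
// Coalesced: thread k of each half-warp reads in[base + k], so the 16
// accesses are served by one memory transaction on compute 1.0/1.1.
__global__ void copy_coalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];              // consecutive threads -> consecutive addresses
}

// Uncoalesced: with stride > 1 the threads of a half-warp hit scattered
// addresses, so the same read costs up to 16 separate transactions.
// (The caller must allocate at least n * stride input floats.)
__global__ void copy_strided(const float *in, float *out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i * stride];     // strided pattern breaks coalescing
}
```

The profiler's `gld_coherent`/`gld_incoherent` counters (and their `gst` store equivalents) tell you which pattern your kernels actually produce on each card.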

Thanks for your answer.

I will check it, but what information is this value supposed to provide?
If the number of coalesced accesses is higher on the Tesla, what does that mean?

Thanks

If we assume that my code should focus more on coalesced access to global memory, why would the difference be so significant between the two cards?