Benchmarking problem

knmaheshy2k · December 29, 2008, 1:14pm

I’m trying to benchmark a particular implementation, but to compare the results with existing ones I need to run it on an NVIDIA 8800 GTX, and I have a Tesla D870. It would be great if someone could shed some light on the expected factor of improvement.
These are the two processors I would like to compare to get the factor of improvement for the Tesla.

Processor 1: NVIDIA 8800 GTX
Specifications:

Frequency of processor cores 0.575 GHz
Shader clock 1.35 GHz
Total Dedicated Memory 768 MB GDDR3
Memory Speed 900 MHz
Memory Interface 384-bit
Memory Bandwidth 86.4 GB/sec

Processor 2: NVIDIA Tesla
Specifications:

Frequency of processor cores 1.3 GHz
Total Dedicated Memory 4 GB GDDR3
Memory Speed 800 MHz
Memory Interface 512-bit
Memory Bandwidth 102 GB/sec

How much of a speed improvement would you expect? Given that the Tesla has a higher clock rate but a slower memory speed than the 8800 GTX, I need some multiplicative factor to compare the results.
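
For reference, the bandwidth figures quoted in the two spec lists can be reproduced from the bus width and memory clock. The following is a minimal Python sketch, not anything from the vendor documentation: the function name `peak_bandwidth_gb_s` is made up for illustration, and it assumes GDDR3 transfers data twice per memory clock and that 1 GB means 10^9 bytes.

```python
def peak_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 is an assumption: GDDR3 is double data rate.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Bus width and memory clock taken from the two spec lists in the post.
gtx_8800 = peak_bandwidth_gb_s(bus_width_bits=384, memory_clock_mhz=900)
tesla = peak_bandwidth_gb_s(bus_width_bits=512, memory_clock_mhz=800)

print(f"8800 GTX: {gtx_8800:.1f} GB/s")
print(f"Tesla:    {tesla:.1f} GB/s")
```

These are theoretical peaks only; the bandwidth a real kernel achieves is lower and depends on its access pattern.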

E.D_Riedijk · December 29, 2008, 1:29pm

The specs you list are those of the Tesla C1060. The Tesla D870 is 2x C870, which, as far as I remember, is pretty much identical to an 8800 GTX.

The C1060 is quite a lot faster than the 8800 GTX. How much depends on the code, but the higher memory bandwidth and the doubled number of registers help me a lot.

seibert · December 29, 2008, 1:34pm

It looks like you’ve confused some of the specifications of the D870 with those of the newer GT200-based Tesla C1060. The older D870 was two Tesla C870 boards in an external enclosure. Each C870 was nearly identical to the 8800 GTX, but with 1.5 GB of memory and a slightly slower-clocked 384-bit memory bus. As far as I know, there never was a D870 with 4 GB of memory and a 512-bit memory bus.

Assuming you do have the older Tesla card, the scaling from Tesla C870 performance to 8800 GTX performance will be somewhere between 1.0 (i.e. no change) and 1.125. The reason for that range is that floating-point performance is identical between the C870 and the 8800 GTX, but the memory bandwidth is different. If your kernel is mostly memory-bound, the scaling will be closer to 1.125. If it is mostly compute-bound, it will be closer to 1.0.
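
One rough way to picture that range is an Amdahl-style estimate. This is only a sketch under a simplifying assumption that is not from the thread: a fixed fraction of the C870 runtime scales with memory bandwidth and the rest does not change. The function name `estimated_speedup` and the fractions tried are purely illustrative; the default ratio 1.125 is the upper bound given above.

```python
def estimated_speedup(memory_bound_fraction, bandwidth_ratio=1.125):
    """Estimated 8800 GTX speedup over a C870 under a simple two-part model.

    memory_bound_fraction: share of C870 runtime limited by memory
    bandwidth (0.0 = purely compute-bound, 1.0 = purely memory-bound).
    bandwidth_ratio: 8800 GTX bandwidth divided by C870 bandwidth.
    """
    if not 0.0 <= memory_bound_fraction <= 1.0:
        raise ValueError("memory_bound_fraction must be between 0 and 1")
    compute_part = 1.0 - memory_bound_fraction             # unchanged
    memory_part = memory_bound_fraction / bandwidth_ratio  # shrinks with bandwidth
    return 1.0 / (compute_part + memory_part)


# Illustrative fractions only; a real kernel would have to be profiled.
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"memory-bound fraction {fraction:.2f}: "
          f"speedup ~{estimated_speedup(fraction):.3f}")
```

The two endpoints reproduce the 1.0 and 1.125 bounds. Real kernels overlap computation with memory traffic, so values in between are only indicative.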

(If you have a new Tesla C1060, then it will be very hard to estimate 8800 GTX speeds because a large number of things have changed, making the speed relation between the two boards depend very strongly on the details of your code.)

knmaheshy2k · December 29, 2008, 4:02pm

Thanks, Riedijk.

Mine is a Tesla D870 (2x C870). So if I see a large improvement compared to the 8800 GTX results, does that mean I’ve actually increased the efficiency?

knmaheshy2k · December 29, 2008, 4:06pm

Thanks, seibert.

Mine is a Tesla D870 (2x C870). My code is both compute- and memory-bound, and the improvement factor is somewhere around 1.2. Is that common?

E.D_Riedijk · December 29, 2008, 7:28pm

I think that can indeed be explained by the difference in memory bandwidth.
