Benchmarking problem

I’m trying to benchmark a particular implementation, but to compare my results with the existing ones, I need to run it on an NVIDIA 8800 GTX. However, I have a Tesla D870. It would be great if someone could shed some light on the improvement factor.
These are the two processors I would like to compare, so that I can determine the Tesla's improvement factor.

  1. Processor 1: NVIDIA 8800 GTX
    Specifications:

Frequency of processor cores: 0.575 GHz
Shader clock: 1.35 GHz
Total dedicated memory: 768 MB GDDR3
Memory speed: 900 MHz
Memory interface: 384-bit
Memory bandwidth: 86.4 GB/sec

  2. Processor 2: NVIDIA Tesla
    Specifications:

Frequency of processor cores: 1.3 GHz
Total dedicated memory: 4 GB GDDR3
Memory speed: 800 MHz
Memory interface: 512-bit
Memory bandwidth: 102 GB/sec

How much of a speed improvement would you expect? The Tesla has a higher clock rate but a slower memory speed than the 8800 GTX, so to compare the results I need a scaling factor to multiply by.
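A slower memory clock does not necessarily mean lower bandwidth, because the bus width matters too. As a rough sketch, assuming GDDR3 transfers data on both clock edges (double data rate), theoretical peak bandwidth is bus width in bytes times memory clock times 2. The function name `peak_bandwidth_gb_s` is hypothetical; the bus widths and clocks are the ones listed above.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, memory_clock_mhz: float) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s).

    Assumes double-data-rate memory (e.g. GDDR3): two transfers per clock.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * 2  # DDR: both clock edges
    return bytes_per_transfer * transfers_per_second / 1e9


gtx_8800 = peak_bandwidth_gb_s(384, 900)  # Processor 1: 384-bit bus, 900 MHz
second = peak_bandwidth_gb_s(512, 800)    # Processor 2: 512-bit bus, 800 MHz

print(f"Processor 1: {gtx_8800:.1f} GB/s")      # Processor 1: 86.4 GB/s
print(f"Processor 2: {second:.1f} GB/s")        # Processor 2: 102.4 GB/s
print(f"ratio:       {second / gtx_8800:.3f}")  # ratio:       1.185
```

The 512-bit bus more than makes up for the lower memory clock, which is why the second list quotes roughly 102 GB/sec despite the 800 MHz memory speed.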

The specs you listed are for the Tesla C1060. The Tesla D870 is two C870s, and as far as I remember, a C870 is essentially identical to an 8800 GTX.

The C1060 is considerably faster than the 8800 GTX. How much faster depends on the code, but the higher memory bandwidth and the doubled number of registers help me a lot.
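One way to see why the doubled register file helps is to count how many threads can be resident on a multiprocessor when registers are the limiting resource. The sketch below assumes 8192 registers and at most 768 threads per multiprocessor for the G80 generation (8800 GTX / C870), and 16384 registers and at most 1024 threads for the GT200 generation (C1060). It is deliberately simplified and hypothetical: it rounds down to whole 32-thread warps but ignores per-block register allocation granularity, shared memory limits, and block size, so the CUDA occupancy calculator is the authoritative tool.

```python
# Simplified, hypothetical model of register-limited occupancy.
# Ignores per-block allocation granularity, shared memory, and block size limits.
WARP_SIZE = 32

DEVICES = {
    "G80 (8800 GTX / C870)": {"registers_per_sm": 8192, "max_threads_per_sm": 768},
    "GT200 (C1060)": {"registers_per_sm": 16384, "max_threads_per_sm": 1024},
}


def resident_threads(registers_per_sm: int, max_threads_per_sm: int,
                     regs_per_thread: int) -> int:
    """Threads per multiprocessor allowed by the register file, in whole warps."""
    by_registers = registers_per_sm // regs_per_thread
    whole_warps = (by_registers // WARP_SIZE) * WARP_SIZE
    return min(max_threads_per_sm, whole_warps)


for name, spec in DEVICES.items():
    n = resident_threads(spec["registers_per_sm"], spec["max_threads_per_sm"], 20)
    print(f"{name}: {n} threads per multiprocessor at 20 registers/thread")
```

Under these assumptions, a kernel using 20 registers per thread fits 384 threads per multiprocessor on G80 but 800 on GT200, giving the hardware more warps to hide memory latency with.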

It looks like you’ve confused some of the specifications of the D870 with those of the newer GT200-based Tesla C1060. The older D870 was two Tesla C870 boards in an external enclosure. Each C870 was nearly identical to the 8800 GTX, but with 1.5 GB of memory and a slightly slower-clocked 384-bit memory bus. As far as I know, there never was a D870 with 4 GB of memory and a 512-bit memory bus.

Assuming you do have the older Tesla card, the scaling from Tesla C870 performance to 8800 GTX will be somewhere between 1.0 (i.e. no change) and 1.125. The reason for that range is that floating-point performance is identical between the C870 and the 8800 GTX, but the memory bandwidth is different. If your kernel is mostly memory-bound, the scaling will be closer to 1.125. If it is mostly compute-bound, it will be closer to 1.0.
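The range above can be turned into a rough estimate with an Amdahl-style model. This is a sketch under stated assumptions, not a prediction for any particular kernel: it assumes the C870's memory runs at 800 MHz on its 384-bit bus (which is what the 1.125 ratio implies), that the memory-bound fraction of runtime speeds up by exactly the bandwidth ratio, and that the compute-bound remainder is unchanged. The names `memory_fraction` and `estimated_speedup` are hypothetical.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, memory_clock_mhz: float) -> float:
    """Theoretical peak bandwidth in GB/s for double-data-rate memory."""
    return bus_width_bits / 8 * memory_clock_mhz * 1e6 * 2 / 1e9


BW_8800GTX = peak_bandwidth_gb_s(384, 900)  # 86.4 GB/s
BW_C870 = peak_bandwidth_gb_s(384, 800)     # 76.8 GB/s (assumed 800 MHz memory clock)
BW_RATIO = BW_8800GTX / BW_C870             # 1.125


def estimated_speedup(memory_fraction: float) -> float:
    """Estimated 8800 GTX speed relative to one C870.

    memory_fraction: share of the C870 runtime limited by memory bandwidth.
    That share is assumed to shrink by BW_RATIO; the rest is assumed unchanged.
    """
    if not 0.0 <= memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be in [0, 1]")
    relative_time = (1.0 - memory_fraction) + memory_fraction / BW_RATIO
    return 1.0 / relative_time


for f in (0.0, 0.5, 1.0):
    print(f"memory-bound fraction {f:.1f}: x{estimated_speedup(f):.3f}")
# memory-bound fraction 0.0: x1.000
# memory-bound fraction 0.5: x1.059
# memory-bound fraction 1.0: x1.125
```

A kernel that is half memory-bound under this model lands near 1.06, between the two endpoints given above.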

(If you have a new Tesla C1060, then it will be very hard to estimate 8800 GTX speeds because a large number of things have changed, making the speed relation between the two boards depend very strongly on the details of your code.)

Thanks, Riedijk.

Mine is a Tesla D870 (2x C870). So if I achieve a remarkable improvement compared to the 8800 GTX, does that mean I have actually increased the efficiency?

Thanks, seibert.

Mine is a Tesla D870 (2x C870). My code is both compute- and memory-bound, and the improvement factor is somewhere around 1.2. Is that common?

I think that can indeed be explained by the difference in memory bandwidth.