I’m trying to benchmark a particular implementation, and to compare my results with existing ones I would need to run it on an NVIDIA 8800 GTX. However, what I have is a Tesla D870. It would be great if someone could shed some light on the expected factor of improvement.
These are the two processors I would like to compare in order to get the factor of improvement on the Tesla.
Processor 1: NVIDIA 8800 GTX
Frequency of processor cores 0.575 GHz
Shader clock 1.35 GHz
Total Dedicated Memory 768MB GDDR3
Memory Speed 900MHz
Memory Interface 384-bit
Memory Bandwidth 86.4GB/sec
Processor 2: NVIDIA Tesla
Frequency of processor cores 1.3GHz
Total Dedicated Memory 4GB GDDR3
Memory Speed 800MHz
Memory Interface 512-bit
Memory Bandwidth 102GB/sec
How large a speed improvement would you expect? Given that the Tesla has a higher clock rate but a slower memory speed than the 8800 GTX, I need a factor to multiply my results by so that they can be compared.
It looks like you’ve confused some of the specifications of the D870 with those of the newer GT200-based Tesla C1060. The older D870 was two Tesla C870 boards in an external enclosure. Each C870 was nearly identical to the 8800 GTX, but with 1.5 GB of memory and a slightly slower-clocked 384-bit memory bus. As far as I know, there never was a D870 with 4 GB of memory and a 512-bit memory bus.
Assuming you do have the older Tesla card, the scaling from Tesla C870 performance to 8800 GTX performance will be somewhere between 1.0 (i.e. no change) and 1.125. The reason for that range is that floating-point performance is identical between the C870 and the 8800 GTX, but the memory bandwidth is different: both use a 384-bit bus, so the bandwidth ratio is just the ratio of the memory clocks, 900 MHz / 800 MHz = 1.125. If your kernel is mostly memory-bound, then the scaling will be closer to 1.125. If it is mostly compute-bound, it will be closer to 1.0.
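To make the range above concrete, here is a rough sketch in Python. It derives the bandwidth ratio from the memory clocks and bus width, then interpolates between the two bounds. The function names, the `memory_bound_fraction` parameter, and the simple "split the runtime into a memory part and a compute part" model are my own assumptions for illustration, not something measured on either board. The 800 MHz memory clock for the C870 is also an assumption, chosen because it matches the 1.125 figure.

```python
def peak_bandwidth_gb_s(mem_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s.

    GDDR3 is double data rate, so it moves data twice per memory clock.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mem_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


def estimated_scaling(memory_bound_fraction, bandwidth_ratio, compute_ratio=1.0):
    """Estimated speedup when moving a kernel to the faster-memory board.

    Assumed model (hypothetical, not from any vendor documentation): a fraction
    of the runtime scales with memory bandwidth, the rest with compute speed.
    """
    if not 0.0 <= memory_bound_fraction <= 1.0:
        raise ValueError("memory_bound_fraction must be between 0 and 1")
    new_time = (memory_bound_fraction / bandwidth_ratio
                + (1.0 - memory_bound_fraction) / compute_ratio)
    return 1.0 / new_time


gtx_8800 = peak_bandwidth_gb_s(900, 384)  # 8800 GTX: 900 MHz, 384-bit
c870 = peak_bandwidth_gb_s(800, 384)      # C870: assumed 800 MHz, 384-bit
ratio = gtx_8800 / c870

for fraction in (0.0, 0.5, 1.0):
    factor = estimated_scaling(fraction, ratio)
    print(f"memory-bound fraction {fraction:.1f}: scaling factor {factor:.3f}")
```

To turn a time measured on the C870 into an estimated 8800 GTX time, divide the measured time by the factor. In practice you usually won't know the memory-bound fraction precisely, so it is safer to report both ends of the range than a single number.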
(If you have a new Tesla C1060, then it will be very hard to estimate 8800 GTX speeds because a large number of things have changed, making the speed relation between the two boards depend very strongly on the details of your code.)