Tesla C1060 vs GTX 480 Double precision performance

Is it worthwhile to replace Tesla C1060 cards with GTX 480s in terms of double precision performance?

For single precision, the GTX 480 runs over 50% faster than the Tesla C1060 in my experience, so I think the switch is worthwhile for single precision calculations. But I don’t know how the two compare in double precision. I’m afraid that the calculation results will become unacceptable despite a possibly faster calculation speed.

Thanks

I think that if your application is bandwidth bound, it doesn’t make a huge difference whether you use a GTX480 or a Quadro/Tesla card, since the bandwidth and not the compute units will be the bottleneck. This is the case in many applications. Furthermore, the GTX480 has higher bandwidth since it’s clocked higher…
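To put rough numbers on the bandwidth-bound argument, here is a minimal roofline-style sketch in Python. The peak double precision rates and memory bandwidths are assumptions taken from published spec sheets (about 78 GFLOP/s and 102 GB/s for the C1060, about 168 GFLOP/s and 177.4 GB/s for the GTX 480), not measurements, and the function names are only illustrative.

```python
# Roofline-style estimate: a kernel can sustain at most
#   min(peak compute rate, memory bandwidth * arithmetic intensity).
# All hardware figures are ASSUMED from published spec sheets, not measured.

CARDS = {
    "Tesla C1060": {"dp_peak_gflops": 78.0, "bandwidth_gbs": 102.0},
    "GTX 480": {"dp_peak_gflops": 168.0, "bandwidth_gbs": 177.4},
}


def attainable_gflops(card, flops_per_byte):
    """Upper bound on sustained DP GFLOP/s for a given arithmetic intensity."""
    spec = CARDS[card]
    return min(spec["dp_peak_gflops"], spec["bandwidth_gbs"] * flops_per_byte)


def estimated_speedup(flops_per_byte):
    """Ratio of the GTX 480 bound to the Tesla C1060 bound."""
    return (attainable_gflops("GTX 480", flops_per_byte)
            / attainable_gflops("Tesla C1060", flops_per_byte))


if __name__ == "__main__":
    # Low intensity means bandwidth bound, high intensity means compute bound.
    for intensity in (0.125, 0.5, 2.0):
        print(f"{intensity:5.3f} flop/byte: {estimated_speedup(intensity):.2f}x")
```

Under these assumptions, a kernel that is bandwidth bound on both cards scales with the bandwidth ratio alone (about 1.7x), regardless of how the double precision units are configured. Real kernels will land below these bounds.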

I might have this wrong, but I thought NVIDIA had disabled double precision on all the Fermi GTX cards to encourage you to buy the C2060 Tesla.

No, they just disabled most of the double precision units.

Not disabled, just crippled. The full Fermi does DP at 1/2 the rate of SP, and the GeForce Fermi cards do DP at 1/8 the rate of SP, which is just like the GT200 Tesla cards. So the larger number of CUDA cores in the GeForce Fermi chips gives you a net improvement in double precision over the last-generation Tesla, even with the performance cap.
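The arithmetic behind that claim can be sketched as follows. The core counts and shader clocks are assumptions from published specs (240 cores at 1.296 GHz for the C1060, 480 cores at 1.401 GHz for the GTX 480, 448 cores at 1.15 GHz for a Tesla C2050), and the SP rate is counted as one multiply-add (2 flops) per core per clock, which ignores the extra MUL that GT200 marketing figures include.

```python
# Theoretical peak throughput from core count, shader clock, and DP:SP ratio.
# Core counts and clocks are ASSUMED from published specs, not measured.

FLOPS_PER_CORE_PER_CLOCK = 2  # one multiply-add counted as two flops


def peak_gflops(cores, shader_clock_ghz, dp_ratio=1.0):
    """Peak GFLOP/s; dp_ratio scales the SP rate down to the DP rate."""
    return cores * shader_clock_ghz * FLOPS_PER_CORE_PER_CLOCK * dp_ratio


# DP:SP ratios as stated in the post above: 1/8 for GT200 Tesla and
# GeForce Fermi, 1/2 for the full Fermi Tesla parts.
c1060_dp = peak_gflops(240, 1.296, dp_ratio=1 / 8)
gtx480_dp = peak_gflops(480, 1.401, dp_ratio=1 / 8)
c2050_dp = peak_gflops(448, 1.15, dp_ratio=1 / 2)

if __name__ == "__main__":
    print(f"Tesla C1060 DP peak: {c1060_dp:.0f} GFLOP/s")
    print(f"GTX 480 DP peak:     {gtx480_dp:.0f} GFLOP/s")
    print(f"Tesla C2050 DP peak: {c2050_dp:.0f} GFLOP/s")
    print(f"GTX 480 vs C1060:    {gtx480_dp / c1060_dp:.2f}x")
```

With these assumed figures the GTX 480 comes out at roughly 168 GFLOP/s against roughly 78 GFLOP/s for the C1060, a bit over 2x in theoretical peak. That is a ceiling for compute-bound code only, not a prediction for any particular kernel.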

Ideally you’d go to a Tesla 2050 or 2070, where you’d get full DP performance as well as the Tesla vs. consumer intangibles. I’ll assume that’s not an option since you didn’t ask about it, so let’s ignore all those discussions.

As Jimmy said, if you’re completely bandwidth bound and have well-optimized kernels, then there is probably little benefit. However, you’re getting a decent improvement on your SP code, so this may not be true.

On my CFD-type code, we’re definitely getting a speedup in double precision when comparing an S1070 to a standard-clocked GTX 470: about 1.5x to 1.6x for most kernels. A GTX 480 should be faster in both memory and GPU clock.
