Fermi Card Performance Differences

Sabaron · November 17, 2010, 3:34pm

I’ve got 3 cards (M1060, M2050, GTX480) and I’m running benchmarks on all of them. I’m running a simple matrix multiply, basically the same as provided in the samples, making use of multiple streams. ECC is off on the M2050. I’m seeing about a 30% performance increase from the M2050 → GTX480 in just computation time, which is more than I’d expect for a 12% increase in memory bandwidth and 1 additional MP, but maybe that’s really on par?

What’s bothering me more is that as I scale up the number of streams, the M2050 overtakes the GTX480. This is timing from the start of all streams copying to the device, computing, and copying back. For computation alone the GTX480 is always faster, as expected. The host transfer times are similar, so that alone isn’t it. So it seems strictly related to the overlapping of multiple streams. But why isn’t the GTX480 seeing the same gains? It’s the same compute capability and, as far as I know, a very similar architecture. The executable I’m running is identical between the platforms.

Thanks!

seibert · November 17, 2010, 5:17pm

The GTX 480 has an additional MP relative to the M2050, and its shader clock rate is boosted from 1.15 GHz to 1.4 GHz. Together that adds up to roughly a 30% improvement if you are completely compute bound.
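As a rough check of that arithmetic, here is a minimal sketch that scales peak compute by MP count times shader clock. The MP counts (15 for the GTX 480, 14 for the M2050) are not stated in the thread and are filled in from the published specifications; the helper name `peak_compute_ratio` is made up for illustration, and the model assumes a fully compute-bound kernel.

```python
def peak_compute_ratio(mp_a, clock_a_ghz, mp_b, clock_b_ghz):
    """Ratio of peak compute throughput, assuming it scales with MPs * clock."""
    return (mp_a * clock_a_ghz) / (mp_b * clock_b_ghz)

# GTX 480: 15 MPs at 1.4 GHz; M2050: 14 MPs at 1.15 GHz (MP counts are assumed).
ratio = peak_compute_ratio(15, 1.4, 14, 1.15)
print(f"GTX 480 vs M2050, compute bound: {(ratio - 1) * 100:.1f}% faster")
```

The extra MP alone contributes about 7% and the clock about 22%; the two multiply to roughly the 30% reported.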

My guess (based on a post on the official NVIDIA forums) is that you are seeing the benefit of the extra DMA engine on the Tesla. The GeForce can overlap a single transfer, either device-to-host or host-to-device, with computation on a different stream, but the M2050 has two DMA engines, so it can perform transfers in both directions while running calculations.
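To see how a second DMA engine could let the slower card win as streams are added, here is a toy scheduling model. It is only a sketch under stated assumptions: each engine runs its work strictly in issue order, all host-to-device copies are issued first, then all kernels, then all device-to-host copies, and kernels from different streams do not run concurrently. The function name `pipeline_time` and every timing value are hypothetical units, not measurements from the thread.

```python
def pipeline_time(n_streams, h2d, kernel, d2h, copy_engines):
    """Total time for n_streams of copy-in, compute, copy-out (toy FIFO model)."""
    copy_free = [0.0] * copy_engines   # time at which each copy engine goes idle
    h2d_engine = 0
    d2h_engine = copy_engines - 1      # same engine as H2D when only one exists
    kernel_free = 0.0                  # time at which the compute engine goes idle
    kernel_done = []

    for _ in range(n_streams):
        # Host-to-device copies run back to back on their engine.
        copy_free[h2d_engine] += h2d
        # A kernel waits for its own input and for the previous kernel.
        start = max(kernel_free, copy_free[h2d_engine])
        kernel_free = start + kernel
        kernel_done.append(kernel_free)

    finish = 0.0
    for done in kernel_done:
        # A device-to-host copy waits for its kernel and for its copy engine.
        start = max(copy_free[d2h_engine], done)
        copy_free[d2h_engine] = start + d2h
        finish = copy_free[d2h_engine]
    return finish

# Hypothetical units: equal copy times, kernel 1.3x slower on the two-engine card.
for n in (1, 2, 4, 8):
    one_engine = pipeline_time(n, 1.0, 1.0, 1.0, copy_engines=1)
    two_engines = pipeline_time(n, 1.0, 1.3, 1.0, copy_engines=2)
    print(n, one_engine, two_engines)
```

In this model the single-engine card wins with one stream, but its device-to-host copies queue behind every host-to-device copy, so the two-engine card pulls ahead once several streams are in flight even though its kernels are slower.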

Sabaron · November 17, 2010, 6:47pm

Completely forgot about that extra DMA engine! That is definitely it, thanks! And thanks for the info about the shader clock rate; between the extra MP and the higher clock, that accounts for the difference I noted. Appreciate your insight.

Sarnath · November 18, 2010, 5:15am

Good catch!
