Performance difference between C2050 and M2050 on the bandwidthTest example

I had the chance to run some tests on both a Tesla C2050 and a Tesla M2050. Looking at the bandwidthTest example from the SDK, we noticed some large differences between the two GPUs, even though they are mounted in the same system with the same driver (195.36.15) and the same CUDA version (3.0).

The figures are (pinned memory):

C2050:  Host to Device  5.8 GB/s
        Device to Host  6.2 GB/s

M2050:  Host to Device  4.2 GB/s
        Device to Host  4.2 GB/s
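
For reference, here is a minimal sketch of what the pinned-memory host-to-device measurement boils down to. This is my own stripped-down version, not the SDK code; the buffer size and iteration count are arbitrary choices:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 32 << 20;   // 32 MiB per transfer (arbitrary)
    const int    iters = 100;

    // Pinned (page-locked) host memory, as in bandwidthTest's pinned mode
    unsigned char *h_buf, *d_buf;
    cudaMallocHost((void**)&h_buf, bytes);
    cudaMalloc((void**)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time a batch of host-to-device copies with CUDA events
    cudaEventRecord(start, 0);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host to Device: %.2f GB/s\n",
           (double)bytes * iters / (ms / 1000.0) / 1e9);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```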

Has anybody seen a similar difference, and if so, been able to explain or fix it?

Thanks

I don’t have the machines to test it, but according to the specs they should have the same bandwidth…
Perhaps ECC is turned on for the M2050 while it’s turned off for the C2050?
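
One quick way to check is from the device properties; a minimal sketch, assuming the ECCEnabled field of cudaDeviceProp (which I believe was added for Fermi in CUDA 3.0):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // ECCEnabled is 1 when ECC is currently active on this device
        printf("Device %d (%s): ECC %s\n",
               dev, prop.name, prop.ECCEnabled ? "on" : "off");
    }
    return 0;
}
```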

N.

If I understand correctly, the M2050 is identical to the C2050, just with a passive airflow heatsink instead of a shroud-and-blower.

So my immediate thought is that your two cards are not in identically performing PCIe slots. Perhaps one slot is x16 and the other x8, or one slot's bandwidth is shared with another device via a switch?

Can you swap the cards' slots and test again?

What’s your motherboard?

Nico’s thought of ECC also makes very good sense, and that could also be it. Though I wouldn’t think that’d affect host to/from device bandwidth… device to device, yes, but not host.

I’m a bit puzzled by this. Why should ECC only matter for device to device transfers?

AFAIK ECC is performed by extra error-checking hardware at the I/O pins of the external DRAM interfaces to protect memory transactions against soft errors induced by EM interference and radiation.

Doesn’t it make sense that the data is also checked for those transient errors before it is transferred to the host?

N.

ECC on GF100 is not just in the GDDR5 RAM, but also in the GF100 chip’s own SRAM for caches and even registers.

And yes, it adds overhead, since every transaction uses extra bandwidth for the parity bits.

So when copying from device to device, it is reasonable that these ECC checks would be occurring, lowering the bandwidth.

But for HOST transfers, the device’s memory speed is irrelevant. Even with ECC, it’s an order of magnitude faster than the only real bottleneck: PCIe bandwidth. So I would not expect ECC to affect host transfers.
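
To put rough numbers on that, using the C2050's quoted 144 GB/s memory bandwidth, an assumed ~20% ECC penalty, and the 8 GB/s theoretical peak of a PCIe 2.0 x16 link (treat these as back-of-the-envelope figures):

```latex
\underbrace{144~\mathrm{GB/s}}_{\text{device DRAM, ECC off}}
\times (1 - 0.2)
\approx 115~\mathrm{GB/s}
\;\gg\;
\underbrace{8~\mathrm{GB/s}}_{\text{PCIe 2.0 x16 peak}}
```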

However, I may be wrong which is why it’s worth checking.

Makes sense. Thanks for clearing that up.

N.

Thanks for the tip. However, disabling ECC does not change anything, just as SPWorley suggested.