Tesla C2050 slower than GTX295!

Hi,

I recently purchased a Tesla C2050 to speed up an application I wrote. I’m using the SDK reduction kernel as part of the application. Sadly, the reduction kernel takes more computation time on the Tesla than on a GTX295.

I’m looking for an explanation!

Please help!

A C2050 has 144 GB/s theoretical memory bandwidth with ECC off (less if ECC is on). A stock GTX 285 has 159 GB/s. So on purely bandwidth-limited applications where cache doesn’t come into play, the GTX 285 should probably be faster than a C2050. A simple parallel reduction is probably memory-bandwidth limited…

Hey, problem solved!

It was a timer problem. I forgot to synchronize before reading the timer.
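For anyone who runs into the same thing: kernel launches return to the host right away, so a host timer that is stopped without synchronizing first measures only part of the work (or none of it). Below is a minimal sketch of timing a kernel with CUDA events. It is not my actual application code: `sumKernel` is a simplified stand-in for the SDK reduction kernel, and the problem size and the `CHECK` macro are assumptions made for illustration. On older toolkits `cudaThreadSynchronize()` plays the role of `cudaDeviceSynchronize()`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (line %d)\n",             \
                    cudaGetErrorString(err_), __LINE__);              \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Simplified stand-in for the SDK reduction kernel (an assumption):
// each block writes one partial sum of its slice of the input.
__global__ void sumKernel(const float *in, float *out, int n)
{
    extern __shared__ float sdata[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction in shared memory (blockDim.x must be a power of two).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

int main()
{
    const int n = 1 << 22;                     // assumed problem size
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    const size_t smem = threads * sizeof(float);

    float *dIn = NULL, *dOut = NULL;
    CHECK(cudaMalloc((void **)&dIn, n * sizeof(float)));
    CHECK(cudaMalloc((void **)&dOut, blocks * sizeof(float)));
    CHECK(cudaMemset(dIn, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Warm-up launch so one-time startup cost is not included in the timing.
    sumKernel<<<blocks, threads, smem>>>(dIn, dOut, n);
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaEventRecord(start, 0));
    sumKernel<<<blocks, threads, smem>>>(dIn, dOut, n);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop, 0));

    // The step that is easy to forget: wait until the stop event has
    // actually been reached on the GPU before reading the elapsed time.
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));

    // Effective read bandwidth: bytes read / seconds, expressed in GB/s.
    double gbPerSec = (double)n * sizeof(float) / (ms * 1.0e6);
    printf("kernel time: %.3f ms, effective read bandwidth: %.1f GB/s\n",
           ms, gbPerSec);

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(dIn));
    CHECK(cudaFree(dOut));
    return 0;
}
```

If you prefer a host-side timer, the same rule applies: call `cudaDeviceSynchronize()` after the launch and before stopping the timer. The effective bandwidth printed at the end can then be compared against the theoretical numbers quoted above.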

Thanks for your support!!!