Hi,
I recently purchased a Tesla C2050 to speed up an application I wrote. I’m using the SDK reduction kernel as part of the application. Sadly, the reduction kernel takes longer to run on the Tesla than on a GTX295.
I’m looking for an explanation!
Please help!
A C2050 has 144 GB/s of theoretical memory bandwidth with ECC off (less if ECC is on). A stock GTX 285 has 159 GB/s. So for purely bandwidth-limited applications where the cache doesn’t come into play, the GTX 285 should probably be faster than a C2050. A simple parallel reduction is probably memory-bandwidth limited…
Hey, problem solved!
Timer problem. I forgot to sync the timer.
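For anyone who lands here with the same symptom: kernel launches are asynchronous, so a host timer that is stopped without synchronizing first does not measure the kernel's real run time. The thread does not show the original timing code, so the following is only a minimal sketch of one way to time a kernel with CUDA events. The kernel partialSums and the helper timeKernelMs are hypothetical stand-ins, not the SDK reduction kernel, and the block size is assumed to be a power of two.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the SDK reduction kernel: each block sums its
// slice of the input in shared memory and writes one partial sum.
// Assumes blockDim.x is a power of two.
__global__ void partialSums(const float *in, float *out, unsigned int n)
{
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;

    sdata[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Hypothetical helper: times one kernel launch with CUDA events and
// returns the elapsed time in milliseconds.
float timeKernelMs(const float *d_in, float *d_out, unsigned int n,
                   unsigned int threads)
{
    unsigned int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    partialSums<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_out, n);
    cudaEventRecord(stop, 0);

    // The launch returns immediately; wait here until the kernel and the
    // stop event have actually completed before reading the time.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const unsigned int n = 1u << 22;   // arbitrary problem size
    const unsigned int threads = 256;
    const unsigned int blocks = (n + threads - 1) / threads;

    float *d_in = 0, *d_out = 0;
    cudaMalloc((void **)&d_in, n * sizeof(float));
    cudaMalloc((void **)&d_out, blocks * sizeof(float));
    cudaMemset(d_in, 0, n * sizeof(float));

    timeKernelMs(d_in, d_out, n, threads);            // warm-up launch
    float ms = timeKernelMs(d_in, d_out, n, threads);

    // Each input element is read once, so bytes read / time gives a rough
    // effective bandwidth to compare against the theoretical figure.
    printf("%.3f ms, %.1f GB/s\n", ms, n * sizeof(float) / (ms * 1.0e6));

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

If you use a host-side timer instead of events, the equivalent fix is to synchronize with the device (for example with cudaDeviceSynchronize) after the launch and before stopping the timer.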
Thanks for your support!!!