SDK reduction kernel runs slow on Tesla C2050 compared to GTX295


I use a Tesla C2050 for scientific computation. One part of my program uses the SDK reduction kernel. Before I got the Tesla, I used a GTX295.

The reduction kernel takes about 4 ms on the Tesla (1MP of data), but only about 0.5 ms on the GTX295.

Can anybody tell me why? Does anybody have a clue?

Reduction is almost certainly memory bandwidth bound. Do you have ECC enabled on the C2050?
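
As a rough sanity check on the bandwidth question, you can turn the two timings into an effective bandwidth (bytes read divided by elapsed time). This is only a sketch: it assumes "1MP" means about 2^20 elements of 4 bytes each, which the post does not state, and the function name is made up for illustration.

```python
def effective_bandwidth_gb_s(num_elements, bytes_per_element, elapsed_ms):
    """Input bytes divided by elapsed time, in GB/s (1 GB = 1e9 bytes)."""
    total_bytes = num_elements * bytes_per_element
    return total_bytes / (elapsed_ms * 1e-3) / 1e9


# Assumptions, not stated in the post: 2**20 elements, 4-byte floats.
NUM_ELEMENTS = 1 << 20
BYTES_PER_ELEMENT = 4

# Timings as reported: about 4 ms on the C2050, about 0.5 ms on the GTX295.
c2050 = effective_bandwidth_gb_s(NUM_ELEMENTS, BYTES_PER_ELEMENT, 4.0)
gtx295 = effective_bandwidth_gb_s(NUM_ELEMENTS, BYTES_PER_ELEMENT, 0.5)

print(f"C2050:  {c2050:.2f} GB/s")
print(f"GTX295: {gtx295:.2f} GB/s")
print(f"ratio:  {gtx295 / c2050:.1f}x")
```

If both figures come out far below the rated memory bandwidth of the cards, the measurement is probably dominated by something other than raw bandwidth, so it is worth checking how the kernel is being timed.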

No! ECC is disabled.