I use a Tesla C2050 for scientific computation. One part of my program uses the SDK reduction kernel. Before I got the Tesla, I used a GTX295.
The reduction kernel on the Tesla requires about 4 ms (1 MP of data), while on the GTX295 it takes only about 0.5 ms, so the Tesla is roughly 8 times slower.
Can anybody tell me why? Does anybody have a clue?