GTX 480 vs. C2050: faster or slower?

Hi all,

In my CUDA application, I measure the execution time of different parts of the code, in particular the kernels, using the time.h library.

When I run my application on the GTX 480 and on the C2050, the C2050 seems a little faster when I compare the total execution time of the whole application, BUT when I compare the execution times of the individual kernels, the GTX 480 seems to be faster!!!

The kernel execution times do not include the memory transfers from device to host or from host to device.
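One thing worth checking in a time.h-based measurement: kernel launches are asynchronous, so a host timer stopped right after a launch measures only the launch overhead unless cudaDeviceSynchronize() is called before reading the stop time. A minimal sketch of timing a kernel on the GPU with CUDA events, with the transfers left outside the timed region; scaleKernel and timeScaleKernel are hypothetical stand-ins for the application's own kernels:

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for one of the application's kernels.
__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Returns the kernel's execution time in milliseconds, measured on the GPU
// with CUDA events. Host<->device copies are deliberately not in the timed
// region. Returns a negative value if the launch failed.
float timeScaleKernel(float *d_data, int n)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaEventRecord(start);                 // recorded in the default stream
    scaleKernel<<<blocks, threads>>>(d_data, 2.0f, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);             // block the host until the kernel is done

    float ms = -1.0f;
    if (cudaGetLastError() == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

Running the same helper on both cards gives kernel times that do not depend on when the host happens to read its clock.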

So one might think that these transfers make the difference. But how can that be? The C2050 has lower memory bandwidth than the GTX 480.
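Host-device copies travel over the PCIe bus, so the device-memory bandwidth of a card does not directly determine how fast they run. One way to compare the two cards on this point is to time the copies themselves. A minimal sketch, assuming pinned (page-locked) host memory; copyBandwidthGBps is a hypothetical helper name:

```cuda
#include <cuda_runtime.h>

// Measures host->device or device->host copy bandwidth in GB/s (1 GB = 1e9 bytes)
// using a pinned host buffer of `bytes` bytes. Returns a negative value on error.
double copyBandwidthGBps(size_t bytes, cudaMemcpyKind kind)
{
    void *h = nullptr, *d = nullptr;
    if (cudaMallocHost(&h, bytes) != cudaSuccess) return -1.0;      // pinned host memory
    if (cudaMalloc(&d, bytes) != cudaSuccess) { cudaFreeHost(h); return -1.0; }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Pick source and destination according to the copy direction.
    void *dst = (kind == cudaMemcpyHostToDevice) ? d : h;
    void *src = (kind == cudaMemcpyHostToDevice) ? h : d;

    cudaEventRecord(start);
    cudaError_t err = cudaMemcpy(dst, src, bytes, kind);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    cudaFreeHost(h);

    if (err != cudaSuccess || ms <= 0.0f) return -1.0;
    return (double)bytes / (ms * 1.0e-3) / 1.0e9;
}
```

Calling it with a large buffer (tens of MB) in both directions on each card shows whether the transfer rates actually differ; copies from pageable memory will usually come out slower than these pinned-memory numbers.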

To summarize: the whole application is faster on the C2050, but every kernel is faster on the GTX 480... ???

Can anybody suggest a possible explanation?

Thank you in advance.

The C2050 has two DMA engines and can therefore overlap transfers in both directions; the GTX 480 has only one enabled and cannot.
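A sketch of how an application can benefit from two copy engines, assuming pinned host buffers and hypothetical helper names: query asyncEngineCount, then issue the upload and the download in different streams with cudaMemcpyAsync. On a device with only one enabled copy engine the same code still runs, but the two copies execute one after the other.

```cuda
#include <chrono>
#include <cuda_runtime.h>

// Number of copy (DMA) engines the device reports, or -1 on error.
int copyEngineCount(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    return prop.asyncEngineCount;
}

// Uploads h_in -> d_in and downloads d_out -> h_out, issuing the two copies in
// separate streams so they may overlap. Both host buffers must be pinned
// (cudaMallocHost); with pageable memory the async copies do not overlap.
// Returns the wall-clock time for both copies in ms, or a negative value on error.
double bidirectionalCopyMs(const float *h_in, float *d_in,
                           const float *d_out, float *h_out, size_t n)
{
    const size_t bytes = n * sizeof(float);
    cudaStream_t up, down;
    if (cudaStreamCreate(&up) != cudaSuccess) return -1.0;
    if (cudaStreamCreate(&down) != cudaSuccess) { cudaStreamDestroy(up); return -1.0; }

    cudaDeviceSynchronize();                        // start from an idle device
    auto t0 = std::chrono::steady_clock::now();
    cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, up);
    cudaMemcpyAsync(h_out, d_out, bytes, cudaMemcpyDeviceToHost, down);
    cudaError_t err = cudaDeviceSynchronize();      // wait for both copies
    auto t1 = std::chrono::steady_clock::now();

    cudaStreamDestroy(up);
    cudaStreamDestroy(down);
    if (err != cudaSuccess) return -1.0;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

With two engines the measured time should approach that of the longer single copy; with one it should be close to the sum of both.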

Are you using the GTX 480 for video output? If so, remember that this costs a little performance, because the card has to switch between compute and graphics mode. Even though the switch is very fast, it may affect your total execution time.
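A rough way to check from code whether a GPU is also driving a display, as a heuristic only: cudaDeviceProp::kernelExecTimeoutEnabled reports whether the kernel run-time limit (watchdog) is active, which is often the case when a display is attached. reportWatchdog is a hypothetical helper name:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Prints, for every CUDA device, whether the kernel run-time limit (watchdog)
// is enabled. An enabled watchdog often means the GPU also drives a display,
// but this is only a heuristic. Returns the device count, or -1 on error.
int reportWatchdog()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return -1;
        std::printf("device %d (%s): kernel timeout %s\n", dev, prop.name,
                    prop.kernelExecTimeoutEnabled
                        ? "enabled (display possibly attached)"
                        : "disabled");
    }
    return count;
}
```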