When have I reached the maximum possible speed? Is there a way to know?

Hi, can I somehow check whether my CUDA app could still run faster?
Is there a tool for this, like a profiler or something similar?

I mean, I've read the CUDA Programming Guide, and it says a lot about memory collisions,
memory latency and so on.

For example, I could write some code and never notice that it suffers around 600 cycles of
memory latency because I used a bad memory access order (global, shared, etc.).

How can I detect this?

There is a profiler available; just look in the announcements section.
You can also do some rough calculations, e.g. calculate how many GB your kernel transfers to and from device global memory, and check whether the resulting bandwidth gets close to the card's maximum bandwidth.
If it doesn't, either your computation takes a long time (check the Performance Guidelines chapter of the CUDA Programming Guide for ways to optimize this) or some accesses are not coalesced (the profiler will tell you the latter).
Check whether all your transfers within the device are actually needed, or whether you could save some of them by using shared, texture or constant memory.
Check that you have reduced device-host memory transfers to a minimum.
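As a minimal sketch of the rough calculation described above (not an official tool), the code below times a trivial copy kernel with CUDA events, computes the effective bandwidth from the bytes it moves, and compares it with the card's theoretical peak. The helper names (`effective_bandwidth_gbs`, `peak_bandwidth_gbs`, `measure_copy_bandwidth`) are hypothetical, and the peak formula assumes double-data-rate memory (two transfers per clock); some memory types report clocks differently, so treat the peak as an estimate.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Trivial copy kernel: each thread reads one float and writes one float,
// so the kernel moves 2 * n * sizeof(float) bytes of global memory.
__global__ void copy_kernel(const float* __restrict__ in, float* __restrict__ out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) from bytes moved and elapsed milliseconds.
double effective_bandwidth_gbs(double bytes_read, double bytes_written, double ms) {
    assert(ms > 0.0);
    return (bytes_read + bytes_written) / (ms * 1e-3) / 1e9;
}

// Theoretical peak in GB/s, ASSUMING double-data-rate memory:
// 2 transfers per clock * clock (kHz -> Hz) * bus width in bytes.
double peak_bandwidth_gbs(int mem_clock_khz, int bus_width_bits) {
    assert(mem_clock_khz > 0 && bus_width_bits > 0);
    return 2.0 * mem_clock_khz * 1e3 * (bus_width_bits / 8.0) / 1e9;
}

// Hypothetical helper: times copy_kernel on n floats and prints measured vs. peak bandwidth.
// Error checking is omitted for brevity.
void measure_copy_bandwidth(int n) {
    size_t bytes = size_t(n) * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemset(d_in, 0, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int block = 256;
    const int grid = (n + block - 1) / block;
    copy_kernel<<<grid, block>>>(d_in, d_out, n);  // warm-up launch, not timed

    cudaEventRecord(start);
    copy_kernel<<<grid, block>>>(d_in, d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    // Query the memory clock (kHz) and bus width (bits) of the current device.
    int dev = 0, clock_khz = 0, bus_bits = 0;
    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&clock_khz, cudaDevAttrMemoryClockRate, dev);
    cudaDeviceGetAttribute(&bus_bits, cudaDevAttrGlobalMemoryBusWidth, dev);

    double eff = effective_bandwidth_gbs(double(bytes), double(bytes), ms);
    double peak = peak_bandwidth_gbs(clock_khz, bus_bits);
    std::printf("%.3f ms, %.1f GB/s effective, %.1f GB/s peak (%.0f%%)\n",
                ms, eff, peak, 100.0 * eff / peak);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_in);
    cudaFree(d_out);
}
```

Calling `measure_copy_bandwidth(1 << 24)` from your own `main` prints the numbers. If one of your kernels does little arithmetic yet reaches only a small fraction of the peak, uncoalesced accesses are a likely suspect.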

In short: find the bottleneck, remove it, and start searching again. ;-)

OK, but how do I know whether there IS a bottleneck at all? Only through the rough calculations?

I don’t have much money for an expensive profiler (I once heard they cost up to a few hundred $/Euro).

The CUDA Occupancy Calculator and the Visual Profiler are both available for free, either here in the forums or at CUDA Zone at nvidia.com/cuda.
Also take a look at the tutorial section of CUDA Zone; there you'll find some nice examples of how to optimize a kernel.
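The Occupancy Calculator is a spreadsheet; newer CUDA toolkits (6.5 and later, worth checking against your version) also expose the same calculation at run time through `cudaOccupancyMaxActiveBlocksPerMultiprocessor`. A minimal sketch, where `my_kernel`, `occupancy_fraction` and `report_occupancy` are hypothetical names and no dynamic shared memory is assumed:

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel whose occupancy we want to inspect.
__global__ void my_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Occupancy = resident threads per SM / maximum threads per SM.
double occupancy_fraction(int active_blocks, int block_size, int max_threads_per_sm) {
    assert(max_threads_per_sm > 0);
    return double(active_blocks) * block_size / max_threads_per_sm;
}

// Asks the runtime how many blocks of my_kernel fit on one SM for several block sizes.
void report_occupancy() {
    int dev = 0, max_threads_per_sm = 0;
    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&max_threads_per_sm, cudaDevAttrMaxThreadsPerMultiProcessor, dev);

    const int sizes[] = {64, 128, 256, 512, 1024};
    for (int block_size : sizes) {
        int active_blocks = 0;
        // Last argument: 0 bytes of dynamic shared memory per block (assumption).
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&active_blocks, my_kernel, block_size, 0);
        std::printf("block size %4d: %d blocks/SM, occupancy %.0f%%\n", block_size,
                    active_blocks,
                    100.0 * occupancy_fraction(active_blocks, block_size, max_threads_per_sm));
    }
}
```

High occupancy is not a goal in itself; it mainly gives the scheduler more warps to switch between while others wait on global memory latency.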

For free? Kinda cool!
I’ll have a look at it. Thanks.

Make sure you don’t use the profiler on Vista.

On XP, it will report counters such as the number of shared-memory bank conflicts and divergent branches. On pre-G200 hardware it also reports the number of uncoalesced global memory accesses, which are probably the most important cause of slowdowns.
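To make these counters concrete, here is a sketch of a coalesced and a strided copy kernel, plus two host-side helpers that estimate how badly one warp's accesses spread out. The helpers assume a modern model (32-thread warps, 128-byte memory segments, 32 four-byte shared-memory banks); G80/G200-era hardware used 16 banks and evaluated coalescing per half-warp under different rules, so the numbers are illustrative only. All names are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Coalesced: consecutive threads read consecutive floats, so a warp's
// 32 loads fall into one 128-byte segment.
__global__ void copy_coalesced(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Strided: consecutive threads read floats `stride` elements apart (wrapped
// modulo n), so one warp touches many segments and wastes most of each transaction.
__global__ void copy_strided(const float* in, float* out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[(long long)i * stride % n];
}

// Number of distinct segments touched when the 32 threads of a warp each load one
// element of `elem_bytes`, `stride` elements apart, starting at a segment-aligned address.
int segments_per_warp(int stride, int elem_bytes, int segment_bytes) {
    assert(stride >= 1 && elem_bytes > 0 && segment_bytes > 0);
    int count = 0;
    long long last_segment = -1;
    for (int t = 0; t < 32; ++t) {
        // Addresses grow with t, so a new segment shows up as a change in index.
        long long seg = (long long)t * stride * elem_bytes / segment_bytes;
        if (seg != last_segment) { ++count; last_segment = seg; }
    }
    return count;
}

// Worst-case number of warp threads hitting the same shared-memory bank when
// thread t reads 4-byte word t * stride (32 banks assumed; 1 means conflict-free).
int bank_conflict_degree(int stride) {
    assert(stride >= 1);
    int hits[32] = {0};
    int worst = 0;
    for (int t = 0; t < 32; ++t) {
        int bank = (t * stride) % 32;
        if (++hits[bank] > worst) worst = hits[bank];
    }
    return worst;
}
```

For example, `segments_per_warp(32, 4, 128)` gives 32 segments instead of 1, and `bank_conflict_degree(32)` gives a 32-way conflict; padding each shared-memory row by one word (stride 33) brings the degree back to 1, which is the usual fix.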