When have I reached the maximum possible speed? Is there a way to know?

Louis_Coder · December 25, 2008, 3:15pm

Hi, is there a way to check whether my CUDA app can still be made faster?
Is there a tool for this, such as a profiler?

I have read the CUDA programming guide, and it says a lot about memory conflicts,
memory latency and so on.

For example, I could write some code and not notice that it suffers around 600 cycles
of memory latency because I used a bad memory access pattern (global, shared etc.).

How can I detect this?

Ocire · December 25, 2008, 4:43pm

There is a profiler available, just look in the announcements section.
You can also do some rough calculations: work out how many GB your kernel transfers to and from device global memory, divide by its run time, and see whether the result comes close to the maximum bandwidth of the card.
If not, you either have long computation times (check the performance section of the developer manual for ways to optimize this) or something is not coalesced (the profiler will tell you the latter).
Check whether all your transfers within the device are needed, or whether you could save some of them by using shared/texture/constant memory.
Check whether you have reduced the device-host memory transfers to a minimum.

In short: find the bottleneck, remove it, then start searching again. ;-)
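
The rough bandwidth calculation mentioned above could be sketched roughly like this. It is only an illustration under assumptions: `copy_kernel`, `effective_bandwidth_gbs` and `measure_copy_bandwidth_gbs` are made-up names, and the copy kernel stands in for whatever kernel you actually want to check. The idea is to time the kernel with CUDA events, count the bytes it reads and writes, and compare the resulting GB/s with the peak memory bandwidth listed for your card.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel: each thread reads one float and writes one float.
__global__ void copy_kernel(const float* in, float* out, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Effective bandwidth in GB/s (1 GB = 1e9 bytes) from bytes moved and elapsed milliseconds.
double effective_bandwidth_gbs(double bytes, double ms) {
    return bytes / (ms * 1.0e6);
}

// Times copy_kernel on n floats; returns GB/s, or a negative value on failure.
double measure_copy_bandwidth_gbs(size_t n) {
    float *in = nullptr, *out = nullptr;
    if (cudaMalloc(&in, n * sizeof(float)) != cudaSuccess) return -1.0;
    if (cudaMalloc(&out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(in);
        return -1.0;
    }
    cudaMemset(in, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const unsigned blocks = (unsigned)((n + threads - 1) / threads);

    copy_kernel<<<blocks, threads>>>(in, out, n);  // warm-up launch, not timed
    cudaEventRecord(start);
    copy_kernel<<<blocks, threads>>>(in, out, n);  // timed launch
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaError_t err = cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);

    if (err != cudaSuccess || ms <= 0.0f) return -1.0;

    // The kernel reads n floats and writes n floats.
    const double bytes = 2.0 * (double)n * sizeof(float);
    return effective_bandwidth_gbs(bytes, ms);
}
```

Calling `measure_copy_bandwidth_gbs(1 << 24)` from your own `main` and printing the result gives a number to hold against the card's spec sheet. If your real kernel lands far below that, it is either compute bound or its memory accesses are not efficient.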

Louis_Coder · December 25, 2008, 4:55pm

OK, but how do I know whether there IS a bottleneck at all? Only from the rough calculations?

I don’t have much money for an expensive profiler (I once heard they cost up to a few hundred $/Euro).

Ocire · December 25, 2008, 5:48pm

The CUDA occupancy calculator and the visual profiler are both available for free, either here in the forums or in the CUDA Zone at nvidia.com/cuda.
Also take a look at the tutorial section of the CUDA Zone; there you’ll find some nice examples of how to optimize a kernel.

Louis_Coder · December 25, 2008, 6:01pm

For free? Kinda cool!
I’ll have a look at it. Thanks.

alex_dubinsky · December 26, 2008, 5:41pm

Make sure you don’t use the profiler on Vista.

On XP, it will report counts of problems such as bank conflicts or divergent branches. On pre-G200 hardware it would also report the number of uncoalesced accesses, which are probably the most important cause of slowdown.
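
To make the coalescing point concrete, here is a small sketch of two access patterns. The names `read_coalesced`, `read_strided` and `strided_index` are invented for illustration. In the first kernel, neighbouring threads read neighbouring addresses; in the second, they read addresses `stride` elements apart, which is the kind of pattern the uncoalesced counter is meant to flag. How large the penalty is depends on the hardware generation, so treat this as something to time yourself rather than a guaranteed result.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: maps thread index i to an element `stride` apart, wrapped into [0, n).
__host__ __device__ inline size_t strided_index(size_t i, size_t stride, size_t n) {
    return (i * stride) % n;
}

// Neighbouring threads read neighbouring floats, so a warp touches one contiguous segment.
__global__ void read_coalesced(const float* in, float* out, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Neighbouring threads read floats `stride` elements apart, so a warp touches many segments.
__global__ void read_strided(const float* in, float* out, size_t n, size_t stride) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[strided_index(i, stride, n)];
}
```

Timing both kernels on the same buffers, for example with CUDA events and a stride of 32, should show whether your card penalises the strided pattern.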
