Profiling a computationally bound kernel

bluebit · May 19, 2009, 12:18pm

Hey guys

Since the CUDA profiler lacks detailed profiling information for each function, I was wondering how accurate it is to:

Switch to EMUDEBUG
Run the program with the Visual Studio 2008 team system performance wizard
Analyze the results to find where most of the time is spent and which routines are the greediest

Since it's not memory bound (no thread sync whatsoever, and no shared memory), surely the results should be fairly accurate? Maybe not 100% accurate, but at least I could tell which parts of the code need the most optimisation…

Thoughts?

eyalhir74 · May 19, 2009, 12:40pm

Hi,

I doubt it would help. What I do, and it rarely fails, is this (whether you're compute bound or bandwidth bound):

- start from a nearly empty kernel (only the setup code, index calculations, etc.) and measure the time.

- gradually re-enable lines of code until you find where you spend most of the time

- repeat until you're happy with the results (or can't get any better - and ask again in the forums :) )
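A minimal sketch of the "open the code gradually" idea, assuming the kernel body can be split into stages. The `STAGE` macro, the kernel name `stagedKernel` and the arithmetic inside each stage are all hypothetical placeholders, not anything from the original poster's code. It uses `cudaDeviceSynchronize`, the current name for what was `cudaThreadSynchronize` in 2009.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// STAGE controls how much of the kernel body is compiled in.
// Hypothetical macro: rebuild with e.g. `nvcc -DSTAGE=0`, then 1, then 2.
#ifndef STAGE
#define STAGE 2
#endif

__global__ void stagedKernel(const float* in, float* out, int n)
{
    // Stage 0: setup and index calculation only.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = in[i];

#if STAGE >= 1
    // Stage 1: first chunk of arithmetic (placeholder work).
    for (int k = 0; k < 64; ++k) v = v * 1.0001f + 0.5f;
#endif

#if STAGE >= 2
    // Stage 2: second chunk of arithmetic (placeholder work).
    v = sinf(v) + sqrtf(fabsf(v));
#endif

    // Always store the result. If nothing observable depends on v,
    // the compiler is free to remove the computation as dead code.
    out[i] = v;
}

int main()
{
    const int n = 1 << 20;
    float *in = 0, *out = 0;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    stagedKernel<<<(n + 255) / 256, 256>>>(in, out, n);

    // Wait for the kernel and report whether it actually ran.
    cudaError_t err = cudaDeviceSynchronize();
    printf("STAGE=%d: %s\n", STAGE, cudaGetErrorString(err));

    cudaFree(in);
    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

Timing each build and comparing the differences between consecutive stages gives a rough cost per stage. The final store to `out` is what keeps the compiler from discarding the enabled stages.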

The most important thing is to make sure your kernel doesn't get optimized out by the compiler and that you measure your kernel time correctly (i.e. call cudaThreadSynchronize after the kernel invocation, and check that the kernel was not optimized out and ran fine).

To take the measurement, use any of: a stopwatch, clock(), CUDA events or the cutil methods (sorry tmurray - they are just the easiest ;) )
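As a rough illustration of the CUDA events option, here is a minimal sketch that times a single kernel launch. The kernel `work`, the problem size and the launch configuration are made up for the example. It does one untimed warm-up launch first, on the assumption that you don't want one-time startup overhead in the measurement.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the code being measured.
__global__ void work(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

int main()
{
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float* d = 0;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch, not timed.
    work<<<blocks, threads>>>(d, n);
    cudaDeviceSynchronize();

    // Record events on the same stream immediately around the launch.
    cudaEventRecord(start, 0);
    work<<<blocks, threads>>>(d, n);
    cudaEventRecord(stop, 0);

    // Kernel launches are asynchronous: block until the stop event
    // (and therefore the kernel) has completed before reading the time.
    cudaEventSynchronize(stop);

    // Confirm the kernel actually ran; a failed launch can look very fast.
    cudaError_t err = cudaGetLastError();

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms (%s)\n", ms, cudaGetErrorString(err));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

If you time with a host-side clock instead, the synchronize call after the launch is what makes the number meaningful, since the launch itself returns to the CPU almost immediately.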

hope that helps

eyal
