URGENT: Weird CUDA profiler results...need help with analysis

Aditi · June 18, 2009, 3:52am

Hi Everyone,

I have some CUDA profiler output that is baffling me, and I need help understanding it. As background, I am working on a GTX 280 under Windows and am using the CUDA Visual Profiler version 1.1.08. My .cu code has 3 kernels, none using shared memory for now. Here is the profiler output for reference. I have a few weird observations:

Why does the third kernel show zero for instructions, gld_coherence and gst_coherence? It has 1 block of 300 threads (whereas the other kernels have many more thread blocks, as you can see from the profiler output), and it does a fair amount of work with multiple global memory accesses at every step. Even when I select only “instructions” in a profiling session, it still reports zero instructions, gld and gst for kernel 3.
Why is the number of blocks listed under gridSizeY instead of gridSizeX? The current output, with Bx=blockIdx.x, is correct. When I change it to By=blockIdx.y throughout the code, I get garbage output. Why is that?
I understand that although gld/gst incoherence is hidden in the visual profiler for newer hardware like the GTX 200 series (since it is claimed to take care of coalescing memory accesses), incoherent/uncoalesced memory accesses can still occur (I am using double precision and NO shared memory for now). Is that true? If so, how can I find out?
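On the third question, one rough way to reason about uncoalesced access without the hidden counters is to model it on the host. The sketch below is only a simplified model, assuming the compute capability 1.2/1.3 rule that each half-warp of 16 threads is served with one transaction per distinct 128-byte segment it touches (for 4-, 8- and 16-byte words). It ignores the hardware's transaction-size reduction, and the helper names `countTransactions` and `stridedDoubleAddresses` are hypothetical, not part of any CUDA API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Assumed model parameters for compute capability 1.2/1.3 hardware.
constexpr std::size_t kHalfWarp = 16;       // threads served together
constexpr std::size_t kSegmentBytes = 128;  // segment size for 8-byte words

// Hypothetical helper: byteAddresses[i] is the byte address accessed by
// thread i of one block. Returns the number of memory transactions under
// the simplified "one transaction per distinct segment per half-warp" model.
std::size_t countTransactions(const std::vector<std::uint64_t>& byteAddresses) {
    std::size_t total = 0;
    for (std::size_t base = 0; base < byteAddresses.size(); base += kHalfWarp) {
        const std::size_t end = std::min(base + kHalfWarp, byteAddresses.size());
        std::set<std::uint64_t> segments;  // distinct segments for this half-warp
        for (std::size_t i = base; i < end; ++i) {
            segments.insert(byteAddresses[i] / kSegmentBytes);
        }
        total += segments.size();
    }
    return total;
}

// Hypothetical helper: thread i reads the double at element index i * stride,
// with the array assumed to start at a 128-byte-aligned address 0.
std::vector<std::uint64_t> stridedDoubleAddresses(std::size_t threads,
                                                  std::size_t stride) {
    std::vector<std::uint64_t> addresses(threads);
    for (std::size_t i = 0; i < threads; ++i) {
        addresses[i] = static_cast<std::uint64_t>(i) * stride * sizeof(double);
    }
    return addresses;
}
```

Under this model, a block of 300 threads reading consecutive doubles, `countTransactions(stridedDoubleAddresses(300, 1))`, needs 19 transactions (one per half-warp), while a stride of 16 doubles needs 300, one per thread. Feeding in the index pattern a kernel actually uses gives a first estimate of how far it is from the fully coalesced case.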

I would be thankful if someone could share some info or experience with me! I have a very limited speed-up and no optimization is helping.

Thanks & regards,

Aditi

Sarnath · June 18, 2009, 5:10am

Did you check your kernel for errors? (But then I think the log is incomplete for bad kernels…)

Maybe the driver is NOT profiling the correct multiprocessor in which the kernel is being executed… :-(

Since 1 block runs on only 1 MP and profiling is done for only 1 MP, maybe the driver is profiling some MP on which the code is NOT run…

I would be surprised if that's the case. The driver can't be so dumb.

Try using the command-line profiler. Set CUDA_PROFILE_CONFIG to the config file name and list what you want to watch in that config file, one option per line. Note that there is a limit of 4 counters for the profiler (the limit of 4 applies only to those profiler options that depend on hardware counters).
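As a rough illustration of such a config file: the sketch below assumes the counter names used by the command-line profiler of that era (`instructions`, `gld_coherent`, `gld_incoherent`, `gst_coherent`), so check them against the profiler documentation shipped with your toolkit. It lists four hardware counters, one per line, which is the maximum mentioned above. The file name is up to you; CUDA_PROFILE_CONFIG just has to point at it.

```
instructions
gld_coherent
gld_incoherent
gst_coherent
```

As far as I recall, profiling itself also has to be switched on by setting the environment variable CUDA_PROFILE to 1 before running the application. The results are then written to a log file in the working directory.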
