Profiler - CPU Time

Flolo · July 25, 2008, 1:43pm

Here is a short excerpt from the cudaprof output:

function name                      calls  GPU time  CPU time      %
matrix_vector_multiply_generated     400    206162      4566  54.44
matrix_vector_multiply               400   54548.8      4495   14.4

What I find strange is that the GPU time is smaller than the CPU time. How can this happen?

(This is on my machine at work: GTX280, 32-bit CUDA 2.0b environment. At home, with a 64-bit CUDA 1.1 environment on a GTX8800 512, the output looks as expected: the CPU time is approximately the GPU time plus about 20 µs of overhead per call.)

Does anybody know what is happening?

Flolo · July 29, 2008, 9:00am

I have now installed the newest CUDA and cudaprof on my 64-bit machine, and the CPU time doesn't work there either. The numbers are complete nonsense, so profiling CPU time is not possible. Am I the only one with this problem?

Edit: It's not a problem with the Visual Profiler; the wrong CPU time is also reported by the command-line profiler.

Flolo · July 29, 2008, 2:26pm

I found a way to make the buggy behaviour disappear: enable additional signals.

If I enable, e.g., the gld signals, the reported CPU time seems to be correct.

E.D_Riedijk · July 29, 2008, 5:43pm

cpu time = gpu time + overhead.

senorbum · July 29, 2008, 5:56pm

Yeah, this isn’t a bug. The profiler treats the CPU as waiting for the kernel to finish, as if the launch weren't asynchronous. So basically, what E.D. said is what is going on.

Flolo · July 29, 2008, 10:59pm

I know that CPU time = GPU time + approx. 20 µs overhead, and that is exactly what I expect and what I get with the old CUDA version.
My problem is that with the new version I get this result only when I enable additional signals.
Look at the numbers in my first post: e.g. GPU time 206162 and CPU time 4566, which can't be.

E.D_Riedijk · July 30, 2008, 5:23am

You still wrote that GPU time is smaller than CPU time; that is what put us on the wrong foot ;)

I have no idea why this happens, though. Are you sure you don’t have the columns backwards? I always use the Visual Profiler. One possible explanation:

When only a few signals are selected, your program is run once → first-kernel-call overhead.
When enough signals are selected, the profiler needs to run it two or three times, and I guess it takes the CPU time from one of the later runs.

Flolo · July 31, 2008, 6:50am

My fault: I really meant CPU < GPU.

I analyzed it a bit more: even if I enable just one additional signal in the Visual Profiler (like “gld uncoalesced”), it works, and that performs just one run. If I enable no additional signals (just the default timestamps), the CPU time is bogus.

gatoatigrado · August 10, 2008, 9:04pm

Can someone explain what “gld [un]coalesced”, “gst [un]coalesced”, “local load”, “local store”, “branch”, “divergent branch”, “warp serialize” and “cta launched” mean, which values are good, and which indicate that optimization is needed? Is there a manual for this somewhere? The memcpy time etc. is useful, but I think I could just as well have measured it with a timer…
