What do you understand by CPU time? CPU time, computational load, cuda prof

joanisaac · July 11, 2008, 1:51am

Hi,

I’m trying to figure out the computational load of a program that computes speech processing features. For the moment the program takes an audio file and computes the power spectrum based on an FFT (with a scalable number of points).

This is the sequence of commands I use:

1-Init fftplan (just once at the beginning)
2-malloc memsize = sizeof(float)*fftSize on the GPU
3-malloc sizeof(cufftComplex)*fftSize on the GPU

4-for each frame:

4.a- copy memsize data from CPU to GPU
4.d- cufftExecR2C from float* to cufftComplex* on the GPU
4.e- run a kernel that computes power spec from cufftComplex* to float* on the GPU
4.f- copy memory back from the GPU

5-free all memory allocations and destroy plan
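
As a CPU-side reference for steps 4.d and 4.e, the per-frame computation can be sketched in plain Python. This is only an illustrative sketch, not the code from the post: the function names `r2c_dft` and `power_spectrum` are hypothetical, a naive O(N^2) DFT stands in for cuFFT, and the power is assumed to be the unnormalized squared magnitude of each bin. Like a real-to-complex transform, it keeps only the N/2 + 1 non-redundant bins.

```python
import cmath
import math


def r2c_dft(frame):
    """Naive real-to-complex DFT; returns the N/2 + 1 non-redundant bins."""
    n = len(frame)
    bins = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        bins.append(acc)
    return bins


def power_spectrum(frame):
    """Assumed definition: unnormalized squared magnitude of each bin."""
    return [b.real * b.real + b.imag * b.imag for b in r2c_dft(frame)]


# Example: an 8-point frame holding a cosine with 2 cycles per frame.
frame = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
spectrum = power_spectrum(frame)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(len(spectrum), peak_bin)
```

A reference like this is mainly useful for checking the GPU output on a few short frames; it is far too slow for timing comparisons.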

I’m using the visual profiler to obtain this table (with an FFT of 512 points):

method          #calls   gpu time   cpu time   %GPU time
r2c_radix2_sp   128433   13.9722    17.3196    73.1400
cu_powerSpec    128433    2.0796    16.4702    10.8800
memcopy         128433    3.0513    -          15.9700

and for 1024 points:

method          #calls   gpu time   cpu time   %GPU time
r2c_radix4_sp   128433   18.53      17.14      76.64
cu_powerSpec    128433    1.97      16.33       8.14
memcopy         128433    3.68      -          15.2

(Note that the gpu time and cpu time columns are averages in microseconds.)
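
As a quick check on how the columns relate, the arithmetic below recomputes each row's share of the summed average GPU time, and the total GPU time over all calls, from the 512-point averages posted above. It is an illustrative sketch: the dictionary and variable names are made up here, and it assumes the %GPU time column is each row's share of the total GPU time, which the posted figures match to within rounding.

```python
# Average GPU time per call in microseconds, from the 512-point table above.
CALLS = 128433
gpu_us = {
    "r2c_radix2_sp": 13.9722,
    "cu_powerSpec": 2.0796,
    "memcopy": 3.0513,
}

total_us = sum(gpu_us.values())
# Assumed meaning of the %GPU time column: row share of total GPU time.
share_pct = {name: 100.0 * t / total_us for name, t in gpu_us.items()}
# Total GPU time over the whole run, in seconds.
total_gpu_s = CALLS * total_us * 1e-6

for name, pct in share_pct.items():
    print(f"{name:14s} {pct:6.2f} %")
print(f"total GPU time: {total_gpu_s:.3f} s")
```

The shares come out within a few hundredths of the posted 73.14, 10.88 and 15.97, which suggests that the last number in the memcopy row is its %GPU time and that no separate CPU time is reported for it.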

My questions.

What is CPU time? CUDA_Profiler_2.0.txt says:

"The ‘gputime’ and ‘cputime’ labels specify the actual chip execution time and the driver execution time (including gputime), respectively. Note that all times are in microseconds."

In the case of FFT, all data is already on the GPU. So, how come I have
CPU usage for this function?

If the CPU time includes the GPU time, how come the CPU time is lower than the GPU time for r2c_radix4_sp when using 1024 points?

Why doesn’t memcopy differentiate between CPU time and GPU time?
Why doesn’t cudaMalloc appear in the table?

Thanks,

Joan

pstach · July 11, 2008, 2:05am

In your host code, are you using cudaMemcpy or cudaThreadSynchronize? Both grind on the device until all threads have completed. It’s the equivalent of putting a while loop around a non-blocking read(2) call.
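
The analogy can be sketched as a polling loop. This is a toy model, not CUDA: `FakeDevice` and `blocking_wait` are hypothetical names, and the "device" finishes after a fixed number of polls so that the example is deterministic.

```python
class FakeDevice:
    """Toy stand-in for a GPU that finishes after a fixed number of polls."""

    def __init__(self, polls_until_done):
        self.remaining = polls_until_done

    def poll(self):
        """Non-blocking query: returns True once the work has completed."""
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True


def blocking_wait(device):
    """Spin on the non-blocking query until the device reports completion.

    Returns how many polls came back 'not ready', i.e. host CPU work
    spent purely on waiting.
    """
    wasted_polls = 0
    while not device.poll():
        wasted_polls += 1
    return wasted_polls


print(blocking_wait(FakeDevice(polls_until_done=1000)))
```

In this model the host stays busy for as long as the device is working, so time measured on the host side of a blocking call includes the device's execution time.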

joanisaac · July 11, 2008, 2:29am

I’m using cudaMemcpy in steps 4.a and 4.f. I’m not using any cudaThreadSynchronize.

joanisaac · July 11, 2008, 2:34am

But this doesn’t answer what CPU time is for each function… does it?

Sarnath · July 11, 2008, 5:19am

It’s documented in the CUDA profiler text file (CUDA_Profiler_2.0.txt).

The CPU time includes the GPU time as well, as far as I remember.

joanisaac · July 11, 2008, 5:52am

But I insist: how come, for the 1024-point FFT, the CPU time is lower than the GPU time?

How do you define the CPU time? The time of …?

Sarnath · July 11, 2008, 7:33am

This is not an answer to your question…

but I have seen discrepancies in this time on the order of 200 microseconds, as far as I remember…

GPU time is measured by counters inside the GPU. The CPU time is measured from the CPU, which could include interrupt time and so on… So there’s some dilly-dallying there…

MisterAnderson42 · July 11, 2008, 11:58am

On why the FFT shows CPU time even though the data is already on the GPU: because the driver must initialize the grid, set up the kernel arguments, bind textures, etc., and copy the configured data to the card before launching the kernel. This accounts for cputime > gputime.

On why the CPU time is lower than the GPU time for the 1024-point FFT: I don’t know. There are some known bugs with the profiler in CUDA 2.0, but if I recall correctly they are related to the timestamp field. If you can post a minimal code sample that demonstrates the problem, NVIDIA is usually very good about checking it out and filing a bug report in their system.

On memcopy not distinguishing CPU from GPU time: presumably because the driver overhead of setting up the DMA transfer is minimal, so cputime = gputime. I really don’t know.

On cudaMalloc not appearing in the table: it never has.
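
The explanation that cputime exceeds gputime by the driver's launch overhead can be checked against the averages posted in the original question. The sketch below assumes the simple model cputime = gputime + driver overhead, which is a reading of the explanation above rather than anything the profiler documentation states; the names are illustrative.

```python
# (gputime, cputime) averages in microseconds, copied from the posted tables.
rows = {
    ("512", "r2c_radix2_sp"): (13.9722, 17.3196),
    ("512", "cu_powerSpec"): (2.0796, 16.4702),
    ("1024", "r2c_radix4_sp"): (18.53, 17.14),
    ("1024", "cu_powerSpec"): (1.97, 16.33),
}

# Assumed model: cputime = gputime + driver overhead, so overhead >= 0.
overhead_us = {key: cpu - gpu for key, (gpu, cpu) in rows.items()}

for (fft_size, kernel), overhead in overhead_us.items():
    flag = "" if overhead >= 0 else "  <- cputime below gputime"
    print(f"{fft_size:>4} {kernel:14s} {overhead:+8.4f} us{flag}")
```

Only the 1024-point FFT row yields a negative value (about -1.39 microseconds), which is the one case this model cannot explain.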

joanisaac · July 11, 2008, 4:19pm

So if I understand correctly, the times I obtained are not accurate, and this error of 200 microseconds would make the CPU time lower than the GPU time. Is that correct?
