I posted this question in another forum and did not get any reply. I hope I can get some help here.
My question is about kernel launch time. It has been haunting me for a long time while trying to optimize my CUDA code for the best performance.
When I profile my code, the NVIDIA Visual Profiler is a very useful tool. However, the discussion/help on the "GPU time" and "CPU time" reported by the profiler is very brief. Often you get a very small GPU time and a very large CPU time for a kernel call. The profiler help says the CPU time is associated with kernel launch time. How can I optimize the code to minimize this CPU time?
To give an example, run the Visual Profiler on the "SimpleGL" sample application in the CUDA SDK. You will find the CPU time ranges from 600 to 1200 for each kernel call (on my GTX 280 with CUDA 3.2, Win7), while the GPU time is always 27.8. So the GPU execution is really fast, but the CPU time for the kernel launch is very large.
Some people may think this is because the Visual Profiler launches the kernel in blocking mode. But when I use a CUDA timer in my own code to record the performance (in non-blocking mode), I get similar results. (I can provide my code if you would like to see it.)
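To be concrete, here is roughly what my measurement looks like. This is a minimal sketch, not my real code: the kernel body, array size, and launch configuration are placeholders. The idea is that a kernel launch returns to the host asynchronously, so a host timer bracketing only the launch call measures the CPU-side launch cost, while CUDA events measure the GPU execution time.

```cuda
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real one.
__global__ void myKernel(float *data)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 16;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    for (int iter = 0; iter < 10; ++iter) {
        // Host timer brackets only the (asynchronous) launch call,
        // so it captures the CPU-side launch overhead.
        auto t0 = std::chrono::high_resolution_clock::now();
        cudaEventRecord(start, 0);
        myKernel<<<n / 256, 256>>>(d_data);
        cudaEventRecord(stop, 0);
        auto t1 = std::chrono::high_resolution_clock::now();

        cudaEventSynchronize(stop);       // now actually wait for the GPU
        float gpu_ms = 0.0f;
        cudaEventElapsedTime(&gpu_ms, start, stop);

        long long cpu_us = std::chrono::duration_cast<
            std::chrono::microseconds>(t1 - t0).count();
        printf("iter %d: CPU launch %lld us, GPU %.1f us\n",
               iter, cpu_us, gpu_ms * 1000.0f);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_data);
    return 0;
}
```

Even with this non-blocking setup, the CPU-side number per launch is much larger than the GPU execution time, which matches what the profiler reports.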
Another confusing finding: when I call three kernels in a loop, only the first kernel has a significant CPU time; the other two are fine.
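For reference, the three-kernel loop has roughly this shape (again a sketch: the kernel bodies are placeholders, and `cudaDeviceSynchronize` would be `cudaThreadSynchronize` on CUDA 3.2). Timing each launch separately on the host is how I see the large CPU time land on the first kernel only:

```cuda
#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

// Placeholder kernels standing in for the real three.
__global__ void kernelA(float *d) { d[threadIdx.x] += 1.0f; }
__global__ void kernelB(float *d) { d[threadIdx.x] *= 2.0f; }
__global__ void kernelC(float *d) { d[threadIdx.x] -= 1.0f; }

// Host-side microseconds between two time points.
static long long usBetween(
    std::chrono::high_resolution_clock::time_point a,
    std::chrono::high_resolution_clock::time_point b)
{
    return std::chrono::duration_cast<std::chrono::microseconds>(b - a).count();
}

int main()
{
    float *d;
    cudaMalloc(&d, 256 * sizeof(float));

    for (int i = 0; i < 100; ++i) {
        auto t0 = std::chrono::high_resolution_clock::now();
        kernelA<<<1, 256>>>(d);   // this launch shows the large CPU time
        auto t1 = std::chrono::high_resolution_clock::now();
        kernelB<<<1, 256>>>(d);   // these two launches look fine
        auto t2 = std::chrono::high_resolution_clock::now();
        kernelC<<<1, 256>>>(d);
        auto t3 = std::chrono::high_resolution_clock::now();

        cudaDeviceSynchronize();  // wait for all three before the next pass

        printf("A %lld us, B %lld us, C %lld us\n",
               usBetween(t0, t1), usBetween(t1, t2), usBetween(t2, t3));
    }

    cudaFree(d);
    return 0;
}
```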
So, my questions are:
- What is really happening during this CPU time period?
- How can I optimize my CUDA code to minimize this CPU time?
Hope I can get some constructive suggestions. Thank you!