Kernel operation delays when GPU is idle

liuyis · March 20, 2024, 4:27am

The gap between the CPU-side kernel launch and the start of kernel execution on the GPU is called kernel launch latency. In your screenshot it's about 175 us (17.65 ms - 17.475 ms), which is not terrible but does look higher than optimal.
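For reference, here is the arithmetic above as a small Python sketch. The function name `launch_latency_us` is made up for illustration, and the two timestamps are simply the values read off the screenshot timeline; I'm assuming both are in milliseconds on the same clock.

```python
def launch_latency_us(cpu_launch_ms: float, gpu_start_ms: float) -> float:
    """Return the gap between the CPU-side launch and the GPU-side kernel start, in microseconds.

    Both arguments are timeline timestamps in milliseconds (assumed to share one clock).
    """
    if gpu_start_ms < cpu_launch_ms:
        raise ValueError("kernel start must not precede the launch call")
    return (gpu_start_ms - cpu_launch_ms) * 1000.0


# Values read from the screenshot: launch at 17.475 ms, kernel start at 17.65 ms.
latency = launch_latency_us(17.475, 17.65)
print(f"launch latency: {latency:.0f} us")
```

Applying the same calculation to several launches in the trace would show whether the gap is consistently this large or only occasionally so.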

I do see many posts that discuss launch latency specifically, rather than just launch cost/overhead. For example, in this post someone suggested that launch latency can be higher when a kernel takes a lot of parameters and/or when UVM usage causes page faults. Does either of those apply to your application?

I also see that you are using NCCL in the application. Is this a multi-GPU system? Is there any chance the process is waiting for data from other GPUs before the workload is actually scheduled to run on the GPU?

BTW, you may also want to raise a question in the CUDA forum: CUDA Programming and Performance - NVIDIA Developer Forums. Our team develops the profiling tool that lets users observe this kind of performance issue, but we don't always have the best expertise to explain or resolve them. For this specific issue of high CUDA kernel launch latency, the CUDA team might be able to provide more insight. I can see that others have posted similar questions there in the past, e.g. "Too much time for kernel launch latency".
