Scheduling of kernels

foteini · January 2, 2023, 10:31pm

Hi all,

I have a question regarding the scheduling of kernels.

I have two kernels, A and B.
Kernel A is small, occupying only a few streaming multiprocessors (SMs).
Kernel B is large, requiring more SMs than the GPU has.
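
To make "more SMs than the GPU has" concrete, here is a rough back-of-the-envelope sketch. It assumes a kernel's thread blocks are spread over all SMs and counts how many full-GPU "waves" are needed to run them. The V100 has 80 SMs; the grid sizes and the blocks-per-SM figure are hypothetical placeholders, not measurements from the kernels in this post, and `waves_needed` is an illustrative helper, not a CUDA or PyTorch API.

```python
import math

V100_SM_COUNT = 80  # number of SMs on a V100


def waves_needed(grid_blocks: int, blocks_per_sm: int,
                 sm_count: int = V100_SM_COUNT) -> int:
    """Return how many full-GPU waves are needed to run all thread blocks.

    grid_blocks   -- total thread blocks in the kernel launch (hypothetical input)
    blocks_per_sm -- blocks that fit on one SM at once (hypothetical input)
    """
    if grid_blocks < 0 or blocks_per_sm <= 0 or sm_count <= 0:
        raise ValueError("grid_blocks must be >= 0; the other arguments must be > 0")
    resident_blocks = blocks_per_sm * sm_count  # blocks the whole GPU can hold at once
    return math.ceil(grid_blocks / resident_blocks)


# Illustrative numbers only: a small kernel fits in one wave,
# while a large one needs several waves to finish.
small_kernel_waves = waves_needed(grid_blocks=8, blocks_per_sm=2)
large_kernel_waves = waves_needed(grid_blocks=1024, blocks_per_sm=2)
```

Under this simplified model, a kernel that needs more than one wave leaves SM slots free only as its blocks retire, which is where a small kernel's blocks could be scheduled.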

I would like to colocate kernels A and B. The two kernels run in different CUDA streams.
By giving the stream that runs kernel A a higher priority, the two kernels colocate most of the time, as I can see in the NVIDIA Nsight trace.
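
For reference, the setup described above can be sketched roughly as follows in PyTorch. This is a minimal sketch and not the exact test harness: `run_colocated`, `kernel_a`, and `kernel_b` are hypothetical names, the two callables stand in for whatever operations launch kernels A and B, and launching both streams from a single loop is an assumption. In CUDA a numerically lower value means a higher stream priority, so -1 is used for the high-priority stream and 0 for the low-priority one.

```python
try:
    import torch
except ImportError:  # PyTorch not installed: the helper below then just returns False
    torch = None

# In CUDA, a numerically lower value means a higher stream priority.
HIGH_PRIORITY = -1
LOW_PRIORITY = 0


def run_colocated(kernel_a, kernel_b, iterations=10):
    """Launch kernel_a on a high-priority stream and kernel_b on a low-priority one.

    kernel_a / kernel_b are placeholder callables that enqueue GPU work.
    Returns True if work was launched, False if no CUDA device is available.
    """
    if torch is None or not torch.cuda.is_available():
        return False

    stream_a = torch.cuda.Stream(priority=HIGH_PRIORITY)  # small kernel A
    stream_b = torch.cuda.Stream(priority=LOW_PRIORITY)   # large kernel B

    for _ in range(iterations):
        # Enqueue the large kernel first, then the small high-priority one.
        with torch.cuda.stream(stream_b):
            kernel_b()
        with torch.cuda.stream(stream_a):
            kernel_a()

    torch.cuda.synchronize()  # wait for both streams to finish
    return True
```

Profiling a run of such a script with Nsight Systems should show the two streams on separate rows, which makes it possible to see where the kernels overlap and where they do not.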

However, at some points, as shown in the attached screenshot (with red marks), the kernels are not able to colocate.
Would anyone have any hints on why this might be happening, or how I could further investigate it?

Some info about the experiment and the attached screenshot:

The kernels are taken from PyTorch programs, and for testing purposes each stream launches the same kernel for a specified number of iterations.
In the screenshot, the kernel ‘void cudnn…’ corresponds to kernel A (the small, high-priority one), while the kernel ‘volta_scudnn…’ corresponds to kernel B (the large, low-priority one).
The GPU is a V100-16GB.

Thank you!
