Advantage of CUDA Graphs?

boxerab · June 28, 2023, 9:44pm

In my encoder, for each image, I have a copy from host to device on stream 0, then a series of N kernels on stream 1, then a copy from device to host on stream 2. The streams are synchronized via events, so that the stream 1 kernels only execute once the host-to-device copy is complete. I schedule 40 encodes at a time, and when I get the callback after the 20th encode, I schedule another 40 (in another thread). So my workflow seems to meet the use case for Graphs. What can I gain by using a Graph to capture the N kernels on stream 1?
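For readers who want something concrete, the per-image pipeline described above might look roughly like the sketch below. Everything in it is an assumption for illustration: the kernel `stage`, the buffer size, the value of N, and the stream names `up`, `compute`, and `down` (standing in for "stream 0/1/2") are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one of the N encoder kernels.
__global__ void stage(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main()
{
    const int n = 1 << 20;       // elements per image (assumed)
    const int numKernels = 4;    // "N" (assumed)
    const size_t bytes = n * sizeof(float);

    float *h = nullptr, *d = nullptr;
    cudaMallocHost(&h, bytes);   // pinned memory so the copies can be asynchronous
    cudaMalloc(&d, bytes);
    for (int i = 0; i < n; ++i) h[i] = 0.0f;

    cudaStream_t up, compute, down;
    cudaStreamCreate(&up);
    cudaStreamCreate(&compute);
    cudaStreamCreate(&down);

    cudaEvent_t uploaded, computed;
    cudaEventCreateWithFlags(&uploaded, cudaEventDisableTiming);
    cudaEventCreateWithFlags(&computed, cudaEventDisableTiming);

    // Host-to-device copy on the first stream.
    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, up);
    cudaEventRecord(uploaded, up);

    // The kernels wait for the upload, then run back to back.
    cudaStreamWaitEvent(compute, uploaded, 0);
    for (int k = 0; k < numKernels; ++k)
        stage<<<(n + 255) / 256, 256, 0, compute>>>(d, n);
    cudaEventRecord(computed, compute);

    // Device-to-host copy waits for the last kernel.
    cudaStreamWaitEvent(down, computed, 0);
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, down);
    cudaStreamSynchronize(down);

    printf("h[0] = %.0f (expected %d)\n", h[0], numKernels);

    cudaEventDestroy(uploaded);
    cudaEventDestroy(computed);
    cudaStreamDestroy(up);
    cudaStreamDestroy(compute);
    cudaStreamDestroy(down);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```

The loop of `stage` launches on `compute` is the part a graph would capture.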

njuffa · June 28, 2023, 10:15pm

Why not just try CUDA Graphs and find out whether it benefits your use case?

If many of the kernels have very short runtime, the use of CUDA Graphs can significantly reduce overall launch overhead, resulting in higher performance.
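As a rough illustration of what trying it involves, here is a minimal sketch that uses stream capture to record a chain of kernel launches once and then replays it with a single `cudaGraphLaunch` per image. The kernel `stage`, the sizes, and the `CHECK` macro are hypothetical placeholders, not taken from the encoder discussed here. `cudaGraphInstantiateWithFlags` is used because its signature is the same across recent CUDA 11 and CUDA 12 toolkits.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one of the N encoder kernels.
__global__ void stage(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Illustrative error-check helper: print the failing call and exit main.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            return 1;                                            \
        }                                                        \
    } while (0)

int main()
{
    const int n = 1 << 20;       // elements per image (assumed)
    const int numKernels = 8;    // "N" (assumed)
    const int images = 40;       // one batch of encodes, as in the question

    float *d = nullptr;
    CHECK(cudaMalloc(&d, n * sizeof(float)));
    CHECK(cudaMemset(d, 0, n * sizeof(float)));

    cudaStream_t s;
    CHECK(cudaStreamCreate(&s));

    // Capture: the launches are recorded into a graph, not executed.
    cudaGraph_t graph;
    CHECK(cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal));
    for (int k = 0; k < numKernels; ++k)
        stage<<<(n + 255) / 256, 256, 0, s>>>(d, n);
    CHECK(cudaStreamEndCapture(s, &graph));

    // Instantiate once; this is the comparatively expensive step.
    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    // Replay: one launch call per image instead of numKernels calls.
    for (int i = 0; i < images; ++i)
        CHECK(cudaGraphLaunch(exec, s));
    CHECK(cudaStreamSynchronize(s));

    float first = 0.0f;
    CHECK(cudaMemcpy(&first, d, sizeof(float), cudaMemcpyDeviceToHost));
    printf("data[0] = %.0f (expected %d)\n", first, numKernels * images);

    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(s));
    CHECK(cudaFree(d));
    return 0;
}
```

In this sketch every replay operates on the same buffer. A real encoder that uses a different buffer per image would need to update the kernel parameters between launches or keep one instantiated graph per buffer.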

boxerab · June 28, 2023, 10:39pm

Thanks, kernel runtime is on the order of milliseconds. Is that considered short?

njuffa · June 28, 2023, 10:52pm

Kernel launch overhead is 3 to 5 microseconds on modern high-end systems. So if the kernel runtime is in the millisecond range (so about 1000x difference), kernel launch overhead is pretty much irrelevant. Actually, kernel run times in the single-digit milliseconds on high-end systems are about the sweet spot for CUDA-accelerated apps, in particular with regard to the user interface. At any time, the performance difference between lowest-end and highest-end GPUs is typically around 20x, so such software tends to be reasonably responsive even on low-end hardware.

The other aspect of using CUDA Graphs is convenience (could be summarized as “capture and replay”). Since your software appears to be already complete and tuned, this does not look like a compelling argument at this stage, but you may disagree.

As I said, one approach is to just give it a try: run some experiments and see how you like it. You may discover advantageous aspects of using CUDA Graphs that a mere thought experiment is not going to uncover.
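One possible shape for such an experiment is sketched below: it times the same chain of kernels launched directly and launched as an instantiated graph, using host wall-clock time around launch plus synchronize. The kernel `stage`, the helpers `launchAll` and `millis`, and all sizes are hypothetical. A real test would substitute the actual encoder kernels and check the return codes of the CUDA calls, which this sketch mostly skips.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one of the N encoder kernels.
__global__ void stage(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;
}

// Enqueue the whole chain of kernels on stream s.
static void launchAll(float *d, int n, int numKernels, cudaStream_t s)
{
    for (int k = 0; k < numKernels; ++k)
        stage<<<(n + 255) / 256, 256, 0, s>>>(d, n);
}

// Wall-clock milliseconds for work() plus draining stream s.
template <typename F>
static double millis(F &&work, cudaStream_t s)
{
    auto t0 = std::chrono::steady_clock::now();
    work();
    cudaStreamSynchronize(s);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    const int n = 1 << 20, numKernels = 8, reps = 100;  // all assumed
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    cudaStream_t s;
    cudaStreamCreate(&s);

    // Build the graph once by capturing the same launches.
    cudaGraph_t graph;
    cudaGraphExec_t exec;
    cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
    launchAll(d, n, numKernels, s);
    cudaStreamEndCapture(s, &graph);
    cudaGraphInstantiateWithFlags(&exec, graph, 0);

    // Warm up both paths so one-time costs are not measured.
    launchAll(d, n, numKernels, s);
    cudaGraphLaunch(exec, s);
    cudaStreamSynchronize(s);

    double direct = millis([&] {
        for (int r = 0; r < reps; ++r) launchAll(d, n, numKernels, s);
    }, s);
    double graphed = millis([&] {
        for (int r = 0; r < reps; ++r) cudaGraphLaunch(exec, s);
    }, s);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("direct launches: %.3f ms, graph launches: %.3f ms\n",
           direct, graphed);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(s);
    cudaFree(d);
    return 0;
}
```

Shrinking `n` so that each kernel runs for only a few microseconds should make any difference between the two paths easier to see, while millisecond-scale kernels would be expected to show little change, in line with the launch-overhead estimate above.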
