cuStreamBeginCapture overhead vs. specialized nodes?

cory.bloyd · September 25, 2020, 6:55pm

Before I write a bunch of code to test this, can anyone tell me offhand whether this is “clearly a bad idea” or “unclear enough that it must be measured, not predicted”?

It is possible to hand-roll equivalents of the CUDA graph Host/Kernel/Memcpy/Memset nodes using captured streams. But would a large graph composed of many child graphs, each produced by stream capture, have significant overhead compared to an equivalent graph built from the specialized nodes?

andy.nicholas · October 14, 2020, 7:29pm

Bump
