Reworking Library to launch a graph for Deep Neural Network

Hello,

I am relatively new to CUDA and have spent the last couple of months optimising a closed-loop deep learning library for a university project. I have found that the main remaining performance bottleneck is kernel launch overhead, and I am trying to address it using CUDA graphs.

Basically, my current application launches a series of kernels for a single iteration of learning, following the steps shown below.

Overall, I currently launch four kernels for every layer in the network, which makes the total launch overhead far too high.

I would like to use stream capture if possible, but I need some way of handling the fact that the inputs to each kernel change after each preceding kernel launch.

I have a few key questions:

  1. Do I need to create a graph for every iteration of learning, or is there some way to create the graph with dynamic kernel inputs and just relaunch it each time? (I have put a rough sketch of what I am considering below the questions.)
  2. Is it likely that creating a graph for this purpose will take longer than the launch overhead of the 4n individual kernel launches (four per layer across n layers)?
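
What I have been considering so far for question 1: if the device pointers passed to each kernel stay the same between iterations (the values in the buffers change but the addresses do not), then I think the captured graph never needs to change and I could simply relaunch the same executable graph. If the kernel arguments really do change every iteration, my understanding is that I could re-capture cheaply and refresh the instantiated graph with cudaGraphExecUpdate instead of re-instantiating, since instantiation is supposed to be the expensive step (which is also what question 2 hinges on). Below is a rough sketch of that second case, using CUDA 12.x signatures; launchOneIteration() is a placeholder for my real per-layer launches, not anything from the library, and error checking is omitted for brevity.

#include <cuda_runtime.h>

// Placeholder: enqueues the 4*n per-layer kernels of one learning iteration on `stream`.
void launchOneIteration(cudaStream_t stream);

// Rough sketch only (CUDA 12.x signatures), error checking omitted.
void trainWithGraph(cudaStream_t stream, int nIterations) {
    cudaGraphExec_t graphExec = nullptr;

    for (int iter = 0; iter < nIterations; ++iter) {
        // Re-capture this iteration's launches; capture itself is cheap.
        cudaGraph_t graph;
        cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
        launchOneIteration(stream);                    // placeholder
        cudaStreamEndCapture(stream, &graph);

        if (graphExec == nullptr) {
            // Pay the expensive instantiation cost once.
            cudaGraphInstantiate(&graphExec, graph, 0);
        } else {
            // Update the kernel parameters in the existing executable graph.
            cudaGraphExecUpdateResultInfo info;
            if (cudaGraphExecUpdate(graphExec, graph, &info) != cudaSuccess) {
                // Only needed if the topology changed; re-instantiate as a fallback.
                cudaGraphExecDestroy(graphExec);
                cudaGraphInstantiate(&graphExec, graph, 0);
            }
        }
        cudaGraphDestroy(graph);

        // One launch call replaces the 4*n individual kernel launches.
        cudaGraphLaunch(graphExec, stream);
        cudaStreamSynchronize(stream);
    }
    cudaGraphExecDestroy(graphExec);
}

My worry with this approach is whether the per-iteration capture and update cost would eat into the savings, which is essentially question 2.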

Thanks in advance for any help and sorry if I have worded anything poorly

full library code: CLDL-CUDA/lib at main · L-A-F-987/CLDL-CUDA · GitHub

// Step 1: set the inputs of layer 0 (pseudocode for a kernel launch)
setInputs_layer_0

// Step 2: forward pass through the network
for (int i = 0; i < nLayers - 1; i++) {
    // Calculates the outputs of the given layer using a layer function
    layers[i]->calcOutputs();
    double* layerOutputs = layers[i]->getOutputs();
    // Propagates the new outputs to the inputs of the next layer
    layers[i+1]->propInputs(layerOutputs);
}
layers[nLayers-1]->calcOutputs();

// Step 3: set the error at the output layer (pseudocode)
set_backward_error(output - input)

// Step 4: backward pass, propagating the error towards layer 0
double* sumlist;
for (int i = nLayers - 1; i > 0; i--) {
    sumlist = layers[i]->calcErrorWeightProductSum();
    layers[i-1]->propErrorBackward(sumlist);
}
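
For completeness, this is roughly how I was imagining capturing the sequence above into a graph once and then relaunching it. It is only a sketch, not working code: it assumes every layer method enqueues its kernels on the non-default stream being captured (capture does not record work issued to the legacy default stream), that getOutputs() and calcErrorWeightProductSum() just return device pointers without synchronising, and that those pointers stay valid across iterations. The setInputs and error-setting kernels are shown as comments because I have left them as pseudocode above.

// Rough sketch, not working code: every layer method must issue its kernels on
// `stream`, and nothing inside the captured region may synchronise or use the
// default stream, otherwise capture fails.
cudaStream_t stream;
cudaStreamCreate(&stream);

cudaGraph_t graph;
cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);

// setInputs kernel for layer 0 would be launched into `stream` here.

for (int i = 0; i < nLayers - 1; i++) {
    layers[i]->calcOutputs();
    layers[i+1]->propInputs(layers[i]->getOutputs());
}
layers[nLayers-1]->calcOutputs();

// set_backward_error kernel would be launched into `stream` here.

for (int i = nLayers - 1; i > 0; i--) {
    layers[i-1]->propErrorBackward(layers[i]->calcErrorWeightProductSum());
}

cudaStreamEndCapture(stream, &graph);

cudaGraphExec_t graphExec;
cudaGraphInstantiate(&graphExec, graph, 0);   // CUDA 12.x signature
cudaGraphDestroy(graph);

// Every subsequent iteration is then a single launch instead of 4*n launches,
// provided the pointers baked in at capture time are still the ones in use.
cudaGraphLaunch(graphExec, stream);
cudaStreamSynchronize(stream);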

Perhaps you’ve already seen this, and the other related articles listed down the right-hand side.


Thank you!