Employing CUDA Graphs in a Dynamic Environment

Originally published at: Employing CUDA Graphs in a Dynamic Environment | NVIDIA Developer Blog

Many workloads can be sped up greatly by offloading compute-intensive parts onto GPUs. In CUDA terms, this is known as launching kernels. When those kernels are many and of short duration, launch overhead sometimes becomes a problem. One way of reducing that overhead is offered by CUDA Graphs. Graphs work because they combine arbitrary numbers…

Hello, thank you for sharing a very interesting blog post! Is it possible to use CUDA graphs in a while loop? I mean, I will execute the kernel multiple times until some condition is met, hence I do not know in advance how long the sequence of kernel launches in the graph should be. Currently I manage by running the loop inside a cooperative kernel with a grid sync, but it would be more convenient to separate the logic into multiple smaller kernels and avoid the grid sync.

Thank you for your comment and question. It does not matter whether the kernels you would like to put in a CUDA graph are executed in a for loop, a while loop, or any other construct, as long as the conditions for CUDA graphs are met. That means either that the topology of the resulting graph does not change from one execution of your while loop to the next (which would allow you to use the graph update API), or that the exact same graph is encountered multiple times, so that it can be retrieved from a container in which it was stored upon first encounter.
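To make the first option concrete, here is a minimal sketch of re-capturing a kernel sequence inside a while loop and reusing the instantiated graph via the update API. It assumes the pre-CUDA 12 runtime signatures of `cudaGraphInstantiate` and `cudaGraphExecUpdate`; `shortKernel`, the kernel count, and the stopping condition are placeholders, not code from the post.

```cuda
#include <cuda_runtime.h>

__global__ void shortKernel(float *out, const float *in, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 1.23f * in[i];
}

int main() {
  const int n = 1 << 20, nKernels = 20;
  float *a, *b;
  cudaMalloc(&a, n * sizeof(float));
  cudaMalloc(&b, n * sizeof(float));

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  cudaGraph_t graph;
  cudaGraphExec_t graphExec = nullptr;

  int iter = 0;
  bool done = false;
  while (!done) {
    // Capture this iteration's kernel sequence into a graph.
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    for (int k = 0; k < nKernels; ++k)
      shortKernel<<<(n + 255) / 256, 256, 0, stream>>>(b, a, n);
    cudaStreamEndCapture(stream, &graph);

    if (graphExec == nullptr) {
      // First iteration: instantiate the executable graph.
      cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);
    } else {
      // Later iterations: update the existing executable graph in place.
      // This succeeds as long as the topology is unchanged and avoids
      // the cost of re-instantiation; otherwise fall back to instantiating.
      cudaGraphExecUpdateResult updateResult;
      cudaGraphNode_t errorNode;
      if (cudaGraphExecUpdate(graphExec, graph, &errorNode, &updateResult)
          != cudaSuccess) {
        cudaGraphExecDestroy(graphExec);
        cudaGraphInstantiate(&graphExec, graph, nullptr, nullptr, 0);
      }
    }
    cudaGraphDestroy(graph);  // the executable graph keeps its own copy

    cudaGraphLaunch(graphExec, stream);
    cudaStreamSynchronize(stream);

    done = (++iter >= 10);  // placeholder for your real while-loop condition
  }

  cudaGraphExecDestroy(graphExec);
  cudaStreamDestroy(stream);
  cudaFree(a);
  cudaFree(b);
  return 0;
}
```

For the second option, the same loop body would instead hash or key the captured topology and look the instantiated graph up in a container (e.g. a `std::map`), instantiating only on the first encounter.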