Hi,
My team is organizing a complex algorithm using CUDA Graphs. One step of the algorithm involves a call to a third-party library, which includes a CUDA kernel launch along with some host-side pre- and post-processing.
Because of this combination of host and device operations (and lack of direct control over the kernel launch), it doesn’t seem possible to represent this step using either a HostNode or KernelNode in the CUDA Graph.
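To make the structure concrete, here is a rough sketch of the shape of that step; every name below is a placeholder I made up for illustration, not the library's actual API:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Placeholder for the kernel the library launches internally; we have no
// direct handle on this launch from our side.
__global__ void vendorKernel(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// The library call looks roughly like this: CPU work, then a kernel launch
// it performs itself, then more CPU work, all behind one opaque function.
void vendorStep(float* d_data, int n, cudaStream_t stream)
{
    std::printf("host-side pre-processing\n");                     // runs on the CPU
    vendorKernel<<<(n + 255) / 256, 256, 0, stream>>>(d_data, n);  // launch we don't control
    std::printf("host-side post-processing\n");                    // runs on the CPU
}
```

Because the host work and the device work are interleaved inside a single opaque call, neither node type on its own seems to cover it.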
Is there a way to embed such a function in a CUDA Graph, or otherwise work around this limitation?
Thanks