Hi,
My team is currently working on structuring a complex algorithm as a CUDA Graph. One step of the algorithm involves a call to a third-party library, which performs a CUDA kernel launch along with some host-side pre- and post-processing.
Because of this combination of host and device operations (and lack of direct control over the kernel launch), it doesn’t seem possible to represent this step using either a HostNode or KernelNode in the CUDA Graph.
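For context, here is a minimal sketch of the situation. The function `thirdparty_process` is a placeholder for the library's entry point, not its actual API, and the stream-capture workaround shown is just what we have experimented with so far:

```cpp
#include <cuda_runtime.h>

// Hypothetical third-party entry point (placeholder name): we can only call
// this opaque function. Internally it does host-side pre-processing, launches
// a kernel on the given stream, and then does host-side post-processing.
void thirdparty_process(float* d_data, int n, cudaStream_t stream);

// Workaround we have tried: record the device-side work into a graph via
// stream capture. The problem is that the host-side pre/post-processing runs
// only once, at capture time, and is NOT replayed when the graph is launched.
cudaGraph_t capture_library_step(float* d_data, int n, cudaStream_t stream)
{
    cudaGraph_t graph = nullptr;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    thirdparty_process(d_data, n, stream);   // only the kernel launch is captured
    cudaStreamEndCapture(stream, &graph);    // error checking omitted for brevity
    return graph;
}
```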
Is there a way to embed such a function into a CUDA Graph, or work around this limitation?
Thanks