Good afternoon,
I am new to using CUDA Graphs with multiple streams and am running into an issue. Specifically, I am trying to use device graphs with a CUDA kernel that uses dynamically allocated shared memory, i.e., the size of the shared memory used by the kernel is passed as a launch parameter at invocation.
The problem I am encountering is that the program hangs at the point where cudaGraphLaunch is called. There are no errors during compilation or execution; it simply hangs.
Are there any special cases I need to know about when using CUDA Graphs with kernels that are launched with dynamically sized shared memory? I am trying to determine whether this is a synchronization issue with the multiple streams or an underlying problem with using dynamic shared memory in CUDA Graphs.
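For concreteness, here is a minimal sketch of the pattern in question. It is not my actual code: the kernel scaleViaSharedMem, the function runDynamicSmemGraph, and all sizes are hypothetical names and values chosen for illustration, and it deliberately uses a single stream and a host-side launch so that dynamic shared memory is the only variable. It assumes a toolkit that provides cudaGraphInstantiateWithFlags (CUDA 11.4 or newer).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Return false from the enclosing function on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            return false;                                             \
        }                                                             \
    } while (0)

// Hypothetical kernel: stages data through dynamically sized shared memory.
// The size of tile[] is the third launch-configuration argument.
__global__ void scaleViaSharedMem(const float* in, float* out, int n, float factor)
{
    extern __shared__ float tile[];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = in[i];
    __syncthreads();  // kept outside the branch so every thread reaches it
    if (i < n) out[i] = tile[threadIdx.x] * factor;
}

// Captures one kernel launch into a graph, runs it, and verifies the result.
bool runDynamicSmemGraph()
{
    const int n = 1024, threads = 256, blocks = n / threads;
    const size_t smemBytes = threads * sizeof(float);  // dynamic shared memory per block
    const size_t bytes = n * sizeof(float);

    std::vector<float> hIn(n), hOut(n, 0.0f);
    for (int i = 0; i < n; ++i) hIn[i] = static_cast<float>(i);

    float *dIn = nullptr, *dOut = nullptr;
    CHECK(cudaMalloc(&dIn, bytes));
    CHECK(cudaMalloc(&dOut, bytes));
    // Synchronous copies are done before capture begins.
    CHECK(cudaMemcpy(dIn, hIn.data(), bytes, cudaMemcpyHostToDevice));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    cudaGraph_t graph;
    cudaGraphExec_t exec;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    // The launch is recorded, not executed; smemBytes is stored with the kernel node.
    scaleViaSharedMem<<<blocks, threads, smemBytes, stream>>>(dIn, dOut, n, 2.0f);
    CHECK(cudaStreamEndCapture(stream, &graph));
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    CHECK(cudaGraphLaunch(exec, stream));
    CHECK(cudaStreamSynchronize(stream));
    CHECK(cudaMemcpy(hOut.data(), dOut, bytes, cudaMemcpyDeviceToHost));

    bool ok = true;
    for (int i = 0; i < n; ++i)
        if (hOut[i] != 2.0f * hIn[i]) ok = false;

    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(dIn));
    CHECK(cudaFree(dOut));
    return ok;
}
```

Calling runDynamicSmemGraph() from main and checking that it returns true gives a single-stream baseline. If a reduced case like this completes but the multi-stream version hangs, that would point toward stream synchronization rather than dynamic shared memory.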
Thank you in advance for your assistance.