CUDA Graphs and shared memory

Good Afternoon,

I am new to CUDA Graphs with multiple streams and am running into some issues. In particular, I am trying to use device graphs with a CUDA kernel that uses dynamically allocated shared memory, i.e., the size of the shared memory used by the kernel is specified when the kernel is launched.

The problem I am encountering is that the system seems to hang at the point where cudaGraphLaunch is called. There are no errors during compilation or execution; the system just hangs.

Are there any special cases I need to know about when using CUDA Graphs with kernels that are launched with dynamically sized shared memory? I am trying to determine whether this is a synchronization issue with the multiple streams or an underlying problem with using dynamic shared memory and CUDA Graphs.

Thank you in advance for any assistance.

Do you have a simple repro? The hang seems unexpected.
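
For reference, a minimal repro could look something like the sketch below. It is only an illustration, not your code: the kernel name blockSum, the CHECK macro, and the sizes are made up, and it assumes CUDA 11.4 or newer so that cudaGraphInstantiateWithFlags is available. It captures a single kernel launch that uses dynamic shared memory into a graph and launches the graph once. If something this small runs to completion on your system, dynamic shared memory by itself is probably not the cause.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Each block sums its blockDim.x inputs in dynamically sized shared memory.
// Assumes blockDim.x is a power of two.
__global__ void blockSum(const float* in, float* out, int n) {
    extern __shared__ float buf[];  // size is set by the launch configuration
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    buf[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    // Tree reduction; every thread reaches every __syncthreads().
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) buf[tid] += buf[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = buf[0];
}

int main() {
    const int threads = 256, blocks = 4, n = threads * blocks;
    float *in = nullptr, *out = nullptr;
    CHECK(cudaMallocManaged(&in, n * sizeof(float)));
    CHECK(cudaMallocManaged(&out, blocks * sizeof(float)));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Record the launch, including its dynamic shared memory size, into a graph.
    cudaGraph_t graph;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    blockSum<<<blocks, threads, threads * sizeof(float), stream>>>(in, out, n);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamEndCapture(stream, &graph));

    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    // Replay the captured work and wait for it to finish.
    CHECK(cudaGraphLaunch(exec, stream));
    CHECK(cudaStreamSynchronize(stream));

    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) total += out[b];
    std::printf("total = %.0f (n = %d)\n", total, n);

    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(in));
    CHECK(cudaFree(out));
    return 0;
}
```

Stream capture records the shared memory size from the launch configuration as part of the kernel node, so nothing extra should be needed for dynamic shared memory in this simple case.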

Thank you for the response, @hhoffman.

I found the problem, though. Apparently it had nothing to do with shared memory: using warp-level primitives, e.g., __shfl_down_sync, caused the system to hang.


Improper use of a sync primitive can certainly cause a hang. Otherwise, I don’t think that proper use of a sync primitive should cause a hang, with or without CUDA Graphs.
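
As a rough illustration of what I mean by proper use (a sketch only; the names warpSum and sumKernel and the sizes are made up, and it assumes the block size is a multiple of 32): every non-exited lane named in the mask has to execute the same __shfl_down_sync with that mask. A common mistake is calling the shuffle with a full mask inside a divergent branch such as `if (i < n)`, so that some lanes named in the mask never reach it. That is undefined behavior and can show up as a hang. The version below keeps all lanes participating and has out-of-range lanes contribute zero instead.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Warp-level sum. All 32 lanes must call this together because the
// mask 0xffffffff names every lane in the warp.
__inline__ __device__ float warpSum(float v) {
    for (int offset = warpSize / 2; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffffu, v, offset);
    return v;  // lane 0 holds the warp total
}

__global__ void sumKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Do the bounds check on the load, not around the shuffle, so that
    // no lane skips the __shfl_down_sync calls inside warpSum().
    float v = (i < n) ? in[i] : 0.0f;
    v = warpSum(v);
    // One atomic per warp: lane 0 adds the warp total to the result.
    if ((threadIdx.x & (warpSize - 1)) == 0) atomicAdd(out, v);
}

int main() {
    const int n = 1000;       // deliberately not a multiple of 32
    const int threads = 128;  // assumed to be a multiple of 32
    const int blocks = (n + threads - 1) / threads;

    float *in = nullptr, *out = nullptr;
    CHECK(cudaMallocManaged(&in, n * sizeof(float)));
    CHECK(cudaMallocManaged(&out, sizeof(float)));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;
    *out = 0.0f;

    sumKernel<<<blocks, threads>>>(in, out, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    std::printf("sum = %.0f (n = %d)\n", *out, n);

    CHECK(cudaFree(in));
    CHECK(cudaFree(out));
    return 0;
}
```

If only some lanes really need to take part, the mask should name exactly those lanes (for example one obtained from __ballot_sync or __activemask before the branch) rather than 0xffffffff.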
