Questions about CUDA graphs


I have two questions related to the CUDA graph feature:

  1. When building a CUDA graph, what is the hidden overhead of using stream capture compared to the explicit graph APIs (i.e., adding nodes and dependencies manually)?

  2. Do CUDA graphs only support cuStreamWaitEvent()? What about cuStreamWaitValue32()? Are there any plans to support it?


I would also be very interested in this feature, i.e., capturing cuStreamWaitValue32() into CUDA graphs. Are there any updates on it?