Questions about CUDA graphs

Hi,

I have two questions related to the CUDA graph feature:

  1. When building a CUDA graph, what hidden overhead does stream capture incur compared to the explicit graph APIs (i.e., adding nodes and dependencies manually)?

  2. Do CUDA graphs only support cuStreamWaitEvent()? What about cuStreamWaitValue32()? Are there any plans to support it?

Thanks,

I would also be very interested in this feature: capturing cuStreamWaitValue32() into CUDA graphs. Are there any updates on it?