I have two questions related to the CUDA graph feature:
1. When building a CUDA graph, what is the hidden overhead of using stream capture compared to the explicit graph API (i.e., adding nodes and dependencies manually)?
2. Do CUDA graphs support only cuStreamWaitEvent()? What about cuStreamWaitValue32()? Is there any plan to support it?