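I was going through the CUDA Graphs documentation and tutorials. I learned that, whether you build a graph with stream capture (cudaStreamBeginCapture / cudaStreamEndCapture) or with the explicit Graph API, the graph cannot contain synchronous API calls. The documentation says this about stream capture: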
It is therefore also invalid to call synchronous APIs in this case. Synchronous APIs, such as cudaMemcpy(), enqueue work to the legacy stream and synchronize it before returning.
So we can use the async equivalents of the synchronous API calls (cudaMemcpyAsync, for example). But as the CUDA Graph 101 talk mentions, some APIs have no async equivalent; cudaMallocHost is one example. The talk suggests working around this by hoisting the cudaMallocHost call above the graph capture, so that the captured graph contains no synchronous operations.
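To make the question concrete, here is a minimal sketch of how I understand the hoisting pattern. The kernel scale, the buffer names, and the CHECK macro are my own placeholders and not from the talk. I am also assuming CUDA 11.4 or later, so that cudaGraphInstantiateWithFlags is available.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails (illustrative helper).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel, used only to give the graph some work to do.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Hoisted: the allocations are done up front, before capture begins.
    float *h_buf = nullptr;
    float *d_buf = nullptr;
    CHECK(cudaMallocHost((void **)&h_buf, bytes));
    CHECK(cudaMalloc((void **)&d_buf, bytes));
    for (int i = 0; i < n; ++i) h_buf[i] = 1.0f;

    // Capture is done on an explicitly created stream, not the legacy stream.
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Only asynchronous, stream-ordered work appears between begin and end.
    cudaGraph_t graph;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    CHECK(cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, stream));
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, n, 2.0f);
    CHECK(cudaMemcpyAsync(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamEndCapture(stream, &graph));

    // Instantiate once, then launch; the graph could be relaunched many times.
    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));
    CHECK(cudaGraphLaunch(exec, stream));
    CHECK(cudaStreamSynchronize(stream));

    std::printf("h_buf[0] = %f\n", h_buf[0]);

    // Teardown also happens outside the captured region.
    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_buf));
    return 0;
}
```

Everything between cudaStreamBeginCapture and cudaStreamEndCapture here is an async call or a kernel launch on the capturing stream, which is the property I am trying to preserve.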
What other APIs do not have an async equivalent?