A list of APIs which do not have an "async" equivalent (in the context of CUDA Graphs)?

I was going through the CUDA Graphs documentation and tutorials. I learned that, whether you build the graph with stream capture (cudaStreamBeginCapture/cudaStreamEndCapture) or with the explicit graph API, the graph cannot contain synchronous API calls. The programming guide says:

It is therefore also invalid to call synchronous APIs in this case. Synchronous APIs, such as cudaMemcpy(), enqueue work to the legacy stream and synchronize it before returning.

So we can use the async equivalents of the synchronous API calls (cudaMemcpyAsync, for example). But as the CUDA Graph 101 talk mentions, certain APIs have no async equivalent; cudaMallocHost is one example. The talk suggests working around this by hoisting the cudaMallocHost call above the graph capture, so that the captured region is free of synchronous operations.
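To make the hoisting idea concrete, here is a minimal sketch of how I understand it. The kernel `scale`, the helper `run_hoisted_capture`, and the `CHECK` macro are hypothetical names of mine and are not from the talk. It assumes CUDA 11.4 or newer, since it uses cudaGraphInstantiateWithFlags. All allocations happen before cudaStreamBeginCapture, and only async calls on the capturing stream appear inside the captured region.

```cuda
#include <cuda_runtime.h>

// Return the CUDA error code from the enclosing function on failure.
// (A sketch: resources are not released on the error path.)
#define CHECK(call)                            \
    do {                                       \
        cudaError_t err_ = (call);             \
        if (err_ != cudaSuccess) return err_;  \
    } while (0)

// Hypothetical kernel, used only for illustration.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: captures copy -> kernel -> copy into a graph,
// launches it once, and stores the first output element in *first_result.
cudaError_t run_hoisted_capture(int n, float *first_result) {
    float *h_buf = nullptr, *d_buf = nullptr;
    cudaStream_t stream;
    cudaGraph_t graph;
    cudaGraphExec_t exec;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Hoisted: these calls have no capturable form here, so they run
    // before the capture begins.
    CHECK(cudaMallocHost(&h_buf, bytes));
    CHECK(cudaMalloc(&d_buf, bytes));
    CHECK(cudaStreamCreate(&stream));
    for (int i = 0; i < n; ++i) h_buf[i] = 1.0f;

    // Captured region: only async work on the capturing stream.
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    CHECK(cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, stream));
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, n, 2.0f);
    CHECK(cudaMemcpyAsync(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamEndCapture(stream, &graph));

    // Instantiate and run the graph, then wait for it outside the capture.
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));
    CHECK(cudaGraphLaunch(exec, stream));
    CHECK(cudaStreamSynchronize(stream));
    *first_result = h_buf[0];

    // Cleanup also happens outside the captured region.
    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_buf));
    CHECK(cudaFreeHost(h_buf));
    return cudaSuccess;
}
```

Calling `run_hoisted_capture(1024, &r)` from a `main` should return cudaSuccess, with `r` holding the input value 1.0f scaled by 2.0f. The same buffers can be reused across many cudaGraphLaunch calls, which is the point of keeping the allocation out of the graph.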

What are the other APIs which do not have an async equivalent?

@Robert_Crovella Any resource suggestions? It would be a great help.

With few exceptions, an operation for which no graph node type exists cannot be captured. You can look up the node types in the runtime API documentation.
There are *Async operations which cannot be captured (the programming guide mentions cudaStreamAttachMemAsync), and non-async operations which can be captured (event record + stream wait event, which express node dependencies).


@striker159

With few exceptions

What kind of exceptions are you talking about here (regarding inspecting graph node types to find out which operations are capturable and which are not)? Can you please give an example?

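cudaEventRecord(event, stream);
cudaStreamWaitEvent(stream2, event, 0);

This pair can be captured, but it will not create a node in the graph. It only makes the work captured afterwards on stream2 depend on the work captured so far on stream.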
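A rough sketch of how one could check this: capture a fork/join across two streams, then list the nodes of the resulting graph and their types. The kernel `noop` and the helper `fork_join_node_counts` are hypothetical names made up for this illustration. The node query uses cudaGraphGetNodes and cudaGraphNodeGetType from the runtime API.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Return false from the enclosing function on any CUDA error.
// (A sketch: resources are not released on the error path.)
#define CHECK(call)                                   \
    do {                                              \
        if ((call) != cudaSuccess) return false;      \
    } while (0)

// Hypothetical empty kernel, used only to produce kernel nodes.
__global__ void noop() {}

// Hypothetical helper: captures a fork/join over two streams and reports
// how many nodes the graph has and how many of them are kernel nodes.
bool fork_join_node_counts(size_t *total, size_t *kernels) {
    cudaStream_t s1, s2;
    cudaEvent_t fork_ev, join_ev;
    cudaGraph_t graph;
    CHECK(cudaStreamCreate(&s1));
    CHECK(cudaStreamCreate(&s2));
    CHECK(cudaEventCreate(&fork_ev));
    CHECK(cudaEventCreate(&join_ev));

    CHECK(cudaStreamBeginCapture(s1, cudaStreamCaptureModeGlobal));
    noop<<<1, 1, 0, s1>>>();                      // kernel node A
    CHECK(cudaEventRecord(fork_ev, s1));          // captured, no node
    CHECK(cudaStreamWaitEvent(s2, fork_ev, 0));   // pulls s2 into the capture
    noop<<<1, 1, 0, s2>>>();                      // kernel node B, after A
    CHECK(cudaEventRecord(join_ev, s2));          // captured, no node
    CHECK(cudaStreamWaitEvent(s1, join_ev, 0));   // join s2 back into s1
    noop<<<1, 1, 0, s1>>>();                      // kernel node C, after B
    CHECK(cudaStreamEndCapture(s1, &graph));

    // First call with a null array returns only the node count.
    size_t n = 0;
    CHECK(cudaGraphGetNodes(graph, nullptr, &n));
    std::vector<cudaGraphNode_t> nodes(n);
    CHECK(cudaGraphGetNodes(graph, nodes.data(), &n));

    *total = n;
    *kernels = 0;
    for (size_t i = 0; i < n; ++i) {
        cudaGraphNodeType type;
        CHECK(cudaGraphNodeGetType(nodes[i], &type));
        if (type == cudaGraphNodeTypeKernel) ++(*kernels);
    }

    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaEventDestroy(fork_ev));
    CHECK(cudaEventDestroy(join_ev));
    CHECK(cudaStreamDestroy(s1));
    CHECK(cudaStreamDestroy(s2));
    return true;
}
```

With three kernel launches in the captured region, I would expect both counts to come back as 3: the event record and stream wait calls show up only as dependencies between the kernel nodes, not as nodes of their own.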
