How to share memory allocated by cudaMallocAsync during graph capture?

I want to use cudaMallocAsync during graph capture. According to the documentation:

During stream capture, this function results in the creation of an allocation node. In this case, the allocation is owned by the graph instead of the memory pool. The memory pool’s properties are used to set the node’s creation parameters.

It seems that, during capture, cudaMallocAsync produces an allocation owned by the graph rather than by the memory pool. How can I then share that allocation with other processes?

My CUDA graph uses an IPC mechanism. Currently I allocate with cudaMalloc and export the memory using cudaIpcGetMemHandle. I want to switch to cudaMallocAsync, but I can't find any documented way to do IPC sharing during graph capture.

Outside graph capture, I know I can share pool-backed memory using cudaMemPoolExportToShareableHandle and cudaMemPoolExportPointer.

Did you try it with graph capture? What was the result? Do you get an error during capture? Can you show your CUDA C++ test case?

No, I haven't tried it yet. I want to ask about the general rule first, before I invest in this effort; otherwise I'm afraid I would be relying on undefined behavior. Even if it works on one machine, I don't know whether it will behave the same on another.