I want to use cudaMallocAsync during graph capture. According to the documentation:
During stream capture, this function results in the creation of an allocation node. In this case, the allocation is owned by the graph instead of the memory pool. The memory pool’s properties are used to set the node’s creation parameters.
So it seems that, during capture, cudaMallocAsync allocates memory from an internal graph-owned memory pool. How can I then share that memory with other processes?
My CUDA graph relies on an IPC mechanism. Currently I allocate with cudaMalloc and then export the memory using cudaIpcGetMemHandle. I want to switch to cudaMallocAsync, but I can't find any way to share memory over IPC when it is allocated during graph capture.
Outside graph capture, I know I can share allocations from a memory pool using cudaMemPoolExportPointer and the related pool export/import APIs.
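For reference, this is roughly what I mean by the outside-capture path. It is only a minimal sketch of the exporting side, assuming Linux with a POSIX file-descriptor handle type, device 0, and a hypothetical 1 MiB buffer; the CHECK macro is my own helper, and actually sending the fd and the export data to the other process (e.g. over a Unix domain socket) is left out.

```cuda
#include <cuda_runtime.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: abort with a message on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main() {
    const int device = 0;          // assumption: single GPU, device 0
    const size_t bytes = 1 << 20;  // assumption: 1 MiB example buffer
    CHECK(cudaSetDevice(device));

    // Create an explicit pool whose allocations can be exported via a POSIX fd.
    cudaMemPoolProps props = {};
    props.allocType = cudaMemAllocationTypePinned;
    props.handleTypes = cudaMemHandleTypePosixFileDescriptor;
    props.location.type = cudaMemLocationTypeDevice;
    props.location.id = device;

    cudaMemPool_t pool;
    CHECK(cudaMemPoolCreate(&pool, &props));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Stream-ordered allocation from the exportable pool (no capture active).
    void* ptr = nullptr;
    CHECK(cudaMallocFromPoolAsync(&ptr, bytes, pool, stream));
    CHECK(cudaStreamSynchronize(stream));

    // Export the pool itself as a file descriptor...
    int fd = -1;
    CHECK(cudaMemPoolExportToShareableHandle(
        &fd, pool, cudaMemHandleTypePosixFileDescriptor, 0));

    // ...and export the individual pointer as an opaque blob.
    cudaMemPoolPtrExportData exportData;
    CHECK(cudaMemPoolExportPointer(&exportData, ptr));

    // Here fd and exportData would be sent to the importing process, which
    // would call cudaMemPoolImportFromShareableHandle on the received fd and
    // then cudaMemPoolImportPointer with the received exportData.
    printf("exported pool fd=%d, export blob is %zu bytes\n",
           fd, sizeof(exportData));

    close(fd);
    CHECK(cudaFreeAsync(ptr, stream));
    CHECK(cudaStreamSynchronize(stream));
    CHECK(cudaMemPoolDestroy(pool));
    CHECK(cudaStreamDestroy(stream));
    return 0;
}
```

What I am missing is the equivalent of this when the cudaMallocAsync call is captured into a graph, since the allocation node is then owned by the graph rather than by a pool I can export.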