How to share memory allocated by cudaMallocAsync during graph capture?

I want to use cudaMallocAsync during graph capture. According to the documentation:

During stream capture, this function results in the creation of an allocation node. In this case, the allocation is owned by the graph instead of the memory pool. The memory pool’s properties are used to set the node’s creation parameters.

It seems that, during capture, cudaMallocAsync produces an allocation owned by the graph rather than by the memory pool. How can I then share that allocation with other processes?

My CUDA graph uses an IPC mechanism. Currently I allocate with cudaMalloc and export the memory using cudaIpcGetMemHandle. I want to switch to cudaMallocAsync, but I can't find any documented way to do IPC sharing during graph capture.

Outside graph capture, I know I can share pool-backed memory using cudaMemPoolExportToShareableHandle and cudaMemPoolExportPointer.

Did you try it with graph capture? What was the result? Do you get an error during capture? Can you show your CUDA C++ test case?

No, I haven't tried it yet. I want to ask about the general rule first, before I invest in this effort; otherwise I'm afraid I would be relying on undefined behavior. Even if it works on one machine, I don't know whether it will behave the same on another.