Share Cuda memory between different system processes

I have some PyTorch tensors in one script and want to share them with other scripts.
How can I do that?

I have tensors in CUDA memory, and I think the best solution would be to get something like an address in CUDA memory that another process could use to access these tensors.

Any ideas?
PS: this looks like a task for Triton Inference Server.

Hi @kuskov.stanislav
This question might be better suited for the CUDA Programming and Performance forum branch, so I have moved it there.


The CUDA IPC mechanism allows sharing of device memory between processes. There are CUDA sample codes that demonstrate it. I won’t be able to give you a roadmap for whatever you are trying to do in PyTorch. However, a simple Google search for “pytorch cuda ipc” turned up articles like this which may be of interest.
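As a side note: if both processes are Python, PyTorch already wraps CUDA IPC for you. Sending a CUDA tensor through a `torch.multiprocessing` queue shares the underlying device allocation instead of copying it. A minimal sketch (function names are mine; this requires a CUDA-capable machine):

```python
import torch
import torch.multiprocessing as mp

def consumer(queue):
    # The tensor received here refers to the *same* device memory as in
    # the producer; PyTorch exchanged a CUDA IPC handle under the hood.
    t = queue.get()
    t += 1  # an in-place update, visible to the producer as well

def main():
    mp.set_start_method("spawn", force=True)  # required for CUDA tensors
    queue = mp.Queue()
    t = torch.zeros(4, device="cuda")
    p = mp.Process(target=consumer, args=(queue,))
    p.start()
    queue.put(t)   # shares the allocation, does not copy it
    p.join()
    torch.cuda.synchronize()
    print(t)       # reflects the consumer's in-place update

if __name__ == "__main__":
    main()
```

The sender must stay alive while the receiver uses the tensor, since the memory is owned by the producing process.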


Thank you for the link! It looks very similar to my task. I will try it and report the results.

@Robert_Crovella Hi Robert! I read the article: it explains how to create a handle, but not how to read it. I also tried to implement the transfer using numba, but it gave me a strange result: when I read the tensor through the handle with numba, I get an incorrect result from the network from time to time. After I added a delay after converting the tensor, it started working correctly.

A sample of my code (missing parentheses restored; the argument list of was lost in the original post):

import pickle
import numpy as np
import torch
from numba import cuda
from numba.cuda.cudadrv import devices
from numba.cuda.api import _prepare_shape_strides_dtype  # numba-internal helper; its location varies by version

backbone_features =, dim=0)  # the list of tensors was elided in the original

# Describe the tensor through the CUDA Array Interface
desc = backbone_features.__cuda_array_interface__
shape = desc["shape"]
strides = desc.get("strides")
dtype = np.dtype(desc["typestr"])
shape, strides, dtype = _prepare_shape_strides_dtype(shape, strides, dtype, order="C")
size = cuda.driver.memory_size_from_info(shape, strides, dtype.itemsize)

# Wrap the raw device pointer; desc["data"][0] is the pointer exposed by the interface
devptr = cuda.driver.get_devptr_for_active_ctx(desc["data"][0])
data = cuda.driver.MemoryPointer(
    cuda.current_context(), devptr, size=size, owner=backbone_features
)

# Export an IPC handle together with the array metadata
ipch = devices.get_context().get_ipc_handle(data)
desc = dict(shape=shape, strides=strides, dtype=dtype)
handle = pickle.dumps([ipch, desc])

and send this handle to another process

You might want to study a CUDA IPC sample code. Sorry, I won’t be able to debug your torch/python/CUDA/IPC code for you.

@Robert_Crovella I created another topic about it. This code is just an example using the numba library; maybe one of the numba developers can help me with it.