How to access a GPGPU memory area from two Pods

We would like to access the same GPGPU memory area from two Pods (Pod-A and Pod-B) running on Kubernetes (k8s), but we cannot get it to work.
We tried the following procedure and an error occurred.

Please let us know the correct procedure.

Pod-A procedure

cudaSetDevice()
galloc_fn()
 cuDriverGetVersion()
 cuCtxGetDevice() 
 cuDeviceGetAttribute()
 cuMemGetAllocationGranularity()
 cuMemAddressReserve()  
 cuMemCreate()              
 cuMemMap()               
 cuMemSetAccess()
gdr_open_safe()
gdr_pin_buffer()
gdr_map()
gdr_get_info()
cudaMemset()
cudaStreamCreate()
cudaIpcGetMemHandle() error occurs:
“cudaIpcGetMemHandle failed: invalid argument”
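For reference, a minimal sketch of the allocation path described above, written against the CUDA Driver API VMM calls plus GDRCopy. Device ordinal 0, a single allocation granule, the cuInit()/cuCtxCreate() setup (standing in for the cudaSetDevice()/cuCtxGetDevice() path), the plain gdr_open() call in place of gdr_open_safe(), and the simplified error handling are all assumptions for illustration; the failing cudaIpcGetMemHandle() call is included at the end to show where the error is reported.

```cpp
#include <cuda.h>
#include <cuda_runtime.h>
#include <gdrapi.h>
#include <cstdio>
#include <cstdlib>

#define CU_CHECK(call)                                               \
    do {                                                             \
        CUresult rc_ = (call);                                       \
        if (rc_ != CUDA_SUCCESS) {                                   \
            const char *msg_ = nullptr;                              \
            cuGetErrorString(rc_, &msg_);                            \
            fprintf(stderr, "%s failed: %s\n", #call, msg_);         \
            exit(1);                                                 \
        }                                                            \
    } while (0)

int main() {
    CU_CHECK(cuInit(0));
    CUdevice dev;
    CU_CHECK(cuDeviceGet(&dev, 0));                  // assumption: device 0
    CUcontext ctx;
    CU_CHECK(cuCtxCreate(&ctx, 0, dev));

    // Physical allocation properties: pinned device memory on `dev`.
    CUmemAllocationProp prop = {};
    prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id   = dev;

    size_t gran = 0;
    CU_CHECK(cuMemGetAllocationGranularity(&gran, &prop,
                                           CU_MEM_ALLOC_GRANULARITY_MINIMUM));
    const size_t size = gran;                        // assumption: one granule

    // Reserve a VA range, back it with physical memory, map it, allow R/W.
    CUdeviceptr dptr = 0;
    CUmemGenericAllocationHandle h;
    CU_CHECK(cuMemAddressReserve(&dptr, size, gran, 0, 0));
    CU_CHECK(cuMemCreate(&h, size, &prop, 0));
    CU_CHECK(cuMemMap(dptr, size, 0, h, 0));

    CUmemAccessDesc acc = {};
    acc.location = prop.location;
    acc.flags    = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    CU_CHECK(cuMemSetAccess(dptr, size, &acc, 1));

    // GDRCopy: pin the mapped range and map it into the CPU address space.
    gdr_t g = gdr_open();
    if (!g) { fprintf(stderr, "gdr_open failed\n"); return 1; }
    gdr_mh_t mh;
    if (gdr_pin_buffer(g, (unsigned long)dptr, size, 0, 0, &mh) != 0) {
        fprintf(stderr, "gdr_pin_buffer failed\n"); return 1;
    }
    void *cpu_va = nullptr;
    if (gdr_map(g, mh, &cpu_va, size) != 0) {
        fprintf(stderr, "gdr_map failed\n"); return 1;
    }

    // This is where the reported error appears: the pointer comes from
    // cuMemMap(), not cudaMalloc(), so the legacy IPC call rejects it
    // (see the reply further down this thread).
    cudaIpcMemHandle_t ipc;
    cudaError_t err = cudaIpcGetMemHandle(&ipc, (void *)dptr);
    if (err != cudaSuccess)
        fprintf(stderr, "cudaIpcGetMemHandle failed: %s\n",
                cudaGetErrorString(err));
    return 0;
}
```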

The test environment is as follows:

Based on GDRCopy v2.3.1
CentOS 8.2 (kernel 5.10.57)
nvidia/cuda:12.2.2-base-ubi8 as the base image

In this experiment, we would like to achieve GPGPU memory sharing between Pod-A and Pod-B using the CUDA Driver API.
Specifically, an error occurs in cudaIpcGetMemHandle() on Pod-A, which runs the following procedure.
What is the correct procedure for sharing GPGPU memory when using the CUDA Driver API?

Pod-A operation
cuDriverGetVersion()
cuCtxGetDevice()
cuDeviceGetAttribute()
cuMemGetAllocationGranularity()
cuMemAddressReserve()
cuMemCreate()
cuMemMap()
cuMemSetAccess()
cudaIpcGetMemHandle()
“cudaIpcGetMemHandle failed: invalid argument” occurs

Note that when we attempted to share GPGPU memory between Pod-A and Pod-B using the CUDA Runtime API, referring to the CUDA Runtime API :: CUDA Toolkit Documentation, it worked. We would like to perform the same operation using the CUDA Driver API.

https://nw.tsuda.ac.jp/lec/cuda/doc_v9_0/html/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g01050a29fefde385b1042081ada4cde9

The above issue occurs not only on k8s but also when running directly on the host OS.
We attach sample code that reproduces the issue.

The following is a description of the sample code.

Process A ① (test_cuda_ipcget1.cu)
(1) CUDA Driver API process startup
cuDriverGetVersion()
cuCtxGetDevice()
cuDeviceGetAttribute()
cuMemGetAllocationGranularity()
cuMemAddressReserve()
cuMemCreate()
cuMemMap()
cuMemSetAccess()
cudaIpcGetMemHandle() ★Error with invalid argument

Process A ② (test_cuda_ipcget2.cu)

(2) CUDA Runtime API process startup
cudaMalloc()
cudaMemcpy()
cudaIpcGetMemHandle()
Store the IPC handle in shared memory
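A minimal sketch of this Process A ② step, assuming the handle is passed through a POSIX shared-memory object named /cuda_ipc_handle and a 1 MiB test buffer (both are illustrative choices; the attached test_cuda_ipcget2.cu may differ in detail):

```cpp
#include <cuda_runtime.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    const size_t size = 1 << 20;                  // assumption: 1 MiB buffer
    std::vector<char> pattern(size, 0x5A);        // recognizable test pattern

    void *dptr = nullptr;
    if (cudaMalloc(&dptr, size) != cudaSuccess) return 1;
    cudaMemcpy(dptr, pattern.data(), size, cudaMemcpyHostToDevice);

    // Works here because the pointer was returned by cudaMalloc().
    cudaIpcMemHandle_t handle;
    cudaError_t err = cudaIpcGetMemHandle(&handle, dptr);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaIpcGetMemHandle failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Publish the opaque handle through POSIX shared memory for Process B.
    int fd = shm_open("/cuda_ipc_handle", O_CREAT | O_RDWR, 0666);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, sizeof(handle)) != 0) { perror("ftruncate"); return 1; }
    void *shm = mmap(nullptr, sizeof(handle), PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (shm == MAP_FAILED) { perror("mmap"); return 1; }
    memcpy(shm, &handle, sizeof(handle));

    // The allocation must stay alive while Process B has the handle open.
    printf("IPC handle published; waiting for Process B...\n");
    pause();
    return 0;
}
```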

Process B (test_cuda_ipcopen2.cu)

(2) CUDA Runtime API process startup
Obtain the IPC handle from shared memory
cudaIpcOpenMemHandle()
cudaMemcpy()
Output the contents of the memory written by Process A
Close the shared memory
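And a matching sketch for Process B, under the same assumptions (shared-memory object /cuda_ipc_handle, 1 MiB buffer):

```cpp
#include <cuda_runtime.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main() {
    // Fetch the opaque IPC handle that Process A placed in shared memory.
    int fd = shm_open("/cuda_ipc_handle", O_RDONLY, 0666);
    if (fd < 0) { perror("shm_open"); return 1; }
    cudaIpcMemHandle_t handle;
    void *shm = mmap(nullptr, sizeof(handle), PROT_READ, MAP_SHARED, fd, 0);
    if (shm == MAP_FAILED) { perror("mmap"); return 1; }
    memcpy(&handle, shm, sizeof(handle));

    // Map Process A's allocation into this process.
    void *dptr = nullptr;
    cudaError_t err = cudaIpcOpenMemHandle(&dptr, handle,
                                           cudaIpcMemLazyEnablePeerAccess);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaIpcOpenMemHandle failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }

    // Read back and print what Process A wrote.
    const size_t size = 1 << 20;                  // assumption: same size as A
    std::vector<char> host(size);
    cudaMemcpy(host.data(), dptr, size, cudaMemcpyDeviceToHost);
    printf("first byte written by Process A: 0x%02x\n",
           (unsigned char)host[0]);

    cudaIpcCloseMemHandle(dptr);
    munmap(shm, sizeof(handle));
    shm_unlink("/cuda_ipc_handle");               // close the shared memory
    close(fd);
    return 0;
}
```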

Sample Code
■Process A ①: test_cuda_ipcget1.cu
■Process A ②: test_cuda_ipcget2.cu

■Process B ②: test_cuda_ipcopen2.cu

test_cuda_ipcget2.txt (2.0 KB)
test_cuda_ipcopen2.txt (1.3 KB)
test_cuda_ipcget1.txt (3.8 KB)

This CUDA sample code shows how to use the driver API to set up an IPC buffer when using the low-level VMM APIs. Note the use of cuMemExportToShareableHandle rather than cudaIpcGetMemHandle.
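A condensed sketch of that approach follows. Error checks are mostly omitted; the POSIX file-descriptor handle type, device 0, and a one-granule allocation are assumptions for illustration. Passing the descriptor between the two processes (e.g. over a Unix-domain socket with SCM_RIGHTS) is only indicated by a comment, and the importer side is sketched in comments to keep it short.

```cpp
#include <cuda.h>
#include <cstdio>
#include <cstdint>

int main() {
    cuInit(0);
    CUdevice dev;  cuDeviceGet(&dev, 0);          // assumption: device 0
    CUcontext ctx; cuCtxCreate(&ctx, 0, dev);

    CUmemAllocationProp prop = {};
    prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id   = dev;
    // Required at creation time so the allocation can be exported later.
    prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;

    size_t gran = 0;
    cuMemGetAllocationGranularity(&gran, &prop,
                                  CU_MEM_ALLOC_GRANULARITY_MINIMUM);

    CUmemGenericAllocationHandle h;
    cuMemCreate(&h, gran, &prop, 0);

    // Export the allocation as an OS file descriptor instead of calling
    // cudaIpcGetMemHandle() on the mapped pointer.
    int fd = -1;
    CUresult rc = cuMemExportToShareableHandle(
        &fd, h, CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR, 0);
    if (rc != CUDA_SUCCESS) { fprintf(stderr, "export failed\n"); return 1; }
    printf("exported allocation as fd %d\n", fd);

    // Send `fd` to the other process, e.g. over a Unix-domain socket with
    // SCM_RIGHTS (a raw fd number is only meaningful inside this process).
    //
    // Importer side, after receiving the descriptor as `recv_fd`:
    //   CUmemGenericAllocationHandle imported;
    //   cuMemImportFromShareableHandle(
    //       &imported, (void *)(uintptr_t)recv_fd,
    //       CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR);
    //   then cuMemAddressReserve / cuMemMap / cuMemSetAccess as usual.
    return 0;
}
```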

Thank you for your response.
We used cuMemExportToShareableHandle and were able to access the memory as expected.
We will close this issue.
