We would like to access the same GPGPU memory region from two Pods (Pod-A and Pod-B) running on Kubernetes (k8s).
We have been trying, but cannot get it to work.
We followed the procedure below, but an error occurred.
In this experiment, we want to achieve GPGPU memory sharing between Pod-A and Pod-B using the CUDA Driver API.
Specifically, cudaIpcGetMemHandle() fails on Pod-A when run with the following procedure.
What is the correct procedure for sharing GPGPU memory when using the CUDA Driver API?
Note that when we shared GPGPU memory between Pod-A and Pod-B using the CUDA Runtime API, following the CUDA Runtime API :: CUDA Toolkit Documentation, it worked. We would like to perform the same operation using the CUDA Driver API.
The issue occurs not only on k8s but also when running directly on the host OS.
We attach sample code that reproduces the issue.
The following is a description of the sample code.
Process A ① (test_cuda_ipcget1.cu)
(1) Start the CUDA Driver API process
cuDriverGetVersion()
cuCtxGetDevice()
cuDeviceGetAttribute()
cuMemGetAllocationGranularity()
cuMemAddressReserve()
cuMemCreate()
cuMemMap()
cuMemSetAccess()
cudaIpcGetMemHandle() ★ Error: invalid argument
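The Process A ① sequence above can be sketched as follows. This is a minimal reconstruction, not the attached code: error checking is mostly omitted, the context is created with cuCtxCreate() rather than obtained via cuCtxGetDevice(), and the allocation size (one granule) is illustrative.

```cpp
#include <cuda.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cuInit(0);
    int version = 0;
    cuDriverGetVersion(&version);

    CUdevice dev;
    cuDeviceGet(&dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, dev);

    // Describe a pinned device allocation for the VMM (cuMemCreate) path.
    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    size_t gran = 0;
    cuMemGetAllocationGranularity(&gran, &prop,
                                  CU_MEM_ALLOC_GRANULARITY_MINIMUM);
    size_t size = gran;  // one granule, purely illustrative

    // Reserve a VA range, back it with physical memory, map, grant access.
    CUdeviceptr ptr;
    cuMemAddressReserve(&ptr, size, 0, 0, 0);
    CUmemGenericAllocationHandle handle;
    cuMemCreate(&handle, size, &prop, 0);
    cuMemMap(ptr, size, 0, handle, 0);

    CUmemAccessDesc access = {};
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(ptr, size, &access, 1);

    // This is where the error occurs: the pointer was mapped via cuMemMap(),
    // not allocated with cudaMalloc()/cuMemAlloc(), and the call fails with
    // "invalid argument".
    cudaIpcMemHandle_t ipc;
    cudaError_t err = cudaIpcGetMemHandle(&ipc, (void *)ptr);
    printf("cudaIpcGetMemHandle: %s\n", cudaGetErrorString(err));
    return 0;
}
```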
Process A ② (test_cuda_ipcget2.cu)
(2) Start the CUDA Runtime API process
cudaMalloc()
cudaMemcpy()
cudaIpcGetMemHandle()
Store the IPC handle in shared memory
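The Process A ② path (the Runtime API variant that works for us) might look like this sketch. The shared-memory name "/cuda_ipc_demo" and the buffer contents are illustrative assumptions, not taken from the attached code:

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    // Allocate device memory with the runtime API and fill it.
    int *d_buf = nullptr;
    cudaMalloc(&d_buf, 16 * sizeof(int));
    int h_buf[16];
    for (int i = 0; i < 16; ++i) h_buf[i] = i;
    cudaMemcpy(d_buf, h_buf, sizeof(h_buf), cudaMemcpyHostToDevice);

    // Export an IPC handle; this succeeds because d_buf came from
    // cudaMalloc().
    cudaIpcMemHandle_t ipc;
    cudaIpcGetMemHandle(&ipc, d_buf);

    // Publish the handle via POSIX shared memory so Process B can read it
    // ("/cuda_ipc_demo" is an illustrative name).
    int fd = shm_open("/cuda_ipc_demo", O_CREAT | O_RDWR, 0666);
    ftruncate(fd, sizeof(ipc));
    void *shm = mmap(NULL, sizeof(ipc), PROT_WRITE, MAP_SHARED, fd, 0);
    memcpy(shm, &ipc, sizeof(ipc));

    // Keep the process alive so the device allocation stays valid while
    // Process B maps it.
    printf("IPC handle published; waiting...\n");
    pause();
    return 0;
}
```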
Process B (test_cuda_ipcopen2.cu)
(2) Start the CUDA Runtime API process
Obtain IPC handle from shared memory
cudaIpcOpenMemHandle()
cudaMemcpy()
Output the contents of the memory written by Process A
Close the shared memory
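The Process B side might be sketched as follows, again with the illustrative shared-memory name "/cuda_ipc_demo" and a 16-int buffer assumed, matching the exporter sketch rather than the attached code:

```cpp
#include <cuda_runtime.h>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    // Read the IPC handle that Process A placed in POSIX shared memory.
    int fd = shm_open("/cuda_ipc_demo", O_RDONLY, 0666);
    cudaIpcMemHandle_t ipc;
    void *shm = mmap(NULL, sizeof(ipc), PROT_READ, MAP_SHARED, fd, 0);
    memcpy(&ipc, shm, sizeof(ipc));

    // Map Process A's device allocation into this process.
    void *d_buf = nullptr;
    cudaIpcOpenMemHandle(&d_buf, ipc, cudaIpcMemLazyEnablePeerAccess);

    // Copy the data back to the host and print what Process A wrote.
    int h_buf[16];
    cudaMemcpy(h_buf, d_buf, sizeof(h_buf), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 16; ++i) printf("%d ", h_buf[i]);
    printf("\n");

    // Clean up: unmap the IPC memory and close the shared memory.
    cudaIpcCloseMemHandle(d_buf);
    munmap(shm, sizeof(ipc));
    close(fd);
    shm_unlink("/cuda_ipc_demo");
    return 0;
}
```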
Sample Code
■Process A ①: test_cuda_ipcget1.cu
■Process A ②: test_cuda_ipcget2.cu