Hi, I’d like to ask about some GPU global memory features.
Say I have 2 GPUs, GPU 0 and GPU 1. On GPU 1, I create an IPC handle to a buffer:
size_t data_size = 1; // in this case, just an integer
cudaIpcMemHandle_t ipc_handle;
int* buffer;
CUDACHECK(cudaMalloc(&buffer, data_size * sizeof(int)));
CUDACHECK(cudaMemset(buffer, 0, data_size * sizeof(int)));
CUDACHECK(cudaIpcGetMemHandle(&ipc_handle, buffer));
I also send ipc_handle to GPU 0 so that both GPU 0 and GPU 1 can access buffer on GPU 1.
At some point, GPU 0 reads buffer and checks whether it is 0. If it is 0, GPU 0 increments it by 1 and writes the result back to buffer. The problem is: while GPU 0 is modifying buffer, how do I tell GPU 1 so that it blocks instead of accessing buffer? In my case, both GPUs use one thread in each block to access the buffer, do some checks, and increment it, and there might be many blocks.
I am wondering if there are any functions provided in CUDA to achieve this kind of operation.
Thanks !!
The IPC mechanism can publish events as well. You can use events to provide suitable barriers to make the “communication” work.
Hi @Robert_Crovella , could you provide some naive examples of this? I am not quite clear on how to use events in a kernel function. Or can you point me to some related documents?
You don’t use events in a kernel function.
The CUDA simpleIPC sample code shows the basic syntax for sharing an event handle from one process to another.
The basic idea is that you can use events as markers in stream processing. In process A, you would launch a kernel followed by an event record (cudaEventRecord()) into a particular stream. The kernel would update the value in the IPC-shared memory. In process B, you wait on that event (e.g. with cudaStreamWaitEvent()). You would then launch a kernel in process B, and at that point you could expect that process B sees the updates to the IPC-shared memory made by the kernel in process A.
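A minimal sketch of that flow might look like the following (error checking omitted; `update_kernel`, `consume_kernel`, `grid`, `block`, and `stream` are placeholders, and the transfer of the handles between the two processes, e.g. over a pipe or a shared file, is assumed to happen in code not shown):

```cpp
// ---- Process A (owns the buffer, e.g. on GPU 1) ----
cudaEvent_t ev;
// interprocess events must be created with these flags
cudaEventCreateWithFlags(&ev, cudaEventDisableTiming | cudaEventInterprocess);

cudaIpcEventHandle_t ev_handle;
cudaIpcGetEventHandle(&ev_handle, ev);   // send ev_handle to process B (not shown)

update_kernel<<<grid, block, 0, stream>>>(buffer);  // modifies the shared buffer
cudaEventRecord(ev, stream);             // marks "update complete" in the stream

// ---- Process B (e.g. on GPU 0) ----
int *buffer;
cudaEvent_t ev;
// mem_handle and ev_handle were received from process A (not shown)
cudaIpcOpenMemHandle((void **)&buffer, mem_handle, cudaIpcMemLazyEnablePeerAccess);
cudaIpcOpenEventHandle(&ev, ev_handle);

cudaStreamWaitEvent(stream, ev, 0);      // stream stalls until A's record completes
consume_kernel<<<grid, block, 0, stream>>>(buffer);  // now sees A's updates
```

Note that cudaStreamWaitEvent() orders the work in process B's stream after the event; it does not by itself tell process B *when* the event has been recorded, which is the subject of the next point.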
General information about streams and events is covered in the asynchronous concurrent execution section of the CUDA programming guide.
IPC is also briefly covered in the CUDA programming guide.
One of the challenges with waiting on an event is the requirement that the event be recorded before the corresponding call to cudaStreamWaitEvent() is issued. This generally requires some additional synchronization, which in this case could be handled with purely host-based IPC, and is not CUDA specific.