Synchronization Guarantees of Unified Memory and cudaIpcOpenMemHandle

So I want to perform some synchronization:

  1. between a CUDA kernel and CPU
  2. between multiple GPUs (each attached to a different process) in CUDA kernel

To do 1, is it legal to use flags allocated in CUDA unified memory as a spin lock? For example:
a. GPU does some work, then sets flag1
b. CPU spins on flag1 until it is set, then sets flag2
c. GPU spins on flag2 until it is set, then does some other work
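For concreteness, the handshake above might be sketched like this (a minimal sketch, assuming a Linux platform with concurrent managed access so the host can touch managed memory while the kernel runs; the kernel and flag names are made up for illustration):

```cuda
#include <cuda_runtime.h>

// Minimal sketch of the a/b/c handshake. flag1 and flag2 live in
// managed (unified) memory; volatile keeps the compiler from caching
// the flag values in registers.
__global__ void worker(volatile int *flag1, volatile int *flag2)
{
    // a. GPU does some work, then sets flag1
    *flag1 = 1;
    __threadfence_system();   // make the store visible system-wide

    // c. GPU spins on flag2 until it is set, then does other work
    while (*flag2 == 0) { }
}

int main()
{
    int *flags;
    cudaMallocManaged(&flags, 2 * sizeof(int));
    flags[0] = flags[1] = 0;

    worker<<<1, 1>>>(flags, flags + 1);

    // b. CPU spins on flag1 until it is set, then sets flag2
    volatile int *f1 = flags, *f2 = flags + 1;
    while (*f1 == 0) { }
    *f2 = 1;

    cudaDeviceSynchronize();
    cudaFree(flags);
    return 0;
}
```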

To do 2, is it legal to use CUDA IPC memory, created via cudaIpcGetMemHandle and opened via cudaIpcOpenMemHandle after the handle is communicated to the local peer process (e.g. with MPI), in a similar scenario as above, but with the synchronization happening between two GPUs inside a CUDA kernel rather than between a GPU and the CPU?
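For reference, the IPC plumbing in case 2 looks roughly like this (a sketch, not a complete two-process program; the MPI exchange of the handle is elided, and the variable names are made up):

```cuda
#include <cuda_runtime.h>

// --- Process A: allocate device memory and export a handle ---
void exporter(cudaIpcMemHandle_t *handle_out)
{
    int *d_flags;
    cudaMalloc(&d_flags, 2 * sizeof(int));
    cudaMemset(d_flags, 0, 2 * sizeof(int));
    cudaIpcGetMemHandle(handle_out, d_flags);
    // ... send *handle_out to process B, e.g. via MPI_Send ...
}

// --- Process B: open the handle to get a pointer to the same
// allocation, usable in kernels launched by this process ---
void importer(cudaIpcMemHandle_t handle /* received via MPI_Recv */)
{
    int *d_flags_peer;
    cudaIpcOpenMemHandle((void **)&d_flags_peer, handle,
                         cudaIpcMemLazyEnablePeerAccess);
    // ... launch kernels that spin on / set d_flags_peer ...
    cudaIpcCloseMemHandle(d_flags_peer);
}
```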

In both cases, is it guaranteed that all devices will always see the latest value of the flags? Does a volatile keyword make any difference here?

It is legal in both cases. The issue will be visibility.

No, it is not guaranteed that all devices will always see the latest value. volatile may help. You should also pay attention to the memory fences outlined in the programming guide, in particular __threadfence_system().
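The usual reason the fence matters is ordering a payload write relative to the flag write. A sketch of that pattern (device-side helper functions; the names produce/consume are made up for illustration):

```cuda
// The fence orders the payload store before the flag store as
// observed by the host and other devices; without it, another
// observer could see flag == 1 before the payload arrives.
__device__ void produce(volatile int *data, volatile int *flag)
{
    *data = 42;               // write the payload
    __threadfence_system();   // order payload before flag, system-wide
    *flag = 1;                // publish
}

__device__ int consume(volatile int *data, volatile int *flag)
{
    while (*flag == 0) { }    // spin until published
    __threadfence_system();   // order flag load before payload load
    return *data;
}
```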

With unified memory (case 1), you may be able to take advantage of system-scope atomics on the managed allocation to help with this.
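A sketch of the system-atomics variant (assuming a device of compute capability 6.0 or higher, where the _system atomics are atomic with respect to the CPU and other GPUs; kernel names are made up):

```cuda
#include <cuda_runtime.h>

// Using system-scope atomics on a managed flag instead of plain
// volatile loads/stores.
__global__ void set_flag(int *flag)
{
    // Atomically publish; system scope makes the operation atomic
    // with respect to the host and other devices.
    atomicExch_system(flag, 1);
}

__global__ void wait_flag(int *flag)
{
    // atomicAdd_system(flag, 0) is an atomic read of the flag;
    // spin until the CPU or another GPU sets it.
    while (atomicAdd_system(flag, 0) == 0) { }
}
```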

The libcu++ library may have some useful functions, such as semaphores.
