I want to perform two kinds of synchronization:
1. between a CUDA kernel and the CPU
2. between multiple GPUs (each attached to a different process) from within a CUDA kernel
To do 1, is it legal to use flags allocated in CUDA unified memory as a spin lock? For example:
a. The GPU does some work, then sets flag1
b. The CPU spins on flag1 until it’s set, then sets flag2
c. The GPU spins on flag2 until it’s set, then does some other work
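To make steps a to c concrete, here is a minimal sketch of the handshake I have in mind. The names `handshake`, `run_handshake`, `flag1`, `flag2` and `data` are made up for illustration, and the use of `volatile` plus `__threadfence_system()` is my assumption about what might be needed; whether this is actually guaranteed to work is exactly what I am asking. It only attempts the handshake when the device reports `cudaDevAttrConcurrentManagedAccess`, since the CPU touches managed memory while the kernel is still running.

```cuda
#include <atomic>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: steps a and c of the handshake.
__global__ void handshake(volatile int* flag1, volatile int* flag2, int* data) {
    *data = 42;                 // step a: "some work"
    __threadfence_system();     // order the data write before the flag write
    *flag1 = 1;                 // step a: signal the CPU

    while (*flag2 == 0) { }     // step c: spin until the CPU replies
    __threadfence_system();
    *data += 1;                 // step c: "some other work"
}

// Returns the final value of data, or -1 if the handshake was not attempted.
int run_handshake() {
    int dev = 0, concurrent = 0;
    if (cudaGetDevice(&dev) != cudaSuccess) return -1;
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, dev);
    if (!concurrent) return -1; // CPU must not touch managed memory during the kernel

    int* mem = nullptr;         // mem[0] = flag1, mem[1] = flag2, mem[2] = data
    if (cudaMallocManaged(&mem, 3 * sizeof(int)) != cudaSuccess) return -1;
    mem[0] = mem[1] = mem[2] = 0;
    volatile int* flag1 = mem;
    volatile int* flag2 = mem + 1;

    handshake<<<1, 1>>>(flag1, flag2, mem + 2);

    while (*flag1 == 0) { }     // step b: CPU spins on flag1
    std::atomic_thread_fence(std::memory_order_seq_cst);
    *flag2 = 1;                 // step b: CPU sets flag2

    int result = -1;
    if (cudaDeviceSynchronize() == cudaSuccess) result = mem[2];
    cudaFree(mem);
    return result;
}

int main() {
    printf("result = %d\n", run_handshake());
    return 0;
}
```

If unified memory turns out to be the wrong tool here, I assume the same pattern could be written against pinned, mapped host memory instead.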
To do 2, is it legal to use CUDA IPC memory, exported via cudaIpcGetMemHandle and opened via cudaIpcOpenMemHandle after the handle has been communicated to the local peer (e.g. with MPI), in a scenario similar to the one above? Instead of GPU-CPU synchronization, the synchronization would be between two GPUs, from within a CUDA kernel.
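For the two-GPU case, this is roughly the setup I mean, as a sketch only. It assumes exactly two MPI ranks on one node, that rank i uses GPU i, and that the two GPUs can access each other as peers; `owner_kernel` and `peer_kernel` are hypothetical names, and error checking is mostly omitted for brevity. Again, the `volatile` and `__threadfence_system()` usage is my guess at what is required.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <mpi.h>

// The flag lives in rank 0's device memory; rank 1 maps it through CUDA IPC.
__global__ void owner_kernel(volatile int* flag) {
    // ... some work ...
    __threadfence_system();
    *flag = 1;                  // signal the peer GPU
    while (*flag != 2) { }      // spin until the peer replies
}

__global__ void peer_kernel(volatile int* flag) {
    while (*flag != 1) { }      // spin until the owner signals
    __threadfence_system();
    *flag = 2;                  // reply to the owner
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaSetDevice(rank);        // assumption: rank i uses GPU i

    int* flag = nullptr;
    cudaIpcMemHandle_t handle;
    if (rank == 0) {
        cudaMalloc(&flag, sizeof(int));
        cudaMemset(flag, 0, sizeof(int));
        cudaDeviceSynchronize();                 // flag is zero before it is shared
        cudaIpcGetMemHandle(&handle, flag);      // export the allocation
        MPI_Send(&handle, sizeof(handle), MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        owner_kernel<<<1, 1>>>(flag);
    } else if (rank == 1) {
        MPI_Recv(&handle, sizeof(handle), MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        cudaIpcOpenMemHandle((void**)&flag, handle,
                             cudaIpcMemLazyEnablePeerAccess);  // map peer memory
        peer_kernel<<<1, 1>>>(flag);
    }

    cudaError_t err = cudaDeviceSynchronize();
    printf("rank %d: %s\n", rank, cudaGetErrorString(err));

    if (rank == 1) cudaIpcCloseMemHandle(flag);  // unmap before the owner frees
    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) cudaFree(flag);
    MPI_Finalize();
    return 0;
}
```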
In both cases, is it guaranteed that all devices will always see the latest value of the flags? Does the volatile keyword make any difference here?