I want to implement a spin lock that lets CPUs and multiple GPUs work cooperatively. Is it feasible to implement it using
__sync_fetch_and_add on pinned host memory?
I noticed that the CUDA C Programming Guide says, in its section on mapped page-locked memory, that
Note that atomic functions (see Atomic Functions) operating on mapped page-locked memory are not atomic from the point of view of the host or other devices.
Does this mean that I should pass only cudaHostAllocPortable, and not cudaHostAllocPortable | cudaHostAllocMapped, to cudaHostAlloc when allocating my spin lock variable?
I am posting this question because I’ve already implemented a naive version, but it only works with a single GPU or on the CPU side alone. When multiple GPUs, or both CPUs and GPUs, are involved, a deadlock occurs. I’m not sure whether this idea works conceptually, or whether the bug is caused only by details of my implementation, such as incorrect memory fencing.
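
For concreteness, the CPU side of such a lock could look roughly like the minimal sketch below. It is a ticket lock built on __sync_fetch_and_add and assumes GCC or Clang builtins. The names ticket_lock_t, ticket_lock_init, ticket_lock_acquire and ticket_lock_release are hypothetical and chosen only for illustration. It uses ordinary host memory so that it compiles without CUDA, and it says nothing about whether the same counters behave atomically when a GPU also updates them through mapped pinned memory, which is exactly what I am asking about.

```c
#include <stdint.h>

/* Hypothetical ticket lock; all names here are illustrative only. */
typedef struct {
    volatile uint32_t next_ticket;  /* next ticket to hand out */
    volatile uint32_t now_serving;  /* ticket currently allowed to enter */
} ticket_lock_t;

void ticket_lock_init(ticket_lock_t *lock) {
    lock->next_ticket = 0;
    lock->now_serving = 0;
}

void ticket_lock_acquire(ticket_lock_t *lock) {
    /* Atomically take a ticket; the builtin returns the value before the add. */
    uint32_t my_ticket = __sync_fetch_and_add(&lock->next_ticket, 1);

    /* Spin until our ticket is the one being served. */
    while (lock->now_serving != my_ticket) {
        /* busy wait */
    }

    /* Full barrier so critical-section accesses are not hoisted above the spin. */
    __sync_synchronize();
}

void ticket_lock_release(ticket_lock_t *lock) {
    /* Full barrier so critical-section writes are visible before the handoff. */
    __sync_synchronize();

    /* Pass the lock to the next ticket holder. */
    __sync_fetch_and_add(&lock->now_serving, 1);
}
```

Each CPU thread would call ticket_lock_acquire before touching the shared data and ticket_lock_release afterwards. In the multi-GPU case the device code would presumably have to perform the matching increments with atomicAdd on the same two counters, and that is the part where the note quoted above makes me doubt the approach.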