Hi,
I want to implement a spin lock that lets CPUs and multiple GPUs work cooperatively. Is it feasible to implement it with atomicCAS_system on the GPU side and __sync_fetch_and_add on the CPU side, both operating on pinned host memory?
I noticed that section 3.2.5.3 of the CUDA C Programming Guide says:
"Note that atomic functions (see Atomic Functions) operating on mapped page-locked memory are not atomic from the point of view of the host or other devices."
Does this mean that I should pass only cudaHostAllocPortable, rather than cudaHostAllocPortable | cudaHostAllocMapped or cudaHostAllocDefault, to cudaHostAlloc when allocating my spin lock variable?
I'm posting this question because I've already implemented a naive version, but it only works with a single GPU or on the CPU side alone. When multiple GPUs, or both CPUs and GPUs, are involved, a deadlock occurs. I'm not sure whether the idea is conceptually unsound, or whether the bug comes only from details of my implementation, such as incorrect memory fencing.
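For concreteness, here is a minimal sketch of the pattern I'm asking about. It is not my actual code. The function names (cpu_lock, cpu_unlock, gpu_lock, gpu_unlock, run_host_demo) are placeholders, and I use __sync_val_compare_and_swap on the host here instead of __sync_fetch_and_add to keep the example short. The device half is only compiled under nvcc and assumes a GPU with system-scope atomics (compute capability 6.0 or later). Whether the two halves are really atomic with respect to each other on pinned host memory is exactly what I'm unsure about, so the demo function only exercises the host half.

```cpp
// Sketch only: builds as plain C++ (host half) and, under nvcc,
// also exposes the device half. All names here are hypothetical.
#include <thread>
#include <vector>

// Host side: GCC/Clang __sync builtins on the shared lock word.
inline void cpu_lock(volatile int *lock) {
    // Spin until we swap 0 -> 1; this builtin acts as a full barrier.
    while (__sync_val_compare_and_swap(lock, 0, 1) != 0) { }
}

inline void cpu_unlock(volatile int *lock) {
    __sync_synchronize();        // make critical-section writes visible first
    __sync_lock_release(lock);   // store 0 with release semantics
}

#ifdef __CUDACC__
// Device side: system-scope atomics (assumes -arch=sm_60 or newer).
// Assumption: only ONE thread per block calls these; letting several
// threads of the same warp spin on one lock can deadlock on its own.
__device__ inline void gpu_lock(int *lock) {
    while (atomicCAS_system(lock, 0, 1) != 0) { }
    __threadfence_system();      // order later accesses after the acquire
}

__device__ inline void gpu_unlock(int *lock) {
    __threadfence_system();      // publish critical-section writes
    atomicExch_system(lock, 0);
}
#endif

// Host-only sanity check: several CPU threads bump a plain counter
// under the lock. This says nothing about CPU/GPU interaction.
int run_host_demo(int nthreads, int iters) {
    volatile int lock = 0;
    int counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                cpu_lock(&lock);
                ++counter;       // protected by the spin lock
                cpu_unlock(&lock);
            }
        });
    }
    for (auto &w : workers) w.join();
    return counter;
}
```

In the real setup the lock word would live in memory returned by cudaHostAlloc and be visible to every device, instead of being a local variable as in run_host_demo.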
Thanks