Is it possible to implement a spin lock across the host and multiple devices using pinned host memory?


I want to implement a spin lock so that both CPUs and multiple GPUs can work cooperatively. Is it feasible to implement one using atomicCAS_system (on the device side) and __sync_fetch_and_add (on the host side) operating on pinned host memory?
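For context, a minimal sketch of the kind of lock being described: one lock word in host memory, taken with atomicCAS_system from device code and with GCC's __sync builtins from host code. The function names (lock_acquire, lock_release, and the host_ variants) are hypothetical, and as the answer below explains, this is not reliable when the lock word lives in mapped page-locked memory.

```cuda
#include <cuda_runtime.h>

// Device side: spin until we flip the lock word from 0 (free) to 1 (held).
// atomicCAS_system requires compute capability 6.0 or higher.
__device__ void lock_acquire(int *lock)
{
    while (atomicCAS_system(lock, 0, 1) != 0) { }
    __threadfence_system();   // order the critical section after the acquire
}

__device__ void lock_release(int *lock)
{
    __threadfence_system();   // flush critical-section writes system-wide
    atomicExch_system(lock, 0);
}

// Host side: GCC atomic builtins on the same lock word.
void host_lock_acquire(int *lock)
{
    while (__sync_val_compare_and_swap(lock, 0, 1) != 0) { }
}

void host_lock_release(int *lock)
{
    __sync_lock_release(lock);   // store 0 with release semantics
}
```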

I noticed that the CUDA C Programming Guide says:

Note that atomic functions (see Atomic Functions) operating on mapped page-locked memory are not atomic from the point of view of the host or other devices.

Does this mean that I should pass only cudaHostAllocPortable, and not cudaHostAllocPortable | cudaHostAllocMapped or cudaHostAllocDefault, to cudaHostAlloc when allocating my spin-lock variable?

I'm posting this question because I've already implemented a naive version, but it only works on a single GPU or on the CPU side. When multiple GPUs, or both CPUs and GPUs, are involved, it deadlocks. I'm not sure whether the idea is conceptually broken, or whether the bug comes from implementation details such as incorrect memory fencing.


I interpret it to mean that it is simply not reliable. Any pinned memory region created by cudaHostAlloc is mapped by default in a UVA (i.e. 64-bit) setting.

Your testing seems to confirm this.

You can create a reliable atomicCAS_system operation among all processors in the system by using managed memory (cudaMallocManaged) instead.
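A minimal sketch of that suggestion, assuming a device of compute capability 6.0+ and a platform with concurrent managed access (e.g. Linux, where the host can touch managed memory while a kernel is running): the lock word and a shared counter live in managed memory, and the host and a single-thread kernel contend for the same lock. The kernel name and iteration counts are illustrative, not from the thread.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void increment(int *lock, int *counter, int iters)
{
    // Launched with a single thread to avoid intra-warp lock contention.
    for (int i = 0; i < iters; ++i) {
        while (atomicCAS_system(lock, 0, 1) != 0) { }  // acquire
        __threadfence_system();
        *counter += 1;                                 // critical section
        __threadfence_system();
        atomicExch_system(lock, 0);                    // release
    }
}

int main()
{
    int *lock, *counter;
    cudaMallocManaged(&lock, sizeof(int));
    cudaMallocManaged(&counter, sizeof(int));
    *lock = 0;
    *counter = 0;

    increment<<<1, 1>>>(lock, counter, 1000);

    // Host contends for the same lock while the kernel runs.
    for (int i = 0; i < 1000; ++i) {
        while (__sync_val_compare_and_swap(lock, 0, 1) != 0) { }
        *counter += 1;
        __sync_lock_release(lock);
    }

    cudaDeviceSynchronize();
    printf("counter = %d\n", *counter);  // 2000 if the lock held up
    cudaFree(lock);
    cudaFree(counter);
    return 0;
}
```

You can check at runtime whether this pattern is supported by querying the concurrentManagedAccess device attribute before relying on host/device concurrency.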

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.