Is it possible to perform atomic operations on host memory that has been mapped into the device address space using cuMemHostAlloc and cuMemHostGetDevicePointer?
Yes, but they are only atomic with respect to the device that is performing the atomic operations.
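For illustration, here is a minimal sketch of device-side atomics on mapped host memory. It assumes the runtime-API counterparts of the calls mentioned above (cudaHostAlloc with cudaHostAllocMapped, and cudaHostGetDevicePointer) rather than the driver API; the kernel name `count_threads` and the launch dimensions are made up for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread increments one counter that lives in
// mapped (zero-copy) host memory.
__global__ void count_threads(unsigned int *counter)
{
    atomicAdd(counter, 1u);
}

int main()
{
    // Allow mapped pinned allocations (needed on older devices; harmless otherwise).
    cudaSetDeviceFlags(cudaDeviceMapHost);

    unsigned int *h_counter = nullptr;  // host view of the allocation
    unsigned int *d_counter = nullptr;  // device view of the same memory

    if (cudaHostAlloc((void **)&h_counter, sizeof(*h_counter),
                      cudaHostAllocMapped) != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc failed\n");
        return 1;
    }
    *h_counter = 0;
    cudaHostGetDevicePointer((void **)&d_counter, h_counter, 0);

    count_threads<<<64, 256>>>(d_counter);

    // The host only reads the counter after the kernel has finished, so the
    // CPU never races with the device-side atomics.
    cudaDeviceSynchronize();
    printf("counter = %u\n", *h_counter);

    cudaFreeHost(h_counter);
    return 0;
}
```

Since only one device touches the location while the kernel runs, the final value should equal the number of launched threads. Nothing here makes the increments atomic with respect to a CPU thread writing the same word concurrently.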
Then, using atomic operations in a host thread to create a shared semaphore would not work, right?
Correct. Even if you could, I doubt it would perform well enough to be worthwhile.
Intel CPUs support cache snooping, so your CPU thread could poll continuously to detect when the GPU changes a memory location.
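A rough sketch of that polling idea, again assuming the runtime API and mapped pinned memory. The names `signal_host` and `flag` are hypothetical, and this is one-way signalling from GPU to CPU, not a semaphore that both sides update atomically.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes a flag in mapped host memory, then issues a
// system-wide fence so the store is ordered for observers outside the GPU.
__global__ void signal_host(volatile int *flag)
{
    *flag = 1;
    __threadfence_system();
}

int main()
{
    cudaSetDeviceFlags(cudaDeviceMapHost);

    int *h_flag = nullptr;
    int *d_flag = nullptr;
    if (cudaHostAlloc((void **)&h_flag, sizeof(*h_flag),
                      cudaHostAllocMapped) != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc failed\n");
        return 1;
    }
    *h_flag = 0;
    cudaHostGetDevicePointer((void **)&d_flag, h_flag, 0);

    // Kernel launches are asynchronous, so the host can poll while it runs.
    signal_host<<<1, 1>>>(d_flag);

    // volatile forces a fresh load from memory on every iteration.
    volatile int *poll = h_flag;
    while (*poll == 0) {
        // Stop spinning if the kernel has already finished or failed; the
        // query also nudges the driver to submit any queued work.
        if (cudaStreamQuery(0) != cudaErrorNotReady)
            break;
    }

    cudaDeviceSynchronize();
    printf("flag = %d\n", *h_flag);

    cudaFreeHost(h_flag);
    return 0;
}
```

Only the GPU writes the flag and only the CPU reads it, so no read-modify-write from both sides is involved.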
No idea about semaphores.
I am not concerned about the performance of the implementation, but about the implementability of the mechanism. Just to be sure, would it work?
Thanks for the tip. But the problem I’m facing is how to atomically update a single physical memory location from two different devices (CPU/GPU).