I have the following two questions:
- Are memory reads and writes atomic?
For example, suppose a kernel is writing to a host-mapped integer while the host is reading that integer. Is it possible for the host to see a partially written value, e.g. with only the first byte of the integer updated?
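
To make the first question concrete, here is a minimal sketch of the setup I have in mind. The names `write_flag` and `poll_mapped_flag` and the pattern `0x01020304` are made up for illustration, and I am assuming a device that supports mapped (zero-copy) pinned memory. It only reproduces the scenario; it does not prove anything about atomicity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s failed: %s\n", #call,             \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Hypothetical kernel: one thread stores a 4-byte pattern into mapped host memory.
__global__ void write_flag(volatile unsigned int *flag)
{
    *flag = 0x01020304u;  // one naturally aligned 32-bit store
}

// Launches the kernel and spins on the host until the integer becomes non-zero.
// Note: if the kernel never runs, this loop never ends (it is only a sketch).
unsigned int poll_mapped_flag()
{
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    unsigned int *h_flag = nullptr;
    CHECK(cudaHostAlloc((void **)&h_flag, sizeof(*h_flag), cudaHostAllocMapped));
    *h_flag = 0;

    unsigned int *d_flag = nullptr;
    CHECK(cudaHostGetDevicePointer((void **)&d_flag, h_flag, 0));

    write_flag<<<1, 1>>>(d_flag);
    CHECK(cudaGetLastError());
    cudaStreamQuery(0);  // asks drivers that batch launches to submit now; result ignored

    // The host reads the same integer while the kernel may still be writing it.
    volatile unsigned int *poll = h_flag;
    unsigned int seen;
    while ((seen = *poll) == 0) { }

    CHECK(cudaDeviceSynchronize());
    CHECK(cudaFreeHost(h_flag));
    return seen;
}

int main()
{
    printf("host observed 0x%08x\n", poll_mapped_flag());
    return 0;
}
```

What I want to know is whether `seen` can ever be something other than the full pattern, for instance `0x00000004`.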
- Can __threadfence() guarantee that memory operations made by the calling thread are visible to the host?
Suppose I launch the kernel in one stream and, concurrently, copy device memory to host memory with cudaMemcpy from the host side in another stream. After a kernel thread calls __threadfence(), are that thread's memory changes visible to the copy issued in the other stream?
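
A rough sketch of the second scenario follows. I am assuming `cudaMemcpyAsync` on a second non-blocking stream here, since plain `cudaMemcpy` takes no stream argument; `produce`, `copy_while_kernel_runs`, the value 42, and the spin length are hypothetical. The bounded spin only keeps the kernel alive so that the copy has a chance to overlap with it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s failed: %s\n", #call,             \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Hypothetical kernel: write, fence, then stay resident for a bounded time.
__global__ void produce(int *data, long long spin_cycles)
{
    *data = 42;
    __threadfence();  // the fence the question is about
    long long start = clock64();
    while (clock64() - start < spin_cycles) { }  // bounded busy-wait
}

// Copies the integer back on a second stream while the kernel may still be running.
int copy_while_kernel_runs()
{
    int *d_data = nullptr;
    int *h_copy = nullptr;
    CHECK(cudaMalloc((void **)&d_data, sizeof(int)));
    CHECK(cudaMallocHost((void **)&h_copy, sizeof(int)));  // pinned, so the copy is truly async
    CHECK(cudaMemset(d_data, 0, sizeof(int)));
    CHECK(cudaDeviceSynchronize());

    cudaStream_t kernel_stream, copy_stream;
    CHECK(cudaStreamCreateWithFlags(&kernel_stream, cudaStreamNonBlocking));
    CHECK(cudaStreamCreateWithFlags(&copy_stream, cudaStreamNonBlocking));

    produce<<<1, 1, 0, kernel_stream>>>(d_data, 100000000LL);
    CHECK(cudaGetLastError());

    // No ordering is enforced between the two streams.
    CHECK(cudaMemcpyAsync(h_copy, d_data, sizeof(int),
                          cudaMemcpyDeviceToHost, copy_stream));
    CHECK(cudaStreamSynchronize(copy_stream));
    int seen = *h_copy;

    CHECK(cudaStreamSynchronize(kernel_stream));
    CHECK(cudaStreamDestroy(kernel_stream));
    CHECK(cudaStreamDestroy(copy_stream));
    CHECK(cudaFreeHost(h_copy));
    CHECK(cudaFree(d_data));
    return seen;
}

int main()
{
    printf("copy observed %d\n", copy_while_kernel_runs());
    return 0;
}
```

In other words: if the copy happens to execute after the thread has passed `__threadfence()`, is it guaranteed to see the new value rather than the old one?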