Memory synchronization between host and device

I have the following two questions:

  1. Are memory reads and writes atomic?

For example, suppose the kernel is writing to a host-mapped integer while the host is reading that same integer. Is it possible for the host to read a value in which only the first byte of the integer has been written?
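To make the scenario concrete, here is a minimal sketch of what I mean, assuming mapped pinned memory allocated with cudaHostAlloc. The names writeFlag, pollMappedFlag and FULL_VALUE are made up for illustration. The host keeps reading the integer while the kernel is in flight and records whether it ever sees anything other than the initial value or the complete value.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define FULL_VALUE 0xDEADBEEFu  // hypothetical test pattern; every byte differs from 0

// Hypothetical kernel: one thread performs one aligned 32-bit store
// to host memory that is mapped into the device address space.
__global__ void writeFlag(volatile unsigned int *flag)
{
    *flag = FULL_VALUE;
}

// Returns 0 if the host only ever saw 0 or FULL_VALUE,
// 1 if it saw any other (partially written) value, -1 on a CUDA error.
int pollMappedFlag()
{
    unsigned int *hostFlag = nullptr;
    unsigned int *devFlag = nullptr;

    cudaSetDeviceFlags(cudaDeviceMapHost);  // needed on older setups; return value ignored
    if (cudaHostAlloc((void **)&hostFlag, sizeof(unsigned int),
                      cudaHostAllocMapped) != cudaSuccess)
        return -1;
    *hostFlag = 0u;
    if (cudaHostGetDevicePointer((void **)&devFlag, hostFlag, 0) != cudaSuccess) {
        cudaFreeHost(hostFlag);
        return -1;
    }

    writeFlag<<<1, 1>>>(devFlag);

    // Read the mapped integer repeatedly until the kernel has finished.
    volatile unsigned int *p = hostFlag;
    int partial = 0;
    cudaError_t status;
    do {
        unsigned int seen = *p;
        if (seen != 0u && seen != FULL_VALUE)
            partial = 1;
        status = cudaStreamQuery(0);  // also ends the loop if the launch failed
    } while (status == cudaErrorNotReady);

    cudaFreeHost(hostFlag);
    return (status == cudaSuccess) ? partial : -1;
}

int main()
{
    int result = pollMappedFlag();
    if (result < 0)
        printf("CUDA error\n");
    else
        printf("partially written value observed: %s\n", result ? "yes" : "no");
    return 0;
}
```

A run that never observes a partial value does not prove anything by itself, which is why I am asking what is actually guaranteed.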

  2. Can __threadfence() guarantee that the memory operations made by the calling thread are visible to the host?

Suppose I launch the kernel in one stream and, concurrently on the host side, copy device memory to host memory with cudaMemcpy in another stream. After the kernel thread calls __threadfence(), are the memory changes visible to the host through that copy?
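A rough sketch of this second setup, under some assumptions of mine: the copy is issued with cudaMemcpyAsync into pinned host memory so that it can actually overlap with the kernel, and both streams are created as non-blocking. The names producer and copyWhileKernelRuns are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: write a value to device memory, then fence.
__global__ void producer(int *data)
{
    *data = 42;
    __threadfence();
    // In the real scenario the kernel would keep running after this point.
}

// Returns the value the host received through the copy, or -1 on a CUDA error.
int copyWhileKernelRuns()
{
    int *devData = nullptr;
    int *hostCopy = nullptr;
    cudaStream_t kernelStream, copyStream;

    if (cudaMalloc((void **)&devData, sizeof(int)) != cudaSuccess)
        return -1;
    if (cudaHostAlloc((void **)&hostCopy, sizeof(int),
                      cudaHostAllocDefault) != cudaSuccess) {  // pinned host buffer
        cudaFree(devData);
        return -1;
    }
    cudaMemset(devData, 0, sizeof(int));
    cudaDeviceSynchronize();  // make sure the initial 0 is in place first

    // Non-blocking streams, so neither one implicitly waits on the default stream.
    cudaStreamCreateWithFlags(&kernelStream, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&copyStream, cudaStreamNonBlocking);

    producer<<<1, 1, 0, kernelStream>>>(devData);
    cudaMemcpyAsync(hostCopy, devData, sizeof(int),
                    cudaMemcpyDeviceToHost, copyStream);

    // Wait only for the copy; there is no ordering between the two streams.
    cudaError_t status = cudaStreamSynchronize(copyStream);
    int seen = *hostCopy;  // whether this must be 42 after the fence is the question

    if (cudaStreamSynchronize(kernelStream) != cudaSuccess)
        status = cudaErrorUnknown;
    cudaStreamDestroy(kernelStream);
    cudaStreamDestroy(copyStream);
    cudaFreeHost(hostCopy);
    cudaFree(devData);
    return (status == cudaSuccess) ? seen : -1;
}

int main()
{
    printf("host copy sees %d\n", copyWhileKernelRuns());
    return 0;
}
```

Nothing here orders the copy after the kernel's write, so the sketch only shows the setup I am asking about, not a working synchronization scheme.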