Will Unified Addressing keep cache coherence between CPU and GPU?

I know that Unified Addressing lets a device directly access buffers in host memory.
But I don’t know whether cache coherence between the CPU and GPU is maintained at run time.
For example, if a kernel monitors a pointer to a host buffer in a loop while the CPU changes the buffer, will the GPU notice the modification? Can CUDA 6.0 handle this case?

Your proposed case is illegal in the current Unified Memory scheme.
Under Unified Virtual Addressing, it’s possible for the CPU and GPU to see each other’s updates to a given mapped (i.e. zero-copy) memory location while a kernel is running. Making this work correctly may require the use of volatile qualifiers as well as memory fence instructions.

Here are a couple of examples:

[url]cuda - Pthreads and CudaMemcpyAsync - Stack Overflow
[url]cuda - How can I check the progress of matrix multiplication? - Stack Overflow
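
Along the same lines, here is a minimal sketch of the pattern (names like wait_for_flag are my own and error checking is omitted): the host allocates a mapped (zero-copy) flag, the kernel spins on it through a volatile pointer, and the host releases the kernel by writing to the flag while the kernel is still running.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void wait_for_flag(volatile int *flag)
{
    // Spin until the host writes a nonzero value. volatile forces the
    // load to go to memory on every iteration instead of being cached
    // in a register.
    while (*flag == 0) { }
    printf("GPU saw the host's update\n");
}

int main()
{
    cudaSetDeviceFlags(cudaDeviceMapHost);  // enable mapped (zero-copy) memory

    int *h_flag = nullptr, *d_flag = nullptr;
    cudaHostAlloc((void **)&h_flag, sizeof(int), cudaHostAllocMapped);  // pinned + mapped
    *h_flag = 0;
    cudaHostGetDevicePointer((void **)&d_flag, h_flag, 0);  // device view of same memory

    wait_for_flag<<<1, 1>>>(d_flag);  // async launch; kernel starts spinning

    *h_flag = 1;  // CPU updates the buffer while the kernel is running

    cudaDeviceSynchronize();  // kernel exits once it observes the write
    cudaFreeHost(h_flag);
    return 0;
}

Without the volatile qualifier, the compiler would be free to keep *flag in a register and the kernel could spin forever.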

Hi txbob,
Thanks for the links.
I don’t understand why the usage is illegal. Could you provide more information?
Does your second answer confirm that coherence will be kept? From the links you provided, the answer seems to be yes.
Do you know whether the coherence is maintained by software or by hardware?

The CPU is not allowed to access GPU memory allocated with one of the managed allocators in Unified Memory (currently) while it is being accessed/used by the GPU, i.e. while the kernel is running, or before a cudaDeviceSynchronize(). This probably isn’t what you are referring to, but I mention it because folks occasionally mix up UVA and UM. If you want more details about UM, there is a section in the programming guide:

[url]Programming Guide :: CUDA Toolkit Documentation
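
To illustrate that rule, a hedged sketch of the legal UM access pattern (my own names, no error checking): the CPU may touch a managed allocation before the kernel launch and again after cudaDeviceSynchronize(), but not in between.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment(int *data)
{
    *data += 1;
}

int main()
{
    int *data = nullptr;
    cudaMallocManaged((void **)&data, sizeof(int));  // managed (UM) allocation

    *data = 41;                 // CPU access OK: no kernel is using the data

    increment<<<1, 1>>>(data);

    // *data = 0;               // ILLEGAL here: the kernel may still be running

    cudaDeviceSynchronize();    // required before the CPU touches data again
    printf("%d\n", *data);      // CPU access OK again: prints 42

    cudaFree(data);
    return 0;
}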

I can’t give you details about coherence. I can tell you that using zero-copy memory, it’s possible for the CPU and GPU to access the same data and pass updates back and forth, along the lines of the examples I have already linked.
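
For the opposite direction (GPU writing, CPU watching, as in the matrix-multiplication-progress link above), a sketch under the same assumptions (my own names, error checking omitted): the kernel publishes a progress counter into zero-copy memory and flushes it with __threadfence_system(), while the host polls the mapped pointer.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void report_progress(volatile int *progress, int steps)
{
    for (int i = 1; i <= steps; ++i) {
        // ... one unit of real work would go here ...
        *progress = i;
        __threadfence_system();  // make the write visible to the host
    }
}

int main()
{
    cudaSetDeviceFlags(cudaDeviceMapHost);

    int *h_progress = nullptr, *d_progress = nullptr;
    cudaHostAlloc((void **)&h_progress, sizeof(int), cudaHostAllocMapped);
    *h_progress = 0;
    cudaHostGetDevicePointer((void **)&d_progress, h_progress, 0);

    report_progress<<<1, 1>>>(d_progress, 100);  // async launch

    // Poll the mapped pointer from the CPU while the kernel runs.
    int last = 0;
    while (last < 100) {
        int now = *(volatile int *)h_progress;  // force a fresh read
        if (now != last) {
            printf("progress: %d/100\n", now);
            last = now;
        }
    }

    cudaDeviceSynchronize();
    cudaFreeHost(h_progress);
    return 0;
}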