Memory Consistency of CUDA Zero-Copy Memory

Is there any documentation describing the memory consistency model that applies when using zero-copy (mapped pinned) memory? I suspect it is unusual in practice for a CPU thread and a GPU thread to both read and write the same region of memory.

I have tried some litmus-test-like programs on a Tesla K20m with a Xeon E5-2670. With a single CPU thread running alongside a single GPU thread, I have not observed any relaxed memory consistency behavior: the results are exactly what sequential consistency (SC) allows. I expected the behavior to be at least as weak as TSO, since a CPU is involved, so this is surprising.
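To illustrate what such a litmus test can look like, here is a minimal store-buffering (SB) sketch in CUDA. It is an assumption about the shape of the test, not the exact program that was run: the word layout, the `sb_gpu` kernel name, the iteration count, and the jitter loop are all made up for illustration. Each side stores to its own word and then loads the other's; the outcome where both loads return 0 is forbidden under SC but allowed under TSO.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical layout of the shared zero-copy words.
enum { X = 0, Y = 1, R_GPU = 2, GO = 3, NWORDS = 4 };

// GPU side of the SB test: store x, then load y.
__global__ void sb_gpu(volatile int *m) {
    while (m[GO] == 0) { }   // spin until the CPU releases this iteration
    m[X] = 1;                // store to x
    m[R_GPU] = m[Y];         // load y and record what the GPU observed
}

int main() {
    // Must be set before the CUDA context is created on older devices.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    int *h_raw = nullptr, *d = nullptr;
    if (cudaHostAlloc((void **)&h_raw, NWORDS * sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess ||
        cudaHostGetDevicePointer((void **)&d, h_raw, 0) != cudaSuccess) {
        fprintf(stderr, "failed to set up mapped pinned memory\n");
        return 1;
    }
    // volatile stops the compiler from caching or reordering these accesses;
    // it does not stop the hardware, which is what the test is probing.
    volatile int *h = h_raw;

    const int iters = 10000;   // arbitrary; more iterations give more chances
    int both_zero = 0;
    for (int i = 0; i < iters; ++i) {
        h[X] = 0; h[Y] = 0; h[R_GPU] = -1; h[GO] = 0;
        sb_gpu<<<1, 1>>>(d);   // asynchronous launch; kernel spins on GO

        h[GO] = 1;             // release the GPU thread
        // Assumed jitter so the two sides overlap at varying offsets.
        for (volatile int k = 0; k < (i % 512); ++k) { }

        h[Y] = 1;              // CPU side of SB: store y, then load x
        int r_cpu = h[X];

        cudaDeviceSynchronize();               // wait for the kernel to finish
        if (r_cpu == 0 && h[R_GPU] == 0) ++both_zero;
    }
    printf("r_cpu == 0 && r_gpu == 0 in %d of %d iterations\n", both_zero, iters);

    cudaFreeHost(h_raw);
    return 0;
}
```

Because the GPU sees the `GO` flag only after a PCIe round trip, the two sides may rarely overlap closely enough to expose a reordering, so a count of zero from a sketch like this does not by itself show that the behavior is SC.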

My experimental method may not be perfect. To verify these results, I'd like to know whether there is any official documentation on this.

Thanks!