Accessing memory from both kernel and host side

Hi all,
my question goes in the same direction as this one, but that was posted in November, and now there is CUDA 1.1.
I understand that 1.1 makes it possible to overlap asynchronous MemCpy and kernel execution by using multiple streams. But all the examples I have found so far operate on strictly separate regions of device memory.

I wonder what happens if you write to device memory via MemCpyAsync while a running kernel is reading it, and vice versa? Like in the case of GPU<->CPU polling, or updating the environmental conditions of an evolutionary simulation?

I was not able to find the answer in the CUDA 1.1 programming guide or in other posts, so any help would be highly appreciated :)

Thanks in advance,
flori

If a kernel is reading the same memory that cudaMemcpyAsync is writing to, expect race conditions to cause problems. There are no synchronization mechanisms for this.
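
To make that concrete, here is a minimal sketch of the safe pattern for something like your evolutionary-simulation case (the kernel `step`, the buffer names, and the sizes are all made up for illustration): the host only overwrites the environment buffer after synchronizing, so the kernel is never reading it at the same time.

```cpp
// Sketch of the safe update pattern, assuming a hypothetical kernel
// `step` and an environment buffer the host updates between iterations.
#include <cuda_runtime.h>

__global__ void step(const float *env, float *state, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        state[i] += env[i];              // placeholder work
}

int main()
{
    const int N = 4096;
    float *d_env, *d_state;
    cudaMalloc((void **)&d_env, N * sizeof(float));
    cudaMalloc((void **)&d_state, N * sizeof(float));
    cudaMemset(d_state, 0, N * sizeof(float));
    float *h_env = new float[N]();       // host copy of the environment

    for (int gen = 0; gen < 10; ++gen) {
        // Upload the current environment, then launch the kernel.
        cudaMemcpy(d_env, h_env, N * sizeof(float),
                   cudaMemcpyHostToDevice);
        step<<<(N + 255) / 256, 256>>>(d_env, d_state, N);
        // Overwriting d_env right here, while step() may still be
        // reading it, would be exactly the race described above.
        // Waiting for the kernel makes the next upload safe:
        cudaThreadSynchronize();
        // ... host updates h_env for the next generation ...
    }

    delete[] h_env;
    cudaFree(d_env);
    cudaFree(d_state);
    return 0;
}
```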

Async memory copies are useful in pipeline situations where kernel A works on buffer A while the data for the next kernel is copied into a separate buffer B at the same time.
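
For illustration, here is a minimal two-stream version of that pipeline (the kernel `process`, the chunk count, and the sizes are made up). Each stream owns a private device buffer, so the copy for one chunk can overlap the kernel for the other without the two streams ever touching the same memory. Note that cudaMemcpyAsync needs page-locked host memory (cudaMallocHost), and the overlap itself only happens on devices that report the deviceOverlap capability.

```cpp
// Two-stream pipeline sketch with a hypothetical kernel `process`.
#include <cuda_runtime.h>

__global__ void process(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2.0f;                 // placeholder work
}

int main()
{
    const int N = 1 << 20;               // elements per chunk
    const int NCHUNKS = 8;
    const size_t BYTES = N * sizeof(float);

    float *h_buf;                        // pinned, required for async copies
    cudaMallocHost((void **)&h_buf, NCHUNKS * BYTES);

    float *d_buf[2];                     // one private buffer per stream
    cudaStream_t stream[2];
    for (int s = 0; s < 2; ++s) {
        cudaMalloc((void **)&d_buf[s], BYTES);
        cudaStreamCreate(&stream[s]);
    }

    for (int c = 0; c < NCHUNKS; ++c) {
        int s = c % 2;                   // alternate between the streams
        // Queue copy-in, kernel, copy-out in stream s. While one
        // stream's kernel runs, the other stream's copy can proceed;
        // the buffers are disjoint, so there is no race.
        cudaMemcpyAsync(d_buf[s], h_buf + (size_t)c * N, BYTES,
                        cudaMemcpyHostToDevice, stream[s]);
        process<<<(N + 255) / 256, 256, 0, stream[s]>>>(d_buf[s], N);
        cudaMemcpyAsync(h_buf + (size_t)c * N, d_buf[s], BYTES,
                        cudaMemcpyDeviceToHost, stream[s]);
    }
    cudaThreadSynchronize();             // drain both streams

    for (int s = 0; s < 2; ++s) {
        cudaStreamDestroy(stream[s]);
        cudaFree(d_buf[s]);
    }
    cudaFreeHost(h_buf);
    return 0;
}
```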