Hi all,
My question goes in the same direction as this one, but that was posted in November, and now there is CUDA 1.1.
I understand that 1.1 makes it possible to overlap asynchronous memory copies and kernel execution by using multiple streams, but all the examples I have found so far operate on strictly separate segments of device memory.
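For reference, the pattern in those examples looks roughly like the sketch below. This is only a minimal reconstruction under my own assumptions, not code from the SDK or the programming guide: the kernel `add_one`, the helper `count_mismatches_after_overlap`, and the chunk sizes are all made up for illustration. Each stream copies its own chunk in, runs a kernel on that chunk only, and copies it back, so no copy ever touches memory that a kernel in another stream is using.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1.0f to every element of its own chunk.
__global__ void add_one(float *chunk, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) chunk[i] += 1.0f;
}

// Hypothetical helper: runs the multi-stream pattern on disjoint chunks.
// Returns the number of wrong elements (0 on success), or -1 on a CUDA error.
int count_mismatches_after_overlap()
{
    const int kStreams = 2;
    const int kChunk = 1 << 20;                  // elements per stream
    const int kTotal = kStreams * kChunk;
    const size_t chunkBytes = kChunk * sizeof(float);

    float *h_data = NULL;
    float *d_data = NULL;

    // Async copies need page-locked (pinned) host memory.
    if (cudaMallocHost((void **)&h_data, kTotal * sizeof(float)) != cudaSuccess)
        return -1;
    if (cudaMalloc((void **)&d_data, kTotal * sizeof(float)) != cudaSuccess) {
        cudaFreeHost(h_data);
        return -1;
    }
    for (int i = 0; i < kTotal; ++i) h_data[i] = (float)i;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream works on its own, non-overlapping segment.
    for (int s = 0; s < kStreams; ++s) {
        float *h = h_data + s * kChunk;
        float *d = d_data + s * kChunk;
        cudaMemcpyAsync(d, h, chunkBytes, cudaMemcpyHostToDevice, streams[s]);
        add_one<<<(kChunk + 255) / 256, 256, 0, streams[s]>>>(d, kChunk);
        cudaMemcpyAsync(h, d, chunkBytes, cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for every stream before the host reads the results.
    bool ok = true;
    for (int s = 0; s < kStreams; ++s)
        if (cudaStreamSynchronize(streams[s]) != cudaSuccess) ok = false;

    int mismatches = 0;
    for (int i = 0; i < kTotal; ++i)
        if (h_data[i] != (float)i + 1.0f) ++mismatches;

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return ok ? mismatches : -1;
}
```

Within one stream the copy, the kernel, and the copy back run in order; only work in different streams may overlap, and here that work never shares a segment.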
I wonder what happens if you write to device memory via cudaMemcpyAsync while a running kernel is reading it, or the other way round (the kernel writes while the copy reads)? For example, in the case of GPU<->CPU polling, or when updating the environmental conditions of an evolutionary simulation?
I was not able to find the answer in the CUDA 1.1 programming guide or in other posts, so any help would be highly appreciated :)
Thanks in advance,
flori