Yes, once a kernel is invoked, all managed GPU arrays are owned by the kernel, and you can't access them from the CPU until you synchronize to the end of kernel execution with cudaStreamSynchronize or similar. I think it's described in the CUDA manual.
This sounds like expected behavior. UM under CUDA 9.1 on Windows behaves in the "legacy" UM fashion.
A kernel launch triggers a transfer of data from host to device, which invalidates any use of that pointer in host code until a CUDA device synchronize is called. This is all spelled out in the UM section of the programming guide.
Any attempt to use the UM-allocated pointer from the host after a kernel launch, but before a synchronize is done, will result in a seg fault.
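To make that concrete, here's a minimal sketch of the failure mode; the kernel, array size, and launch dimensions are made up for illustration:

```cpp
// Sketch of legacy-UM behavior (Windows / pre-Pascal): host access to a
// managed allocation is invalid between a kernel launch and a device sync.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1024;
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // host access: fine, nothing in flight yet

    scale<<<(n + 255) / 256, 256>>>(data, n);

    // Reading data[0] here would seg fault under legacy UM: the launch handed
    // the managed allocation to the device and the host mapping is invalid.

    cudaDeviceSynchronize();                     // hands the allocation back to the host
    printf("%f\n", data[0]);                     // safe again after the sync

    cudaFree(data);
    return 0;
}
```

On Linux with a Pascal-or-newer GPU, the same host access between launch and sync would instead page-fault and migrate the data transparently rather than crashing.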
Ok. (Any chance you can point me at the relevant section of the manual?)
Will queuing another kernel before calling cudaStreamSynchronize also result in a seg fault? Or is it only host-side access that results in a fault? (The application I'm working on needs to use the read-only memory from multiple threads, and each thread needs to launch multiple kernels. It sounds like it's difficult/impossible to use unified memory in this scenario.)
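For concreteness, here is roughly the per-thread pattern I have in mind; the kernels, sizes, and the shared read-only buffer are hypothetical stand-ins, not real code from my application:

```cpp
// Hypothetical per-thread pattern: two kernels queued back-to-back on one
// stream, with no host access to the managed memory until after the sync.
#include <cuda_runtime.h>

__global__ void kernelA(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

__global__ void kernelB(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] *= 2.0f;
}

void workerThread(const float *sharedReadOnly, int n) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    float *out;
    cudaMallocManaged(&out, n * sizeof(float));

    kernelA<<<(n + 255) / 256, 256, 0, stream>>>(sharedReadOnly, out, n);
    kernelB<<<(n + 255) / 256, 256, 0, stream>>>(out, n);  // queued before any sync

    cudaStreamSynchronize(stream);  // the host only touches 'out' after this point

    // ... consume out[] on the host ...
    cudaFree(out);
    cudaStreamDestroy(stream);
}
```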
Thanks for the responses. We are shifting development for this to Linux for the time being, and will add Windows support when these features become available. Hopefully that is sooner rather than later.