I have a scenario where invoking a kernel invalidates memory blocks until a sync (cudaStresamSynchronize) is called.
The memory blocks are allocated up front, using cudaMallocManaged, during a data population phase. After that they are never written to.
During a later calculation phase, an array is populated with pointers into these blocks. They array of pointers is copied to gpu memory (allocated with cudaMalloc), and passed to a kernel.
Watching one of these pointers while through the debugger, it looks like
(short*) 0x0000302202 {94}
(i.e., valid pointer to shorts, with first value of 94.)
As soon as the kernel is invoked, the debugger display changes to
(short*) 0x0000302202 {???}
(i.e. invalid pointer.)
De-referencing the pointer now results in an access violation.
Calling cudaStreamSynchronize restores the memory, and the pointer becomes dereferencable again.
This happens even with an empty kernel.
(It also happens in both debug in optimized builds, with the VS debugger attached, with the CUDA debugger attached, or with no debugger attached.)
Why would invoking a kernel invalidate a unified memory block? Is Unified Memory actually intended to behave this way?
(Environment: Windows 10, CUDA 9.1, Visual Studio 2015, GeForce 1080 Ti)