For CUDA 9.x or later, it doesn’t matter.
With recent CUDA versions (CUDA 9.x, CUDA 10.0), the behavior on the Windows operating system is as if it were a pre-Pascal regime. In this regime, concurrent managed access is indeed not possible.
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements
“Applications running on Windows (whether in TCC or WDDM mode) or macOS will use the basic Unified Memory model as on pre-6.x architectures even when they are running on hardware with compute capability 6.x or higher.”
The behavior is expected. cudaMemPrefetchAsync also has no meaning in such a scenario and will return an error code.
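For illustration, here is a minimal sketch (my own, not from the documentation) that queries the concurrentManagedAccess attribute and only issues the prefetch when the demand-paged regime is actually available. The device index and buffer size are arbitrary choices; on Windows the attribute will report 0 and the prefetch is simply skipped:

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0, concurrent = 0;
    // Ask the runtime whether concurrent managed access (demand-paged UM) is available.
    // On Windows (TCC or WDDM) this reports 0 even on Pascal or newer GPUs.
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, dev);

    float *data;
    cudaMallocManaged(&data, 1 << 20);

    if (concurrent) {
        // Only meaningful in the demand-paged regime; elsewhere this call returns an error.
        cudaMemPrefetchAsync(data, 1 << 20, dev, 0);
        cudaDeviceSynchronize();
    } else {
        printf("concurrentManagedAccess = 0: skipping cudaMemPrefetchAsync\n");
    }

    cudaFree(data);
    return 0;
}
```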
I don’t know what that statement is referring to, so my comments don’t apply to it.
The general idea expressed here was already indicated to the OP here:
https://devtalk.nvidia.com/default/topic/1029706/cuda-programming-and-performance/partial-fail-of-peer-access-in-8-volta-gpu-instance-p3-16xlarge-on-aws-gt-huge-slowdown-/post/5238143/#5238143
“This is particularly true in a windows regime under CUDA 9.0/9.1, where demand-paged managed memory is not available.”
That statement is still true, and will likely never change for CUDA 9.0, 9.1, 9.2, and 10.0, if history is any guide.
Memory hints, memory prefetching, demand paging, and concurrent access are all examples of features related to demand-paged UM which are not available in the “pre-Pascal” regime, i.e. when the documentation specifically calls out “the basic Unified Memory model as on pre-6.x architectures”. A short sketch of what that means in practice follows below.
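One concrete consequence of the basic model: while any kernel is running, the GPU has exclusive access to all managed allocations, so the host must synchronize before touching them. A minimal sketch of that constraint (my own example; the kernel name and launch configuration are just for illustration):

```
#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment(int *x) { *x += 1; }

int main() {
    int *x;
    cudaMallocManaged(&x, sizeof(int));
    *x = 0;                     // host access is fine: no kernel is running yet

    increment<<<1, 1>>>(x);

    // In the basic (pre-6.x / Windows) UM model, touching *x here, before
    // synchronizing, would fault because the GPU owns managed memory while
    // the kernel is in flight.
    cudaDeviceSynchronize();

    printf("x = %d\n", *x);     // safe: accessed only after synchronization
    cudaFree(x);
    return 0;
}
```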