I found that after prefetching a block of memory allocated with cudaMallocManaged using cudaMemPrefetchAsync, the memory usage of the device increases, but the memory usage of the process stays the same. I can reproduce this with the test code below. Is this the intended behavior?
test.cu (5.6 KB)
Debian GNU/Linux 9
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:38_PDT_2020
Cuda compilation tools, release 11.0, V11.0.194