Memory usage does not match between device and process after calling cudaMallocManaged + cudaMemPrefetchAsync

artar · January 19, 2021, 3:45am

I found that after I prefetch a block of memory allocated by cudaMallocManaged with cudaMemPrefetchAsync, the device's memory usage increases but the process's memory usage stays the same. I can reproduce this with the test code below. Is this by design?
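The attached test.cu is not reproduced in this thread. As a rough stand-in (not the original file), the sketch below takes the same kind of measurement. It prints device-wide usage from `cudaMemGetInfo` next to the process's host-side `VmSize` and `VmRSS` from `/proc/self/status`, sampled after context creation, after `cudaMallocManaged`, after touching the buffer on the host, and after `cudaMemPrefetchAsync` to GPU 0. It assumes Linux, the CUDA 11 signature `cudaMemPrefetchAsync(ptr, count, dstDevice, stream)`, a GPU that supports prefetching (Pascal or newer), and an arbitrary 256 MiB buffer. It also assumes "memory usage of process" means host memory, which is how the replies read it. `parse_status_kb`, `read_status_kb`, and `report` are hypothetical helpers.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <sstream>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper: extract the numeric value (in kB) of a field such as
// "VmRSS:" from the text of /proc/self/status. Returns -1 if the field is absent.
long parse_status_kb(const std::string& text, const std::string& key) {
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) == 0) {
            // strtol skips the tab/space padding and stops at " kB".
            return std::strtol(line.c_str() + key.size(), nullptr, 10);
        }
    }
    return -1;
}

// Hypothetical helper: read /proc/self/status (Linux only) and return one field in kB.
long read_status_kb(const std::string& key) {
    std::ifstream f("/proc/self/status");
    std::stringstream ss;
    ss << f.rdbuf();
    return parse_status_kb(ss.str(), key);
}

// Hypothetical helper: print device-wide usage next to this process's host memory.
void report(const char* stage) {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    std::printf("%-24s device used %6zu MiB | VmSize %9ld kB | VmRSS %8ld kB\n",
                stage, (total_b - free_b) >> 20,
                read_status_kb("VmSize:"), read_status_kb("VmRSS:"));
}

// Abort main() with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            return 1;                                              \
        }                                                          \
    } while (0)

int main() {
    const size_t bytes = 256ull << 20;  // arbitrary 256 MiB test buffer
    char* p = nullptr;

    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(nullptr));            // force context creation before the baseline
    report("baseline (context only)");

    CHECK(cudaMallocManaged(&p, bytes));
    report("after cudaMallocManaged");

    std::memset(p, 1, bytes);            // first touch on the host populates host pages
    report("after host touch");

    CHECK(cudaMemPrefetchAsync(p, bytes, 0 /* dstDevice */, 0 /* stream */));
    CHECK(cudaDeviceSynchronize());
    report("after prefetch to GPU 0");

    CHECK(cudaFree(p));
    return 0;
}
```

`cudaMemGetInfo` reports usage for the whole device, so other processes running on the same GPU also show up in the "device used" column; the `VmSize`/`VmRSS` columns cover only this process.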

Test code: test.cu (5.6 KB)

system info:
Debian GNU/Linux 9
driver: 450.80.02
cuda: 11.0
nvcc:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:38_PDT_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.TC445_37.28540450_0
gcc: 8.3.0

Robert_Crovella · January 20, 2021, 12:14am

I’m not sure why there would be any expectation that process memory usage would change based on a call to cudaMemPrefetchAsync().

artar · January 23, 2021, 1:38am

Because the prefetched memory was allocated by that process?

It confused me because cudaMallocManaged is called in a third-party library, and I can’t tell whether anything is going wrong in my program.
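When the managed allocation comes from a third-party library, as described above, one way to check at run time what a pointer is and where it was last prefetched is `cudaPointerGetAttributes` followed by `cudaMemRangeGetAttribute` with `cudaMemRangeAttributeLastPrefetchLocation`. The following is a minimal sketch, assuming the CUDA 11 runtime used in this thread and a GPU with device ID 0. `describe_pointer` and `memory_type_name` are hypothetical helpers, and the local `cudaMallocManaged` call only stands in for the library's allocation.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper: readable name for a cudaMemoryType value.
std::string memory_type_name(cudaMemoryType t) {
    switch (t) {
        case cudaMemoryTypeManaged: return "managed";
        case cudaMemoryTypeDevice:  return "device";
        case cudaMemoryTypeHost:    return "pinned host";
        default:                    return "unregistered host";
    }
}

// Hypothetical helper: report what kind of allocation `ptr` is and, for managed
// memory, the target of the most recent prefetch request for [ptr, ptr + bytes).
// This is only the last *requested* prefetch target, not necessarily where every
// page lives now: the prefetch may still be pending, and page faults can move pages.
void describe_pointer(const void* ptr, size_t bytes) {
    cudaPointerAttributes attr{};
    cudaError_t err = cudaPointerGetAttributes(&attr, ptr);
    if (err != cudaSuccess) {
        std::printf("cudaPointerGetAttributes: %s\n", cudaGetErrorString(err));
        cudaGetLastError();  // reset the runtime's last-error state
        return;
    }
    std::printf("kind: %s\n", memory_type_name(attr.type).c_str());
    if (attr.type != cudaMemoryTypeManaged) return;

    int loc = cudaInvalidDeviceId;
    err = cudaMemRangeGetAttribute(&loc, sizeof(loc),
                                   cudaMemRangeAttributeLastPrefetchLocation,
                                   ptr, bytes);
    if (err != cudaSuccess) {
        std::printf("cudaMemRangeGetAttribute: %s\n", cudaGetErrorString(err));
    } else if (loc == cudaCpuDeviceId) {
        std::printf("last prefetch target: host\n");
    } else if (loc == cudaInvalidDeviceId) {
        std::printf("last prefetch target: none for the whole range\n");
    } else {
        std::printf("last prefetch target: GPU %d\n", loc);
    }
}

int main() {
    const size_t bytes = 1 << 20;
    float* p = nullptr;  // stand-in for a buffer returned by a third-party library
    if (cudaMallocManaged(&p, bytes) != cudaSuccess) return 1;

    describe_pointer(p, bytes);            // before any prefetch
    cudaMemPrefetchAsync(p, bytes, 0, 0);  // to GPU 0 (CUDA 11 signature)
    cudaDeviceSynchronize();
    describe_pointer(p, bytes);            // after the prefetch

    cudaFree(p);
    return 0;
}
```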

Anyway, thank you for your explanation! Maybe I just need a deeper understanding of Unified Memory!

Robert_Crovella · January 24, 2021, 1:30am

My expectation is that the process reservation of the (host) memory takes place at the point of the cudaMallocManaged call. I wouldn’t expect migrating that memory from one place to another to change the process reservation, but I haven’t studied it closely.

Related topics
cudaMallocManaged and CUDA 8.0 · CUDA Programming and Performance · 5 replies · 2526 views · June 21, 2018
Unexpected managed (unified) memory behaviour · CUDA Programming and Performance · 0 replies · 552 views · May 29, 2019
cudaMallocManaged() clarification needed · CUDA Programming and Performance · 5 replies · 10989 views · November 20, 2018
Does cudaMalloc increases the private bytes used on host? · CUDA Programming and Performance · 9 replies · 1455 views · July 24, 2023
cudaMallocManaged allocating more memory than requested · CUDA Programming and Performance · 7 replies · 3129 views · July 13, 2018
Sharing GPU global memory with multiple CPU threads · CUDA Programming and Performance · 5 replies · 2603 views · February 26, 2019
Why is MallocManagedMemory slower · CUDA Programming and Performance · 4 replies · 1176 views · July 22, 2018
Doubt on Unified Memory Data Transfer · CUDA NVCC Compiler · 2 replies · 369 views · November 22, 2023
Clarification on cudaMemAdviseSetReadMostly? · CUDA Programming and Performance · 3 replies · 1129 views · January 19, 2018
Access Unified Memory location from two different application · CUDA Programming and Performance · 4 replies · 488 views · January 2, 2023