Difference between cudaMallocManaged and malloc/new

Memory allocated with cudaMallocManaged or with malloc/new both work when accessed from the GPU, but cudaMallocManaged gives much better performance. The memory management behind cudaMallocManaged is well documented. My question is: what is going on when the GPU accesses memory allocated via malloc/new, i.e. is the data transferred as individual bytes (as needed) or also as pages? Any references to where this is documented would be great!
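For concreteness, here is a minimal sketch of the two allocation paths being compared; the buffer size and the kernel are arbitrary, and the malloc/new path only works on systems where the GPU can access pageable host memory (see the replies below).

```
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void inc(int *data, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const size_t n = 1 << 20;

    // Path 1: managed memory, migrated between host and device by the driver.
    int *managed = nullptr;
    cudaMallocManaged(&managed, n * sizeof(int));
    inc<<<(n + 255) / 256, 256>>>(managed, n);
    cudaDeviceSynchronize();
    cudaFree(managed);

    // Path 2: plain system allocation, accessed directly by the GPU.
    // This requires a system where the GPU can access pageable memory
    // (e.g. HMM on x86 or ATS on Grace).
    int *sys = static_cast<int*>(malloc(n * sizeof(int)));
    inc<<<(n + 255) / 256, 256>>>(sys, n);
    cudaDeviceSynchronize();
    free(sys);
    return 0;
}
```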

(Host code) malloc/new for device-code access only works on devices/systems where that feature is available. Currently, that means x86 systems with HMM, or Grace with the ATS feature. Here are a couple of recent threads:

1 2 3

which also link to more info.
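As a quick way to check whether a given system supports this, the CUDA runtime exposes device attributes for pageable-memory access. Below is a small sketch using cudaDevAttrPageableMemoryAccess and cudaDevAttrPageableMemoryAccessUsesHostPageTables; my reading of what the values mean is in the comments.

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0;
    cudaGetDevice(&dev);

    int pageable = 0, hostPageTables = 0;
    cudaDeviceGetAttribute(&pageable,
        cudaDevAttrPageableMemoryAccess, dev);
    cudaDeviceGetAttribute(&hostPageTables,
        cudaDevAttrPageableMemoryAccessUsesHostPageTables, dev);

    // pageable == 1       -> the GPU can dereference malloc/new pointers (HMM or ATS).
    // hostPageTables == 1 -> those accesses go through the host page tables (ATS, e.g. Grace).
    printf("pageable memory access: %d, uses host page tables: %d\n",
           pageable, hostPageTables);
    return 0;
}
```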

Hi,

My understanding of Grace with the ATS feature is that as GPU threads keep accessing cache lines, those accesses are initially served from CPU main memory (LPDDR). Roughly, the accesses proceed as follows:

As a GPU thread keeps accessing cache lines, the first accesses are served through ATS from host memory. After some number of accesses, data migration happens and the GPU thread reads the remaining data from the GPU's own DRAM (HBM). The exact mechanism, i.e. how many accesses are needed to trigger migration and how much data (how many pages) is migrated, is unclear to me.
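One way to probe this empirically (a rough sketch, not a definitive measurement) is to launch the same kernel repeatedly over a malloc'd buffer and time each launch; if access-counter-driven migration kicks in after enough remote accesses, later launches should get noticeably faster. The buffer size and iteration count below are arbitrary.

```
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void touch(float *x, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] += 1.0f;
}

int main() {
    const size_t n = 1 << 26;                    // ~256 MiB of floats
    float *x = static_cast<float*>(malloc(n * sizeof(float)));
    for (size_t i = 0; i < n; ++i) x[i] = 0.0f;  // pages start resident in host memory

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time each launch; a drop in time after some iterations would suggest
    // the data has migrated to GPU DRAM.
    for (int iter = 0; iter < 10; ++iter) {
        cudaEventRecord(start);
        touch<<<(n + 255) / 256, 256>>>(x, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("iteration %d: %.3f ms\n", iter, ms);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    free(x);
    return 0;
}
```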

cudaMallocManaged allocates memory on the host side and migrates it to GPU memory when a page fault occurs. The reason cudaMallocManaged gives better performance is that it only involves data migration plus accesses to device memory, whereas accessing malloc/new-allocated (host-side) memory involves accesses to host memory (more than 4x higher latency) plus data migration plus accesses to device memory.
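For illustration, here is a sketch of how the migration cost with cudaMallocManaged can be triggered up front with cudaMemPrefetchAsync, so the kernel reads from device DRAM instead of faulting page by page; the sizes and kernel are only examples.

```
#include <cuda_runtime.h>

__global__ void scale(float *x, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const size_t n = 1 << 24;
    float *x = nullptr;
    cudaMallocManaged(&x, n * sizeof(float));

    // First touch on the host: pages are resident in host memory.
    for (size_t i = 0; i < n; ++i) x[i] = 1.0f;

    // Migrate the pages to the GPU before launching, instead of relying
    // on demand page faults during the kernel.
    int dev = 0;
    cudaGetDevice(&dev);
    cudaMemPrefetchAsync(x, n * sizeof(float), dev, 0);

    scale<<<(n + 255) / 256, 256>>>(x, n);
    cudaDeviceSynchronize();

    cudaFree(x);
    return 0;
}
```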

I hope this answers your question.
Best.