Maximizing Unified Memory Performance in CUDA

jwitsoe · August 21, 2022, 11:44pm

Originally published at: Maximizing Unified Memory Performance in CUDA | NVIDIA Technical Blog

Many of today’s applications process large volumes of data. While GPU architectures have very fast HBM or GDDR memory, they have limited capacity. Making the most of GPU performance requires the data to be as close to the GPU as possible. This is especially important for applications that iterate over the same data multiple times…

Related topics

Topic	Category	Replies	Views	Activity
Maximizing Unified Memory Performance in CUDA	Technical Blog	18	1344	May 14, 2019
CUDACasts Episode 18: CUDA 6.0 Unified Memory	Technical Blog	3	343	October 20, 2015
Page streaming with UVM system	CUDA Programming and Performance	5	389	July 4, 2024
Beyond GPU Memory Limits with Unified Memory on Pascal	Technical Blog	15	947	March 11, 2022
Explicit page migration without data copy for Unified Memory	CUDA Programming and Performance	0	481	April 2, 2018
Improving GPU Memory Oversubscription Performance	Technical Blog	5	892	July 16, 2025
Does unified memory and zero copy always better than cudaMemcpy?	CUDA Programming and Performance	4	1537	February 10, 2018
CUDA 6.5 Unified Memory (cudamallocmanaged)	CUDA Programming and Performance	1	2183	February 18, 2015
cuda unified memory: memory transfer behaviour	CUDA Programming and Performance	1	597	August 12, 2016
Performance decrease on Unified GracehopperC	CUDA Programming and Performance	10	236	July 27, 2025