CUDACasts Episode 18: CUDA 6.0 Unified Memory

Originally published at:

CUDA 6 introduces Unified Memory, which dramatically simplifies memory management for GPU computing. Now you can focus on writing parallel kernels when porting code to the GPU, and memory management becomes an optimization. The CUDA 6 Release Candidate is now publicly available. In today’s CUDACast, I will show you some simple examples showing how easy…


Hi Mark, I read somewhere that Maxwell GPUs can directly access system main memory, but I couldn't find how this access is performed or any benchmarks for it. Do you know of any documentation on it?

Fermi, Kepler, and Maxwell GPUs can all access host memory directly via what is known as "zero copy". Zero copy maps a page-locked host pointer into the device address space, and the device then accesses that memory over PCIe. This is different from Unified Memory, which is available on Kepler and later GPUs. Because every device access crosses the bus, zero-copy performance is always limited by PCIe throughput. There is a bit of discussion in my post on Unified Memory. You may also want to look at the "Simple Zero-Copy" sample included with the CUDA Toolkit package, and the documentation of page-locked host memory and mapped memory here:
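
To give a rough idea of what zero copy looks like in code, here is a minimal sketch. It is not the toolkit's "Simple Zero-Copy" sample. The kernel name `scale`, the `CHECK` macro, and the pointer name `d_alias` are made up for illustration, and it assumes device 0 supports mapped host memory. It allocates mapped, page-locked host memory with `cudaHostAlloc`, asks for the device-side alias with `cudaHostGetDevicePointer`, and lets a kernel read and write the host buffer directly, with no `cudaMemcpy`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper (not part of CUDA): abort if a runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical kernel: each thread scales one element in place.
// With a mapped pointer, these loads and stores travel over PCIe.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;

    // Request host-memory mapping before the CUDA context is created.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    // Assumption: we use device 0 and bail out if it cannot map host memory.
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (!prop.canMapHostMemory) {
        printf("Device 0 cannot map host memory; skipping.\n");
        return 0;
    }

    // Page-locked host allocation that is also mapped into the device address space.
    float *h_data = nullptr;
    CHECK(cudaHostAlloc((void **)&h_data, n * sizeof(float), cudaHostAllocMapped));
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    // Device-side alias of the same host buffer; no device allocation, no copy.
    float *d_alias = nullptr;
    CHECK(cudaHostGetDevicePointer((void **)&d_alias, h_data, 0));

    scale<<<(n + 255) / 256, 256>>>(d_alias, n, 2.0f);
    CHECK(cudaGetLastError());
    // The host must synchronize before reading what the kernel wrote.
    CHECK(cudaDeviceSynchronize());

    // Check the result directly in host memory.
    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (h_data[i] != 2.0f) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    CHECK(cudaFreeHost(h_data));
    return mismatches == 0 ? 0 : 1;
}
```

Since each element here is touched only once, the PCIe cost is paid once per element. A kernel that rereads the same data many times would pay that cost on every access, which is where an explicit copy to device memory (or Unified Memory) tends to be the better choice.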