Improving GPU Memory Oversubscription Performance

Originally published at: https://developer.nvidia.com/blog/improving-gpu-memory-oversubscription-performance/

Since its introduction more than 7 years ago, the CUDA Unified Memory programming model has steadily gained popularity among developers. Unified Memory provides a simple interface for prototyping GPU applications without manually migrating memory between host and device. Starting with the NVIDIA Pascal GPU architecture, Unified Memory enabled applications to use all available CPU and…
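
A minimal Unified Memory sketch (my own illustration, not code from the article) of the programming model described above: cudaMallocManaged returns one pointer that both the CPU and the GPU can dereference, so no explicit cudaMemcpy is needed, and on Pascal and later GPUs pages migrate on demand.

#include <cstdio>
#include <cuda_runtime.h>

// Scale every element in place on the GPU.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));      // single allocation visible to CPU and GPU

    for (int i = 0; i < n; ++i) data[i] = 1.0f;       // touch on the host first

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);   // pages migrate to the GPU on first access
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);                // pages migrate back on CPU access
    cudaFree(data);
    return 0;
}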


Where is the code example?

You can find the code example here. Thank you for your interest.


You're welcome. I downloaded and updated to the latest commit, 0754981b37b343474c45222ea487c9667551e854 [0754981], of the master branch.
However, I could not find any project files. How can I debug a .cu file on Windows 10 with VS2019?
Commit: 0754981b37b343474c45222ea487c9667551e854 [0754981]
Parents: 9ad4c010fd, 5a003551a1
Author: Mark Harris mharris@nvidia.com
Date: Tuesday, August 3, 2021 2:44:36 PM
Committer: GitHub
Merge pull request #36 from chirayuG-nvidia/unified_memory

Add Unified Memory oversubscription benchmark

Apologies for the delay in responding.
We don’t have Visual Studio project files for this sample, but it should be straightforward to convert the provided Makefile into a VS project. Worth mentioning that many of the Unified Memory features discussed here, such as on-demand paging and oversubscription, are not available on Windows; these limitations are documented here.
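
As a quick way to check this on a given machine (a small helper of my own, not part of the sample), you can query cudaDevAttrConcurrentManagedAccess; it is reported as 0 on Windows and on pre-Pascal GPUs, meaning GPU page faulting and oversubscription are not available.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0, concurrent = 0;
    cudaGetDevice(&dev);
    // 1 = full Unified Memory with GPU page faulting (on-demand paging, oversubscription);
    // 0 = limited Unified Memory, as on Windows or pre-Pascal GPUs.
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, dev);
    printf("concurrentManagedAccess = %d\n", concurrent);
    return 0;
}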

The zero-copy performance of this microbenchmark on Grace Hopper is limited to about 50 GB/s, which is roughly 1/10th of the available LPDDR5 bandwidth. What could be the reason for this?

./uvm_oversubs -p 2 -a streaming -m zero_copy

Read,Zero_copy,streaming,2.000000,2MB,blocksize=128,loop_count=3, 3698.625732 ms, 51.100498 GB/s
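
For reference, here is a rough sketch of the zero-copy access pattern that mode measures, as I understand it from the blog post (this is not the benchmark's actual code): the pages are kept resident in host memory and the GPU reads them remotely over the interconnect instead of migrating them.

#include <cuda_runtime.h>

// Read every element without migrating pages to the GPU.
__global__ void read_kernel(const float *data, size_t n, float *sink) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n && data[i] < 0.0f) *sink = data[i];  // branch never taken for this data; prevents dead-code elimination
}

int main() {
    const size_t n = 1ull << 28;                   // 2^28 floats = 1 GiB
    int dev = 0;
    cudaGetDevice(&dev);

    float *data = nullptr, *sink = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));
    cudaMallocManaged(&sink, sizeof(float));
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f; // populate on the host

    // Pin the pages to the CPU and map them into the GPU, so kernel accesses are
    // remote (zero-copy) reads over the interconnect rather than page migrations.
    cudaMemAdvise(data, n * sizeof(float), cudaMemAdviseSetPreferredLocation, cudaCpuDeviceId);
    cudaMemAdvise(data, n * sizeof(float), cudaMemAdviseSetAccessedBy, dev);

    read_kernel<<<(unsigned)((n + 255) / 256), 256>>>(data, n, sink);
    cudaDeviceSynchronize();

    cudaFree(sink);
    cudaFree(data);
    return 0;
}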