Optimising GPU and CPU memory transfer time (CUDA/Hardware)?

cbuchner1 · December 23, 2021, 10:07am

How do you manage your memory buffers? Do you use the traditional approach: host memory allocated with malloc()/new, device memory allocated with cudaMalloc(), and cudaMemcpy() to transfer between the two?
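For reference, the traditional pattern looks roughly like the sketch below. The `scale` kernel and the `run_explicit_copy` helper are hypothetical names made up for illustration; only the CUDA runtime calls are real API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: explicit-copy round trip. Returns true on success.
bool run_explicit_copy(int n) {
    size_t bytes = n * sizeof(float);

    // Pageable host buffer from malloc().
    float *h = (float *)malloc(bytes);
    if (!h) return false;
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    // Separate device buffer from cudaMalloc().
    float *d = nullptr;
    if (cudaMalloc((void **)&d, bytes) != cudaSuccess) { free(h); return false; }

    // Host -> device copy, kernel launch, device -> host copy.
    bool ok = cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        scale<<<(n + 255) / 256, 256>>>(d, n);
        // This blocking copy on the default stream runs after the kernel finishes.
        ok = cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // Verify the result on the host.
    for (int i = 0; i < n && ok; ++i) ok = (h[i] == 2.0f * (float)i);

    cudaFree(d);
    free(h);
    return ok;
}

int main() {
    printf("explicit copy: %s\n", run_explicit_copy(1 << 20) ? "ok" : "failed");
    return 0;
}
```

The two cudaMemcpy() calls are the transfer cost in question.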

For the Jetson series, it might be useful to look into zero-copy memory via cudaHostAlloc() or, alternatively, unified memory via cudaMallocManaged() (assuming the latter is supported on your Jetson Nano platform). Because the CPU and GPU on a Jetson share the same physical DRAM, either approach should eliminate the explicit cudaMemcpy() overhead on your platform.
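A minimal sketch of both options, assuming a device and CUDA version that support mapped pinned memory and managed memory. The `scale` kernel and the `run_zero_copy` / `run_managed` helpers are hypothetical names for illustration, and whether either variant is actually faster for your workload is something you would need to measure.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Zero-copy: pinned host memory mapped into the device address space.
bool run_zero_copy(int n) {
    size_t bytes = n * sizeof(float);
    float *h = nullptr;
    if (cudaHostAlloc((void **)&h, bytes, cudaHostAllocMapped) != cudaSuccess)
        return false;
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    // Device-side pointer to the same allocation; no cudaMemcpy() involved.
    float *d = nullptr;
    bool ok = cudaHostGetDevicePointer((void **)&d, h, 0) == cudaSuccess;
    if (ok) {
        scale<<<(n + 255) / 256, 256>>>(d, n);
        // Wait for the kernel before the CPU reads the buffer again.
        ok = cudaDeviceSynchronize() == cudaSuccess;
    }
    for (int i = 0; i < n && ok; ++i) ok = (h[i] == 2.0f * (float)i);

    cudaFreeHost(h);
    return ok;
}

// Unified memory: a single pointer usable from both CPU and GPU.
bool run_managed(int n) {
    size_t bytes = n * sizeof(float);
    float *m = nullptr;
    if (cudaMallocManaged((void **)&m, bytes) != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) m[i] = (float)i;

    scale<<<(n + 255) / 256, 256>>>(m, n);
    // Synchronize before touching managed memory from the CPU.
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    for (int i = 0; i < n && ok; ++i) ok = (m[i] == 2.0f * (float)i);

    cudaFree(m);
    return ok;
}

int main() {
    printf("zero-copy: %s\n", run_zero_copy(1 << 20) ? "ok" : "failed");
    printf("managed:   %s\n", run_managed(1 << 20) ? "ok" : "failed");
    return 0;
}
```

In both variants the CPU only reads the buffer after cudaDeviceSynchronize() returns, which keeps the sketch valid on devices that do not allow concurrent CPU and GPU access to the same allocation.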

Here’s a related thread that I found. It has some links to useful resources.
https://forums.developer.nvidia.com/t/jetson-nano-device-local-memory-specifications/73524/6
