I tried running the CUDA sample “UnifiedMemoryPerf”, which is in the folder “/usr/local/cuda/samples/1_Utilities/”, to test unified memory on a Jetson AGX Xavier 32GB in mode 0 (MAXN). Here is the result:
As you can see, CpPglAs (host page-locked memory plus device memory, async) is the fastest and UMeasy (unified memory with no hints) is the slowest. I thought this might be because the test buffers are too small, so I tried another benchmark, cuda-benchmarks. The following picture shows the result:
where “simpleMemcpy” uses an ordinary “cudaMemcpy”, “simpleDMA” uses pinned, mapped memory (AKA zero-copy memory), “simpleManaged” uses unified memory, and “400000000” means a 400 MB buffer. It shows that unified memory performs well in computing but costs much more time accessing all the arrays than “simpleMemcpy” and “simpleDMA”. But according to the official NVIDIA manual https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html#pinned-memory and some answers in this forum, people usually recommend using unified memory. Can anyone explain these results? Thanks
In this condition, zero-copy is the slowest, and unified memory without hints is not bad. So I am confused: considering that the CPU and GPU share the same physical memory on Jetson, shouldn’t unified memory perform better there than on a typical machine with separate CPU and GPU memory?
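For context, here is a minimal sketch of the three allocation paths those benchmark names appear to refer to. This is my own reading of the names, not the benchmark’s actual code, and error checking is omitted for brevity:

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

int main() {
    const size_t bytes = 400000000;  // 400 MB, matching the benchmark size
    float *h_a, *d_a, *zc, *um;

    // 1) "simpleMemcpy": pageable host buffer + explicit cudaMemcpy
    h_a = (float *)malloc(bytes);
    cudaMalloc(&d_a, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);

    // 2) "simpleDMA": pinned, mapped (zero-copy) memory -- the GPU
    //    accesses the host allocation directly, with no copy step
    cudaHostAlloc(&zc, bytes, cudaHostAllocMapped);

    // 3) "simpleManaged": unified (managed) memory -- a single pointer
    //    valid on both the CPU and the GPU
    cudaMallocManaged(&um, bytes);

    cudaFree(d_a);
    free(h_a);
    cudaFreeHost(zc);
    cudaFree(um);
    return 0;
}
```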
On previous Jetson modules, pinned memory was not ideal since it had no cache support.
Starting with Xavier, there is an I/O coherency feature that can improve performance.
So you may find that pinned memory is now competitive with unified memory.
But please note that I/O coherency is a one-way feature: the GPU can read the CPU caches, but the CPU cannot read the GPU caches.
In some cases, for example when the GPU uses the buffer as output (writes data), unified memory will be the better choice, although it does incur some overhead at initialization.
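A sketch of that GPU-writes-the-output case with unified memory. The hint APIs (`cudaMemAdvise`, `cudaMemPrefetchAsync`) are standard CUDA runtime calls that can reduce the “no hints” overhead the UMeasy case shows; whether prefetching actually helps on a given Jetson software release is worth measuring, since Tegra’s shared physical memory is handled differently from discrete GPUs:

```cuda
#include <cuda_runtime.h>

// GPU writes the buffer -- the output case where unified memory avoids
// the CPU-cannot-read-GPU-caches limitation of I/O coherency
__global__ void fill(float *p, size_t n) {
    size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] = 1.0f;
}

int main() {
    const size_t n = 1 << 24;
    const size_t bytes = n * sizeof(float);
    float *um;
    cudaMallocManaged(&um, bytes);

    int dev;
    cudaGetDevice(&dev);
    // Optional hints: tell the driver the GPU will touch this range,
    // and ask it to place the data there ahead of the launch
    cudaMemAdvise(um, bytes, cudaMemAdviseSetPreferredLocation, dev);
    cudaMemPrefetchAsync(um, bytes, dev);

    fill<<<(unsigned)((n + 255) / 256), 256>>>(um, n);
    cudaDeviceSynchronize();  // after this, the CPU can safely read um

    cudaFree(um);
    return 0;
}
```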