Memory bandwidth on Orin

Hi.
The advertised memory bandwidth on Orin is 204.8 GB/s, per my understanding of Orin’s documentation.

When I measure it using the NVIDIA bandwidthTest CUDA sample, I see a huge difference between host<->device memory throughput and device<->device memory throughput.

For host-to-device (and the other way around), the throughput is ~35 GB/s, far from the advertised figure.
For device-to-device, the measured throughput is ~170 GB/s, which is in line with the advertised number.

What can explain this huge difference, given that the CPU and GPU use the same memory?
I have seen this question asked several times on the forum, but haven’t seen a clear answer or explanation so far.

Thank you.

Hi,
Do you mean that the throughput of cudaMemcpy() does not meet the target performance? We would like to confirm what issue you are facing.

Also, the latest JetPack release is 5.1.2. It would be great if you could use the latest version.

Hi.
I’m referring to the “bandwidthTest” in the CUDA samples. I’m currently using JetPack 5.1.1.

As I understand it, this test measures (async) copy throughput: from host to device, from device to host, and from device to device.

My question is: why do I see such a ~5x throughput difference between host<->device transfers and device<->device transfers?

Moreover, looking only at the host<->device copies, the throughput is on the order of a PCIe Gen4 transfer, far from what I would expect from the on-SoM memory. Why is that?
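For reference, this is roughly the kind of measurement I mean; a minimal sketch using plain cudaMemcpy and CUDA events (not the actual bandwidthTest source; the buffer size and iteration count are arbitrary, and error checking is omitted):

 // Minimal sketch: time H2D (pageable) and D2D copies with CUDA events.
 // Not the bandwidthTest source; sizes/iterations arbitrary, no error checks.
 #include <cstdio>
 #include <cstdlib>
 #include <cuda_runtime.h>

 static double timed_copy_gbs(void *dst, const void *src, size_t bytes,
                              cudaMemcpyKind kind, int iters) {
     cudaEvent_t start, stop;
     cudaEventCreate(&start);
     cudaEventCreate(&stop);
     cudaEventRecord(start);
     for (int i = 0; i < iters; ++i)
         cudaMemcpy(dst, src, bytes, kind);
     cudaEventRecord(stop);
     cudaEventSynchronize(stop);
     float ms = 0.0f;
     cudaEventElapsedTime(&ms, start, stop);
     cudaEventDestroy(start);
     cudaEventDestroy(stop);
     return ((double)bytes * iters / 1e9) / (ms / 1000.0);  // GB/s
 }

 int main() {
     const size_t bytes = 256ull << 20;   // 256 MiB per buffer
     const int iters = 20;

     void *h_pageable = malloc(bytes);    // regular pageable host allocation
     void *d_a = nullptr, *d_b = nullptr;
     cudaMalloc(&d_a, bytes);
     cudaMalloc(&d_b, bytes);

     printf("H2D (pageable): %.1f GB/s\n",
            timed_copy_gbs(d_a, h_pageable, bytes, cudaMemcpyHostToDevice, iters));
     // Note: bandwidthTest, as far as I can tell, counts a D2D copy as a read
     // plus a write (2x the buffer size); this sketch counts the bytes once.
     printf("D2D           : %.1f GB/s\n",
            timed_copy_gbs(d_b, d_a, bytes, cudaMemcpyDeviceToDevice, iters));

     free(h_pageable);
     cudaFree(d_a);
     cudaFree(d_b);
     return 0;
 }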

Thank you.

Is this still an issue that needs support? Is there any result you can share?

Hi. It is not a support issue, but rather a request for information/explanation. I still cannot explain the findings above; I hoped NVIDIA or another forum member could.
Thanks.

Hi,

Please see below for some explanation.

First, since Jetson is a shared-memory system, pinned memory bandwidth is much better than pageable memory bandwidth.
This can be tested via the commands below:

 $ ./bandwidthTest -memory pinned
 $ ./bandwidthTest -memory pageable
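
As a rough illustration (a minimal sketch, not taken from the bandwidthTest source; the buffer size and iteration count are arbitrary and error checking is omitted), the difference between the two modes comes down to whether the host buffer is allocated with malloc() (pageable) or cudaHostAlloc() (pinned/page-locked):

 // Minimal sketch: H2D copy throughput from a pageable vs. a pinned host buffer.
 #include <cstdio>
 #include <cstdlib>
 #include <cuda_runtime.h>

 int main() {
     const size_t bytes = 256ull << 20;   // 256 MiB
     const int iters = 20;

     void *h_pageable = malloc(bytes);                        // pageable
     void *h_pinned = nullptr;
     cudaHostAlloc(&h_pinned, bytes, cudaHostAllocDefault);   // page-locked

     void *d_buf = nullptr;
     cudaMalloc(&d_buf, bytes);

     cudaEvent_t start, stop;
     cudaEventCreate(&start);
     cudaEventCreate(&stop);

     const void *srcs[2]  = { h_pageable, h_pinned };
     const char *names[2] = { "pageable", "pinned" };
     for (int s = 0; s < 2; ++s) {
         cudaEventRecord(start);
         for (int i = 0; i < iters; ++i)
             cudaMemcpy(d_buf, srcs[s], bytes, cudaMemcpyHostToDevice);
         cudaEventRecord(stop);
         cudaEventSynchronize(stop);
         float ms = 0.0f;
         cudaEventElapsedTime(&ms, start, stop);
         printf("H2D from %s host memory: %.1f GB/s\n", names[s],
                (double)bytes * iters / 1e9 / (ms / 1000.0));
     }

     cudaEventDestroy(start);
     cudaEventDestroy(stop);
     free(h_pageable);
     cudaFreeHost(h_pinned);
     cudaFree(d_buf);
     return 0;
 }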

However, the buffer is accessible by the CPU, so it won’t be as fast as a GPU buffer that is owned entirely by the GPU itself.
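
To illustrate that distinction (again only a minimal sketch using standard CUDA runtime APIs, not how the bandwidthTest sample allocates its buffers): on a shared-memory system, a pinned, mapped host buffer can be touched by both the CPU and the GPU, while a cudaMalloc() buffer is owned by the GPU alone.

 // Minimal sketch: a CPU-and-GPU-accessible mapped buffer vs. a GPU-only buffer.
 #include <cstdio>
 #include <cuda_runtime.h>

 __global__ void fill(int *p, int n, int v) {
     int i = blockIdx.x * blockDim.x + threadIdx.x;
     if (i < n) p[i] = v;
 }

 int main() {
     const int n = 1 << 20;
     cudaSetDeviceFlags(cudaDeviceMapHost);   // allow mapping host memory

     // Buffer visible to both sides: pinned + mapped host memory.
     int *h_shared = nullptr, *d_shared = nullptr;
     cudaHostAlloc((void **)&h_shared, n * sizeof(int), cudaHostAllocMapped);
     cudaHostGetDevicePointer((void **)&d_shared, h_shared, 0);

     // Buffer owned by the GPU only.
     int *d_only = nullptr;
     cudaMalloc((void **)&d_only, n * sizeof(int));

     fill<<<(n + 255) / 256, 256>>>(d_shared, n, 42);  // GPU writes the shared buffer
     fill<<<(n + 255) / 256, 256>>>(d_only, n, 7);     // GPU writes its own buffer
     cudaDeviceSynchronize();

     printf("CPU reads the mapped buffer directly: %d\n", h_shared[0]);
     // d_only cannot be dereferenced by the CPU; it must be copied out first.

     cudaFreeHost(h_shared);
     cudaFree(d_only);
     return 0;
 }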

Thanks.

Thank you for the detailed answer. Is it possible to elaborate on the part below, please?
“However, the buffer is accessible by the CPU, so it won’t be as fast as a GPU buffer that is owned entirely by the GPU itself.”

Hi,

D2H is a different task from D2D.
D2D is a pure GPU task, so it can be done quickly when GPU resources are available.

Unfortunately, we are not able to disclose further implementation details here.

Thanks.


Thank you for the explanation.
