GPU Compute and memory benchmarks for Jetson AGX Orin

We are looking for benchmarks that can give the peak FLOP/s and memory bandwidth on the Jetson AGX Orin.

https://github.com/NVIDIA-AI-IOT/jetson_benchmarks: We looked at this, and it seems to focus on deep learning workloads. We are interested in measuring the peak performance/bandwidth. Please recommend any standard benchmarks.

Hi,
Here are some suggestions for the common issues:

1. Performance

Please run the commands below before benchmarking a deep learning use case:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. Installation

Installation guides for deep learning frameworks on Jetson:

3. Tutorial

Getting-started deep learning tutorial:

4. Report issue

If these suggestions don’t help and you want to report an issue to us, please share the model, the commands/steps, and any customized app with us so we can reproduce it locally.

Thanks!

Hi,
You may try this stress test:

Jetson/L4T/TRT Customized Example - eLinux.org
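For the peak FLOP/s side of the question, one rough way to estimate sustained FP32 throughput is a long chain of fused multiply-adds. Below is a minimal sketch (kernel name, grid sizes, and iteration count are arbitrary choices, not a standard benchmark; cuBLAS SGEMM at large sizes is a more representative measure):

```cuda
// Rough FP32 FLOP/s sketch: saturate the FMA pipes with many warps.
// Results depend on clocks (run nvpmodel/jetson_clocks first) and occupancy.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void fma_kernel(float *out, int iters) {
    float a = 1.0f + threadIdx.x * 1e-7f;
    float b = 1.000001f;
    float c = 0.999999f;
    for (int i = 0; i < iters; ++i) {
        a = fmaf(a, b, c);   // 1 FMA = 2 FLOPs
        c = fmaf(c, b, a);
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = a + c;  // keep result live
}

int main() {
    const int blocks = 2048, threads = 256, iters = 100000;
    float *out;
    cudaMalloc(&out, (size_t)blocks * threads * sizeof(float));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);

    fma_kernel<<<blocks, threads>>>(out, iters);  // warm-up
    cudaEventRecord(t0);
    fma_kernel<<<blocks, threads>>>(out, iters);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    // 2 FMAs per loop iteration, 2 FLOPs per FMA.
    double flops = 2.0 * 2.0 * (double)iters * blocks * threads;
    printf("%.1f GFLOP/s\n", flops / (ms * 1e-3) / 1e9);

    cudaFree(out);
    return 0;
}
```

Compile with nvcc on the device; treat the number as a lower bound on peak, since a dependency-chained microkernel rarely reaches the datasheet figure.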

Is there a similar recommended benchmark to measure the memory bandwidth that the GPU can effectively achieve?

Hi,

Please find our CUDA bandwidthTest sample below:

Thanks.
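Besides the host-to-device numbers, a device-to-device copy is closer to the bandwidth the GPU itself gets from DRAM (bandwidthTest also reports a "Device to Device Bandwidth" section). A minimal sketch of the same measurement, with arbitrary buffer size and repetition count:

```cuda
// Sketch: effective GPU DRAM bandwidth via a large device-to-device copy.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256ull << 20;  // 256 MiB per buffer (illustrative)
    float *src, *dst;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);
    cudaMemset(src, 1, bytes);

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0); cudaEventCreate(&t1);

    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);  // warm-up
    const int reps = 20;
    cudaEventRecord(t0);
    for (int i = 0; i < reps; ++i)
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    // Each copy both reads and writes `bytes`, so 2 * bytes move per rep.
    double gbps = 2.0 * bytes * reps / (ms * 1e-3) / 1e9;
    printf("device-to-device: %.1f GB/s\n", gbps);

    cudaFree(src); cudaFree(dst);
    return 0;
}
```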


I was able to run this and got the following result:

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 36.6

This is the bandwidth obtained when copying data from the CPU to the GPU, even though in this case both use the same physical DRAM. Is my understanding correct?

Also, the theoretical DRAM bandwidth is around 204 GB/s, while this test shows 36.6 GB/s. Is this expected? What kinds of overhead are involved here?

Hi,

Yes, on Jetson both CPU and GPU use the same physical memory.
To understand more about Jetson’s memory, please find the document below:

Thanks.
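Since CPU and GPU share the same physical DRAM on Jetson, one consequence worth knowing is that a copy is often unnecessary at all: pinned host memory can be mapped into the GPU address space ("zero-copy"). A minimal sketch, with illustrative names and sizes:

```cuda
// Sketch: zero-copy on Jetson's unified DRAM -- the GPU works directly
// on pinned host memory, so no cudaMemcpy (and no H2D bandwidth cost).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *p, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) p[i] *= 2.0f;
}

int main() {
    cudaSetDeviceFlags(cudaDeviceMapHost);  // allow mapped pinned memory

    const size_t n = 1 << 20;
    float *h = nullptr, *d = nullptr;
    cudaHostAlloc(&h, n * sizeof(float), cudaHostAllocMapped);
    for (size_t i = 0; i < n; ++i) h[i] = 1.0f;
    cudaHostGetDevicePointer(&d, h, 0);  // GPU-visible alias, no copy

    scale<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();
    printf("h[0] = %.1f\n", h[0]);  // CPU reads the GPU's write directly

    cudaFreeHost(h);
    return 0;
}
```

Whether zero-copy or an explicit copy is faster depends on the access pattern; the CUDA for Tegra documentation linked above discusses the trade-offs.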

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.