To test the performance of my Jetson Orin Nano Developer Kit after the JetPack 6.1.1 update, I ran the benchmarks listed here: Benchmarks - NVIDIA Jetson AI Lab
However, my results are close to those of the original Jetson Orin Nano rather than the Super, even though I am on L4T 36.4.2 with JetPack 6.1.1 and running in MAXN power mode.
My benchmark results were:
| model | input_tokens | output_tokens | prefill_time (s) | prefill_rate (tok/s) | decode_time (s) | decode_rate (tok/s) | memory |
|---|---:|---:|---:|---:|---:|---:|---:|
| HF://dusty-nv/Qwen2.5-7B-Instruct-q4f16_ft-MLC | 19 | 128 | 0.094 | 203.20 | 8.31 | 15.42 | 1137.4 |
| HF://mlc-ai/gemma-2-2b-it-q4f16_1-MLC | 13 | 107 | 0.101 | 110.56 | 4.28 | 25.14 | 1350.9 |
| HF://mlc-ai/gemma-2-9b-it-q4f16_1-MLC | 19 | 112 | 0.367 | 52.61 | 11.36 | 9.91 | 1765.7 |
| HF://dusty-nv/Phi-3.5-mini-instruct-q4f16_ft-MLC | 17 | 128 | 0.061 | 280.88 | 5.15 | 24.87 | 995.4 |
| HF://dusty-nv/SmolLM2-135M-Instruct-q4f16_ft-MLC | 7 | 128 | 0.015 | 396.98 | 0.92 | 139.08 | 1107.6 |
| HF://dusty-nv/SmolLM2-360M-Instruct-q4f16_ft-MLC | 8 | 108 | 0.013 | 477.60 | 0.97 | 111.86 | 1141.5 |
| HF://dusty-nv/SmolLM2-1.7B-Instruct-q4f16_ft-MLC | 14 | 128 | 0.030 | 439.20 | 2.94 | 43.57 | 1043.4 |
| HF://dusty-nv/Llama-3.1-8B-Instruct-q4f16_ft-MLC | 18 | 128 | 0.090 | 203.61 | 7.85 | 16.41 | 1295.6 |
| HF://dusty-nv/Llama-3.2-3B-Instruct-q4f16_ft-MLC | 18 | 128 | 0.054 | 338.61 | 4.67 | 27.39 | 1196.9 |
The decode rates are mostly lower than those reported on the benchmark page for the Orin Nano Super.
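As a sanity check on the units, the rate columns are simply tokens divided by seconds; for example, for the Qwen2.5-7B row (assuming `bc` is available):

```
# rate = tokens / seconds, checked against the Qwen2.5-7B row above
echo "scale=1; 19 / 0.094" | bc    # prefill: ~202 tokens/s (table: 203.20)
echo "scale=1; 128 / 8.31" | bc    # decode:  ~15.4 tokens/s (table: 15.42)
```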
My system details are listed below.

Docker image used by the benchmark:
dustynv/mlc:0.1.4-r36.4.2
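For reference, the container is started through the usual jetson-containers workflow; a minimal sketch (the benchmark command run inside the container follows the AI Lab page and is not shown here):

```
# Launch the MLC benchmark container (jetson-containers adds --runtime nvidia
# and the standard Jetson mounts automatically):
jetson-containers run dustynv/mlc:0.1.4-r36.4.2

# Equivalent plain-docker invocation with GPU access:
sudo docker run --runtime nvidia -it --rm dustynv/mlc:0.1.4-r36.4.2
```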
jtop info:

Platform:
  Machine: aarch64
  System: Linux
  Distribution: Ubuntu 22.04 Jammy Jellyfish
  Release: 5.15.148-tegra
  Python: 3.10.12

Hardware:
  Model: NVIDIA Jetson Orin Nano Developer Kit
  699-level Part Number: 699-13767-0005-300 R.1
  P-Number: p3767-0005
  Module: NVIDIA Jetson Orin Nano (Developer kit)
  SoC: tegra234
  CUDA Arch BIN: 8.7
  Serial Number: [hidden]
  L4T: 36.4.2
  Jetpack: 6.1 (rev1)

Libraries:
  CUDA: 12.6.68
  cuDNN: 9.3.0.75
  TensorRT: 10.3.0.30
  VPI: 3.2.4
  Vulkan: 1.3.204
  OpenCV: 4.5.4 with CUDA: NO

Hostname: ubuntu

Interfaces:
  wlP1p1s0: 10.20.1.46
  docker0: 172.17.0.1
  br-2067a88f117f: 172.18.0.1
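The versions jtop reports can be cross-checked from the OS directly; a quick sketch (nvidia-jetpack is the apt meta-package name, assuming an apt-based JetPack install):

```
# Cross-check the L4T release reported by jtop (expect R36, REVISION 4.2):
cat /etc/nv_tegra_release

# JetPack meta-package version, for apt-based installs:
apt-cache show nvidia-jetpack | grep Version
```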
nvpmodel -q output:

NV Power Mode: MAXN
2
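Since the Super numbers depend on the board actually sustaining its maximum clocks, the power mode and clocks can also be locked and verified before benchmarking; a minimal sketch using the standard Jetson tools:

```
sudo nvpmodel -q            # confirm MAXN is the active power mode
sudo jetson_clocks          # pin CPU/GPU/EMC clocks to the maximum for this mode
sudo jetson_clocks --show   # print the resulting clock settings to verify
```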