Hi,
I am currently testing the performance of the Jetson Orin Nano Super 8GB, and I have encountered a discrepancy:
According to your benchmark page (Benchmarks - NVIDIA Jetson AI Lab), the performance of Qwen2.5 7B is listed as 21.75 tokens/second.
However, when I tested using the dusty-nv/Qwen2.5-7B-Instruct-q4f16_ft-MLC model with the following command:
python3 benchmark.py --model /root/.cache/mlc_llm/dusty-nv/Qwen2.5-7B-Instruct-q4f16_ft-MLC --max-num-prompts 4 --prompt ~/.cache/mlc_llm/jetson-containers-master/data/prompts/completion_1024.json --prefill-chunk-size 1024 --save Qwen2.5-7B-Instruct-q4f16_ft-MLC.csv
I obtained a result of 19.23 tokens/second.
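For context, 19.23 tokens/second is roughly 11.6% below the published figure. A quick check of that gap, using only the two numbers quoted above:

```shell
# Relative slowdown of the measured rate vs. the published benchmark rate
awk 'BEGIN { printf "%.1f%%\n", (21.75 - 19.23) / 21.75 * 100 }'
# prints 11.6%
```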
I have attached the image file with the detailed results.
Could you please advise how I might be able to achieve the same performance as shown in your benchmark?
Thank you very much for your assistance!
Best regards,
richer.chan
Hi,
The benchmark is tested under the super mode.
You can set up the Orin Nano with the below commands to enable super mode:
$ sudo nvpmodel -m 2
$ sudo jetson_clocks
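After running these, it may be worth confirming that the settings took effect. A minimal check, assuming the standard JetPack tools are on the PATH:

```shell
# Query the currently active power model; it should report mode 2
sudo nvpmodel -q

# Show the current clock configuration applied by jetson_clocks
sudo jetson_clocks --show
```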
Thanks.
Hi AastaLLL,
Thanks for your reply.
I have tried enabling jetson_clocks, and my board has always been set to nvpmodel 2.
I then tested again, but the result is nearly the same as what I obtained before.
Attached the log:
bench_log.txt (47.9 KB)
Please let me know if there is anything you need from my side.
Thanks.
Hi,
Sorry for the late update.
Your hardware setting looks correct to me (Orin Nano super mode).
To minimize the difference, could you try the script below instead?
#!/usr/bin/env bash
#
# Llama benchmark with MLC. This script should be invoked from the host and will run
# the MLC container with the commands to download, quantize, and benchmark the models.
# It will add its collected performance data to jetson-containers/data/benchmarks/mlc.csv
#
# Set the HUGGINGFACE_TOKEN environment variable to your HuggingFace account token
# that has been granted access to the Meta-Llama models. You can run it like this:
#
# HUGGINGFACE_TOKEN=hf_abc123 ./benchmark.sh meta-llama/Llama-2-7b-hf
#
# If a model is not specified, then the default set of models will be benchmarked.
# See the environment variables below and their defaults for model settings to change.
#
# These are the possible quantization methods that can be set like QUANTIZATION=q4f16_ft
#
# (MLC 0.1.0) q4f16_0,q4f16_1,q4f16_2,q4f16_ft,q4f16_ft_group,q4f32_0,q4f32_1,q8f16_ft,q8f16_ft_group,q8f16_1
# (MLC 0.1.1) q4f16_0,q4f16_1,q4f32_1,q4f16_2,q4f16_autoawq,q4f16_ft,e5m2_e5m2_f16
#
set -ex
This file has been truncated.
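Based only on the script's own header comments, an invocation pinning the quantization discussed in this thread might look like the following (the token and model name are the placeholders from the header, not working values):

```shell
# Hypothetical invocation using the env vars documented in the script header
QUANTIZATION=q4f16_ft HUGGINGFACE_TOKEN=hf_abc123 ./benchmark.sh meta-llama/Llama-2-7b-hf
```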
Thanks.
system
Closed June 4, 2025, 1:23am
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.