JetPack 6.2 + TensorRT-LLM OOM issue

Recently, I saw in NVIDIA's press release that JetPack 6.2 can boost the performance of the Orin NX and Orin Nano:
NVIDIA JetPack 6.2 Brings Super Mode to NVIDIA Jetson Orin Nano and Jetson Orin NX Modules | NVIDIA Technical Blog. It mentions that many LLMs can run on the Nano (see Table 4, "Benchmark performance in tokens/sec for popular LLMs on Jetson Orin Nano 8GB", in that post).

However, when I tried to run inference on Llama-2-7b with the official TensorRT-LLM package, I hit an OOM issue. Here are the steps to reproduce:

  1. Install JetPack 6.2 and set the power mode to MAX, checking with jtop.
  2. Install the tensorrt_llm wheel from the Jetson AI Lab: TensorRT-LLM - NVIDIA Jetson AI Lab
  3. I extracted the example shell script below from the tensorrt_llm container (dustynv/tensorrt_llm:0.12-r36.4.0):
#!/usr/bin/env bash
set -ex

MODEL="/mnt/Llama-2-7b-chat-hf"
QUANT="/mnt/Llama-2-7B-Chat-GPTQ/model.safetensors"

LLAMA_EXAMPLES="/opt/TensorRT-LLM/examples/llama"
TRT_LLM_MODELS="/mnt/models/tensorrt_llm"

: "${FORCE_BUILD:=off}"


llama_fp16() 
{
	output_dir="$TRT_LLM_MODELS/$(basename $MODEL)-fp16"
	
	# note: [ -f glob ] breaks when the glob matches more than one file, so use compgen
	if ! compgen -G "$output_dir/*.safetensors" > /dev/null; then
		python3 $LLAMA_EXAMPLES/convert_checkpoint.py \
			--model_dir $(huggingface-downloader $MODEL) \
			--output_dir $output_dir \
			--dtype float16
	fi

	trtllm-build \
		--checkpoint_dir $output_dir \
		--output_dir $output_dir/engines \
		--gemm_plugin float16
}

llama_gptq() 
{
	output_dir="$TRT_LLM_MODELS/Llama-2-7b-chat-hf-gptq"
	engine_dir="$output_dir/engines"
	
	# if [ ! -f $output_dir/*.safetensors ] || [ $FORCE_BUILD = "on" ]; then
	# 	python3 $LLAMA_EXAMPLES/convert_checkpoint.py \
	# 		--model_dir $(huggingface-downloader $MODEL) \
	# 		--output_dir $output_dir \
	# 		--dtype float16 \
	# 		--quant_ckpt_path $(huggingface-downloader $QUANT) \
	# 		--use_weight_only \
	# 		--weight_only_precision int4_gptq \
	# 		--group_size 128 \
	# 		--per_group
	# fi
	
	# note: [ -f glob ] breaks when the glob matches more than one file, so use compgen
	if ! compgen -G "$engine_dir/*.engine" > /dev/null || [ "$FORCE_BUILD" = "on" ]; then
		trtllm-build \
			--checkpoint_dir $output_dir \
			--output_dir $engine_dir \
			--gemm_plugin auto \
			--log_level verbose \
			--max_batch_size 1 \
			--max_num_tokens 512 \
			--max_seq_len 512 \
			--max_input_len 128
	fi

    python3 $LLAMA_EXAMPLES/../run.py \
        --max_input_len=128 \
        --max_output_len=128 \
        --max_attention_window_size 256 \
        --max_tokens_in_paged_kv_cache=256 \
        --tokenizer_dir $MODEL \
        --engine_dir $engine_dir

    python3 /opt/TensorRT-LLM/benchmarks/python/benchmark.py \
        -m dec \
        --engine_dir $engine_dir \
        --quantization int4_weight_only_gptq \
        --batch_size 1 \
        --input_output_len "16,128;32,128;64,128;128,128" \
        --log_level verbose \
        --enable_cuda_graph \
        --warm_up 2 \
        --num_runs 3 \
        --duration 10  
}

#llama_fp16
llama_gptq

I ran llama_gptq() by calling FORCE_BUILD=on bash llama.sh, but skipped the convert_checkpoint.py part (commented out in the script above) since I had already converted the checkpoint on an RTX 6000 Ada server.
4. After running the script, the Nano hit OOM and rebooted by itself; here is the jtop capture at the moment of the OOM.
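For reference, a back-of-the-envelope memory estimate (my own sketch, assuming Llama-2-7B's standard shape of 32 layers, 32 KV heads, head dim 128, an fp16 KV cache, and the batch/sequence settings from the build command above) suggests the weights and KV cache alone should fit in 8 GB:

```python
# Rough memory estimate for Llama-2-7B with int4 GPTQ weights.
# Assumptions (not from the post above): 7e9 params, ~0.5 byte/param for
# 4-bit weights (ignoring GPTQ scales/zeros), Llama-2-7B shape of
# 32 layers, 32 KV heads, head_dim 128, fp16 KV cache, batch 1, seq 512.
GIB = 2**30

params = 7e9
weights_gib = params * 0.5 / GIB          # 4-bit quantized weights

layers, kv_heads, head_dim = 32, 32, 128  # Llama-2-7B uses MHA (no GQA)
batch, seq_len, fp16_bytes = 1, 512, 2
# K and V per layer: batch * seq * heads * head_dim * bytes, times 2 for K+V
kv_gib = 2 * layers * batch * seq_len * kv_heads * head_dim * fp16_bytes / GIB

print(f"weights ~= {weights_gib:.2f} GiB")  # ~3.26 GiB
print(f"KV cache ~= {kv_gib:.2f} GiB")      # 0.25 GiB
```

On an 8 GB Orin Nano, where CPU and GPU share the same memory, roughly 3.5 GiB for the model leaves limited headroom once the OS, CUDA context, and TensorRT build-time workspace are added on top, which may be why the build/benchmark step tips the board into OOM.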
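Since the board reboots, the jtop view is lost at the moment of the crash. A small helper I used alongside jtop (my own sketch, not part of the original repro) appends MemAvailable from /proc/meminfo to a log file once per second, so the last entry before the reboot shows the headroom right before the OOM:

```python
#!/usr/bin/env python3
# Sketch: log MemAvailable (from /proc/meminfo) once per second to mem.log,
# so the last line written before the reboot shows the headroom at OOM time.
import time

def mem_available_kib():
    """Return MemAvailable in KiB as reported by /proc/meminfo."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found in /proc/meminfo")

if __name__ == "__main__":
    # line-buffered append so entries survive an abrupt reboot
    with open("mem.log", "a", buffering=1) as log:
        while True:
            log.write(f"{time.time():.0f} {mem_available_kib()} KiB\n")
            time.sleep(1)
```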

And here are my questions:

  1. Were the official benchmark numbers measured using TensorRT-LLM?
  2. Is there any difference between my environment and packages and the ones used for the official benchmarks?

Thanks.