Recently, I saw NVIDIA's announcement that JetPack 6.2 can boost the performance of the Orin NX and Orin Nano:
NVIDIA JetPack 6.2 Brings Super Mode to NVIDIA Jetson Orin Nano and Jetson Orin NX Modules | NVIDIA Technical Blog
It mentions that many LLMs can run on the Nano (see Table 4, "Benchmark performance in tokens/sec for popular LLMs on Jetson Orin Nano 8GB", in that post).
However, when I tried to run Llama-2-7b inference with the official TensorRT-LLM package, I hit an OOM issue. Here are the steps to reproduce:
- Install JetPack 6.2 and set the power mode to the maximum, verifying with jtop (the commands I used are sketched after the script below).
- Install the tensorrt_llm wheel from the Jetson AI Lab: TensorRT-LLM - NVIDIA Jetson AI Lab
- Run the example shell script below, which I extracted from the tensorrt_llm container (dustynv/tensorrt_llm:0.12-r36.4.0):
#!/usr/bin/env bash
set -ex
MODEL="/mnt/Llama-2-7b-chat-hf"
QUANT="/mnt/Llama-2-7B-Chat-GPTQ/model.safetensors"
LLAMA_EXAMPLES="/opt/TensorRT-LLM/examples/llama"
TRT_LLM_MODELS="/mnt/models/tensorrt_llm"
: "${FORCE_BUILD:=off}"
llama_fp16()
{
    output_dir="$TRT_LLM_MODELS/$(basename $MODEL)-fp16"

    if [ ! -f $output_dir/*.safetensors ]; then
        python3 $LLAMA_EXAMPLES/convert_checkpoint.py \
            --model_dir $(huggingface-downloader $MODEL) \
            --output_dir $output_dir \
            --dtype float16
    fi

    trtllm-build \
        --checkpoint_dir $output_dir \
        --output_dir $output_dir/engines \
        --gemm_plugin float16
}
llama_gptq()
{
    output_dir="$TRT_LLM_MODELS/Llama-2-7b-chat-hf-gptq"
    engine_dir="$output_dir/engines"

    # if [ ! -f $output_dir/*.safetensors ] || [ $FORCE_BUILD = "on" ]; then
    #     python3 $LLAMA_EXAMPLES/convert_checkpoint.py \
    #         --model_dir $(huggingface-downloader $MODEL) \
    #         --output_dir $output_dir \
    #         --dtype float16 \
    #         --quant_ckpt_path $(huggingface-downloader $QUANT) \
    #         --use_weight_only \
    #         --weight_only_precision int4_gptq \
    #         --group_size 128 \
    #         --per_group
    # fi

    if [ ! -f $engine_dir/*.engine ] || [ $FORCE_BUILD = "on" ]; then
        trtllm-build \
            --checkpoint_dir $output_dir \
            --output_dir $engine_dir \
            --gemm_plugin auto \
            --log_level verbose \
            --max_batch_size 1 \
            --max_num_tokens 512 \
            --max_seq_len 512 \
            --max_input_len 128
    fi

    python3 $LLAMA_EXAMPLES/../run.py \
        --max_input_len=128 \
        --max_output_len=128 \
        --max_attention_window_size 256 \
        --max_tokens_in_paged_kv_cache=256 \
        --tokenizer_dir $MODEL \
        --engine_dir $engine_dir

    python3 /opt/TensorRT-LLM/benchmarks/python/benchmark.py \
        -m dec \
        --engine_dir $engine_dir \
        --quantization int4_weight_only_gptq \
        --batch_size 1 \
        --input_output_len "16,128;32,128;64,128;128,128" \
        --log_level verbose \
        --enable_cuda_graph \
        --warm_up 2 \
        --num_runs 3 \
        --duration 10
}
#llama_fp16
llama_gptq
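For step 1 above, these are roughly the commands I used to switch to the maximum power mode and lock the clocks before running anything. The nvpmodel mode index is just what jtop reported as the highest mode on my unit, so please treat the index as an assumption:

# Select the highest power mode (mode index assumed from jtop; it may differ per board/image)
sudo nvpmodel -m 2
# Lock clocks to their maximum for the selected mode
sudo jetson_clocks
# Confirm the active power mode
sudo nvpmodel -q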
I ran llama_gptq() by calling FORCE_BUILD=on bash llama.sh, but I did not run the convert_checkpoint.py part (commented out in the script) since I had already converted the checkpoint on an RTX 6000 Ada server.
- After I ran this script, the Nano hit OOM and restarted by itself. Here is the jtop capture at the moment of the OOM:
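If a text log would be more useful than the jtop screenshot, I can also record memory usage during the run with tegrastats, roughly like this (a sketch; tegrastats ships with JetPack):

# Log RAM/swap/GPU usage once per second to a file while the script runs
sudo tegrastats --interval 1000 --logfile /tmp/tegrastats_oom.log &
FORCE_BUILD=on bash llama.sh
sudo tegrastats --stop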
And here are my questions:
- Were the official benchmark numbers in that blog measured with TensorRT-LLM?
- Is there any difference between my environment and package and the ones used for the official benchmarks? (I can dump my version info with the commands below if that helps.)
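A sketch of how I would collect the version details on my Nano, in case that is needed:

# TensorRT-LLM wheel version
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
# Related packages
pip3 list | grep -i -E "tensorrt|torch"
# L4T / JetPack release info
cat /etc/nv_tegra_release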
Thanks.