TensorRT-LLM on Jetson Orin NX (16GB)

Has anyone tried using TensorRT-LLM on Jetson Orin NX (16GB)? I keep running into a “core dump” error when using trtllm-build, even with a small model (0.5B). The official tests were conducted on AGX Orin.

Hi,
Here are some suggestions for the common issues:

1. Performance

Please run the commands below before benchmarking a deep learning use case:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
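
To verify the board is actually in the maximum-performance mode and to watch memory while a workload runs, the commands below may help (output formats differ between Jetson modules, so treat them as a quick sanity check rather than a benchmark):

$ sudo nvpmodel -q      # print the currently active power mode
$ sudo tegrastats       # live RAM/CPU/GPU utilization; stop with Ctrl+C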

2. Installation

Installation guide for deep learning frameworks on Jetson:

3. Tutorial

Startup deep learning tutorial:

4. Report issue

If these suggestions don’t help and you want to report an issue to us, please share the model, the commands/steps, and any customized app with us so we can reproduce the issue locally.

Thanks!

Hi,

Could you share how you set up the environment?
Are you following the instructions shared in the topic below:

Thanks.

Thanks a lot for your reply. Previously, I set up the TensorRT-LLM environment based on the link TensorRT-LLM Deployment on Jetson Orin.

Now I am using the pre-configured container, and I have successfully run inference on a small model (1.5B) using the TensorRT-LLM framework.
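
For reference, the working 1.5B flow roughly followed the standard TensorRT-LLM example pipeline; the paths and flags below are illustrative rather than my exact invocation:

python3 convert_checkpoint.py --model_dir Qwen2.5-1.5B-Instruct --output_dir qwen2.5_1.5b_ckpt --dtype float16
trtllm-build --checkpoint_dir qwen2.5_1.5b_ckpt --output_dir qwen2.5_1.5b_engine --gemm_plugin float16
python3 ../run.py --engine_dir qwen2.5_1.5b_engine --tokenizer_dir Qwen2.5-1.5B-Instruct --max_output_len 64 --input_text "Hello"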

However, I still encounter issues when attempting to convert a 7B model. The model files were downloaded from qwen2.5-7b. My conversion command is:

python3 convert_checkpoint.py --model_dir /home/models/qwen/qwen-2.5_7b --output_dir /home/models/engines/qwen2.5_7b_ckpt --dtype=float16 --use_weight_only --weight_only_precision int8

The error message received is as follows:

/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
[TensorRT-LLM] TensorRT-LLM version: 0.12.0
0.12.0
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████| 4/4 [00:11<00:00,  2.86s/it]
[01/05/2025-14:08:12] Some parameters are on the meta device because they were offloaded to the cpu.
Traceback (most recent call last):
  File "/home/learn/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 308, in <module>
    main()
  File "/home/learn/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 300, in main
    convert_and_save_hf(args)
  File "/home/learn/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 256, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/learn/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 263, in execute
    f(args, rank)
  File "/home/learn/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 246, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 315, in from_hugging_face
    weights = load_weights_from_hf_model(hf_model, config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 1221, in load_weights_from_hf_model
    weights = convert_hf_qwen(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 823, in convert_hf_qwen
    get_tllm_linear_weight(qkv_w, tllm_prex + 'attention.qkv.',
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 502, in get_tllm_linear_weight
    v.cpu(), plugin_weight_only_quant_type)
NotImplementedError: Cannot copy out of meta tensor; no data!

Hi,

The 7B model cannot work on a 16GB device.
Please try a model with weight <= 4B.
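
As a rough back-of-the-envelope check (an estimate only): FP16 stores 2 bytes per parameter, so the 7B weights alone need roughly 13 GiB before activations, the KV cache, and the OS share of the 16GB unified memory, while a <= 4B model stays around 7.5 GiB:

$ python3 -c "print(7e9 * 2 / 2**30)"   # ~13.0 GiB of FP16 weights for 7B parameters
$ python3 -c "print(4e9 * 2 / 2**30)"   # ~7.5 GiB of FP16 weights for 4B parameters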

Thanks.

Thanks a lot.

Does “weight <= 4B” refer to the number of parameters in the model? The 1.5B model actually works on Jetson Orin NX (16GB). However, when I try a 3B model, the conversion process still fails. The model was downloaded from Qwen2.5-3B-Instruct · Model Library.

The conversion command is:

python3 convert_checkpoint.py --model_dir Qwen2.5-3B-Instruct --output_dir qwen2.5_3b_ckpt --dtype float16

The error message is:

/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
[TensorRT-LLM] TensorRT-LLM version: 0.12.0
0.12.0
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.86s/it]
Weights loaded. Total time: 00:00:05
[1]    1132242 killed     python3 convert_checkpoint.py --model_dir  --output_dir  --dtype float16
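
The “killed” line looks like the kernel OOM killer stopping the conversion, which holds the full FP16 weights in unified memory. To confirm that, and as a possible workaround, temporary swap can be added before retrying (the 8G size below is only an example):

sudo dmesg | grep -i -E "killed process|out of memory"
sudo fallocate -l 8G /swapfile && sudo chmod 600 /swapfile
sudo mkswap /swapfile && sudo swapon /swapfile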

When converting the model Llama-3.2-1B-Instruct · Model Library, I encounter the following error:

/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
[TensorRT-LLM] TensorRT-LLM version: 0.12.0
0.12.0
117it [00:00, 177.11it/s]
Traceback (most recent call last):
  File "/home/learn/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 487, in <module>
    main()
  File "/home/learn/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 479, in main
    convert_and_save_hf(args)
  File "/home/learn/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 421, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/learn/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 428, in execute
    f(args, rank)
  File "/home/learn/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 410, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 363, in from_hugging_face
    loader.generate_tllm_weights(model)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 329, in generate_tllm_weights
    tllm_weights.update(self.load(tllm_key))
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 271, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/linear.py", line 380, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
Exception ignored in: <function PretrainedModel.__del__ at 0xfffef0f240d0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 449, in __del__
    self.release()
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 446, in release
    release_gc()
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 469, in release_gc
    torch.cuda.ipc_collect()
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 966, in ipc_collect
    _lazy_init()
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 338, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable

CUDA call was originally invoked at:

  File "/home/learn/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 8, in <module>
    from transformers import AutoConfig
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/__init__.py", line 26, in <module>
    from . import dependency_versions_check
  File "<frozen importlib._bootstrap>", line 1078, in _handle_fromlist
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/dependency_versions_check.py", line 16, in <module>
    from .utils.versions import require_version, require_version_core
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/__init__.py", line 27, in <module>
    from .chat_template_utils import DocstringParsingException, TypeHintParsingException, get_json_schema
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/chat_template_utils.py", line 39, in <module>
    from torch import Tensor
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1955, in <module>
    _C._initExtension(_manager_path())
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 1539, in <module>
    _lazy_call(_register_triton_kernels)
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 261, in _lazy_call
    _queued_calls.append((callable, traceback.format_stack()))

My runtime environment is the pre-configured container, and my conversion command is:

python3 convert_checkpoint.py --model_dir Llama-3.2-1B-Instruct --output_dir llama3.2_1b_ckpt --dtype float16
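
One detail that may or may not be related: Llama-3.2-1B-Instruct ships with tied input/output embeddings (tie_word_embeddings is true in its config), so there is no separate lm_head weight in the checkpoint; I suspect, but have not confirmed, that this is what leaves the weight as None in the loader. A quick way to inspect the config:

grep tie_word_embeddings Llama-3.2-1B-Instruct/config.json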