TensorRT-LLM on Jetson Orin NX (16GB)

Has anyone tried running TensorRT-LLM on a Jetson Orin NX (16GB)? I keep getting a core dump when running trtllm-build, even with a small (0.5B) model. The official tests were conducted on AGX Orin.

Here are some suggestions for the common issues:

1. Performance

Please run the commands below before benchmarking a deep learning use case:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

2. Installation

Installation guide for deep learning frameworks on Jetson:

3. Tutorial

Getting-started deep learning tutorial:

4. Report issue

If these suggestions don’t help and you want to report an issue to us, please share the model, the commands/steps, and your customized app (if any) so we can reproduce the issue locally.



Could you share how you set up the environment?
Are you following the instructions in the topic below:


Thanks a lot for your reply. Previously, I set up the TensorRT-LLM environment by following the guide TensorRT-LLM Deployment on Jetson Orin.

Now I am using the pre-configured container, and I have successfully run inference on a small (1.5B) model with TensorRT-LLM.

However, I still run into problems when attempting to convert a 7B model. The model files were downloaded from qwen2.5-7b. My conversion command is:

python3 convert_checkpoint.py --model_dir /home/models/qwen/qwen-2.5_7b --output_dir /home/models/engines/qwen2.5_7b_ckpt --dtype=float16 --use_weight_only --weight_only_precision int8

The error message received is as follows:

/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
[TensorRT-LLM] TensorRT-LLM version: 0.12.0
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████| 4/4 [00:11<00:00,  2.86s/it]
[01/05/2025-14:08:12] Some parameters are on the meta device because they were offloaded to the cpu.
Traceback (most recent call last):
  File "/home/learn/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 308, in <module>
  File "/home/learn/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 300, in main
  File "/home/learn/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 256, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/learn/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 263, in execute
    f(args, rank)
  File "/home/learn/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 246, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/model.py", line 315, in from_hugging_face
    weights = load_weights_from_hf_model(hf_model, config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 1221, in load_weights_from_hf_model
    weights = convert_hf_qwen(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 823, in convert_hf_qwen
    get_tllm_linear_weight(qkv_w, tllm_prex + 'attention.qkv.',
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/qwen/convert.py", line 502, in get_tllm_linear_weight
    v.cpu(), plugin_weight_only_quant_type)
NotImplementedError: Cannot copy out of meta tensor; no data!


A 7B model cannot run on a 16GB device.
Please try a model with weight <= 4B.
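One way to see why the limit sits around 4B is a back-of-the-envelope estimate of weight memory: parameters times bytes per parameter. On Jetson the 16GB is unified memory shared by the CPU and GPU, so the fp16 weights of a 7B model (roughly 13 GiB) nearly fill it before anything else is loaded; the log line about parameters being "offloaded to the cpu" and left on the meta device appears consistent with the loader running out of room. The sketch below assumes nominal parameter counts and counts weights only; activations, KV cache, the CUDA context, the OS, and any temporary copies made during conversion all come on top.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: nominal parameter counts; ignores activations, KV cache,
# CUDA context, the OS, and temporary copies made during conversion.
BYTES_PER_PARAM = {"float16": 2, "bfloat16": 2, "int8": 1, "int4": 0.5}


def weight_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the model weights in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for name, params in [("0.5B", 0.5e9), ("1.5B", 1.5e9), ("3B", 3e9), ("7B", 7e9)]:
        print(
            f"{name}: fp16 ~{weight_gib(params, 'float16'):.1f} GiB, "
            f"int8 ~{weight_gib(params, 'int8'):.1f} GiB"
        )
```

Note that int8 weight-only quantization halves the final weight size, but the conversion script may still load the fp16 checkpoint first, so the peak during conversion can be higher than the quantized result.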


Thanks a lot.

Does “weight <= 4B” refer to the number of parameters in the model? The 1.5B model does work on the Jetson Orin NX (16GB). However, when I try a 3B model, the conversion process still fails. The model was downloaded from 通义千问2.5-3B-Instruct (Tongyi Qianwen 2.5-3B-Instruct) · Model Library.

The conversion command is:

python3 convert_checkpoint.py --model_dir Qwen2.5-3B-Instruct --output_dir qwen2.5_3b_ckpt --dtype float16

The error message is:

/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
[TensorRT-LLM] TensorRT-LLM version: 0.12.0
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.86s/it]
Weights loaded. Total time: 00:00:05
[1]    1132242 killed     python3 convert_checkpoint.py --model_dir  --output_dir  --dtype float16
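A bare "killed" with no Python traceback means the process received SIGKILL; on Linux the most common source of that is the kernel's out-of-memory killer, and the kernel log (dmesg) would normally show an "Out of memory" entry if so. A minimal, Linux-only sketch for checking headroom before starting a conversion reads MemAvailable from /proc/meminfo and compares it with an estimated requirement. The 3B fp16 figure used below is a hypothetical lower bound for illustration, not a measured requirement of convert_checkpoint.py.

```python
import os


def mem_available_bytes(meminfo_text: str) -> int:
    """Parse the MemAvailable line (reported in kB, i.e. KiB) from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found")


def enough_memory(required_bytes: int, path: str = "/proc/meminfo") -> bool:
    """Return True if the kernel's MemAvailable estimate covers required_bytes."""
    with open(path) as f:
        return mem_available_bytes(f.read()) >= required_bytes


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # Hypothetical requirement: fp16 weights of a 3B model (3e9 params x 2 bytes).
    # The conversion itself may need noticeably more than this.
    need = int(3e9 * 2)
    print("enough memory for fp16 3B weights:", enough_memory(need))
```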

When converting the model Llama-3.2-1B-Instruct (from the Model Library), I get the following error:

/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
[TensorRT-LLM] TensorRT-LLM version: 0.12.0
117it [00:00, 177.11it/s]
Traceback (most recent call last):
  File "/home/learn/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 487, in <module>
  File "/home/learn/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 479, in main
  File "/home/learn/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 421, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/learn/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 428, in execute
    f(args, rank)
  File "/home/learn/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 410, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 363, in from_hugging_face
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 329, in generate_tllm_weights
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/model_weights_loader.py", line 271, in load
    v = sub_module.postprocess(tllm_key, v, **postprocess_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/layers/linear.py", line 380, in postprocess
    weights = weights.to(str_dtype_to_torch(self.dtype))
AttributeError: 'NoneType' object has no attribute 'to'
Exception ignored in: <function PretrainedModel.__del__ at 0xfffef0f240d0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 449, in __del__
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 446, in release
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/_utils.py", line 469, in release_gc
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 966, in ipc_collect
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 338, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: 'NoneType' object is not iterable

CUDA call was originally invoked at:

  File "/home/learn/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 8, in <module>
    from transformers import AutoConfig
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/__init__.py", line 26, in <module>
    from . import dependency_versions_check
  File "<frozen importlib._bootstrap>", line 1078, in _handle_fromlist
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/dependency_versions_check.py", line 16, in <module>
    from .utils.versions import require_version, require_version_core
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/__init__.py", line 27, in <module>
    from .chat_template_utils import DocstringParsingException, TypeHintParsingException, get_json_schema
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/chat_template_utils.py", line 39, in <module>
    from torch import Tensor
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1955, in <module>
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 1539, in <module>
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 261, in _lazy_call
    _queued_calls.append((callable, traceback.format_stack()))

My runtime environment is the pre-configured container, and my conversion command is:

python3 convert_checkpoint.py --model_dir Llama-3.2-1B-Instruct --output_dir llama3.2_1b_ckpt --dtype float16
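In the Llama traceback, the first error is the one that matters: the later DeferredCudaCallError is raised inside PretrainedModel.__del__ during cleanup ("Exception ignored in"). The AttributeError says linear.py's postprocess received None instead of a weight tensor. A plausible cause, which is an assumption not confirmed in this thread, is that Llama-3.2-1B-Instruct ties its output head to the input embeddings (tie_word_embeddings in config.json), so the checkpoint may not store a separate lm_head.weight that the loader expects. The stdlib-only sketch below checks this by reading the safetensors headers (8-byte little-endian length, then a JSON header) without loading any weights; check_lm_head and safetensors_keys are hypothetical helper names.

```python
import json
import struct
from pathlib import Path


def safetensors_keys(path) -> set:
    """Read tensor names from a .safetensors header without loading weights.

    Format: an 8-byte little-endian header length, then a JSON header mapping
    tensor names to dtype/shape/offsets, plus an optional "__metadata__" entry.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)
    return set(header)


def check_lm_head(model_dir) -> dict:
    """Report whether config.json ties embeddings and whether lm_head.weight is stored."""
    model_dir = Path(model_dir)
    config = json.loads((model_dir / "config.json").read_text())
    keys = set()
    for shard in sorted(model_dir.glob("*.safetensors")):
        keys |= safetensors_keys(shard)
    return {
        "tie_word_embeddings": config.get("tie_word_embeddings", False),
        "has_lm_head_weight": "lm_head.weight" in keys,
    }


if __name__ == "__main__":
    import sys

    # Usage: python3 check_lm_head.py Llama-3.2-1B-Instruct
    if len(sys.argv) > 1:
        print(check_lm_head(sys.argv[1]))
```

If the result shows tied embeddings with no lm_head.weight, that would point toward a model-support gap in the installed TensorRT-LLM version rather than a memory limit.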