It's good news that Jetson can run TensorRT-LLM, as described in the following link: TensorRT-LLM - NVIDIA Jetson AI Lab.
However, I ran into problems running TensorRT-LLM on AGX Orin.
I have tried both the tensorrt-llm container and the wheel installation.
tensorrt-llm does not work in the container because the installed TensorRT is incomplete.
I then tried the tensorrt_llm 0.12.0 wheel for Jetson, but it reports "KeyError: 'model.layers.0.self_attn.q_proj.qweight'" when converting the Qwen2.5 model. Please see the details below.
# run convert command
python3 /home/chat/TensorRT-LLM/examples/qwen/convert_checkpoint.py \
    --model_dir /home/chat/Downloads/qwen-7b \
    --output_dir /home/chat/Downloads/tllm_checkpoint_1gpu_gptq \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int4_gptq \
    --per_group
# message displayed when running the command
[TensorRT-LLM] TensorRT-LLM version: 0.12.0
0.12.0
Loading checkpoint shards: 100%|██████████| 4/4 [00:07<00:00, 1.94s/it]
loading weight in each layer...: 0%| | 0/28 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/chat/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 308, in <module>
main()
File "/home/chat/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 300, in main
convert_and_save_hf(args)
File "/home/chat/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 256, in convert_and_save_hf
execute(args.workers, [convert_and_save_rank] * world_size, args)
File "/home/chat/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 263, in execute
f(args, rank)
File "/home/chat/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 246, in convert_and_save_rank
qwen = QWenForCausalLM.from_hugging_face(
File "/home/chat/.venv/lib/python3.10/site-packages/tensorrt_llm/models/qwen/model.py", line 313, in from_hugging_face
weights = load_weights_from_hf_gptq_model(hf_model, config)
File "/home/chat/.venv/lib/python3.10/site-packages/tensorrt_llm/models/qwen/convert.py", line 1365, in load_weights_from_hf_gptq_model
comp_part = model_params[prefix + key_list[0] + comp + suf]
KeyError: 'model.layers.0.self_attn.q_proj.qweight'
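For what it's worth, this KeyError may mean the checkpoint in --model_dir is a plain FP16 model rather than a GPTQ-quantized one: with --weight_only_precision int4_gptq, the loader looks for GPTQ tensors named *.qweight, which only exist in an already-quantized checkpoint. A minimal diagnostic sketch (my own code, not part of TensorRT-LLM; has_gptq_tensors is a hypothetical helper name) to check the tensor names:

```python
# Diagnostic sketch (assumption: the KeyError comes from missing GPTQ tensors).
# A GPTQ-quantized checkpoint stores packed weights under names ending in
# ".qweight"; a plain FP16 checkpoint stores them under ".weight" instead.

def has_gptq_tensors(tensor_names):
    """Return True if any tensor name ends with '.qweight'."""
    return any(name.endswith(".qweight") for name in tensor_names)

# Tensor names as they would appear in a GPTQ-quantized checkpoint:
gptq_names = [
    "model.layers.0.self_attn.q_proj.qweight",
    "model.layers.0.self_attn.q_proj.qzeros",
]
# Tensor names as they would appear in a plain FP16 checkpoint:
fp16_names = ["model.layers.0.self_attn.q_proj.weight"]

print(has_gptq_tensors(gptq_names))  # True
print(has_gptq_tensors(fp16_names))  # False

# To check a real checkpoint's safetensors shard, the names can be read with:
#   from safetensors import safe_open
#   with safe_open("model-00001-of-00004.safetensors", framework="pt") as f:
#       print(has_gptq_tensors(f.keys()))
```

If the names end in .weight rather than .qweight, the model would need to be GPTQ-quantized first (or converted without the int4_gptq option).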