Get error message when converting Qwen to int4-GPTQ in TensorRT-LLM on AGX Orin

It's good news that Jetson can use TensorRT-LLM, as described at the following link: TensorRT-LLM 🆕 - NVIDIA Jetson AI Lab.

However, I ran into a problem when running TensorRT-LLM on AGX Orin.

I have tried both the TensorRT-LLM container and the wheel installation.
TensorRT-LLM cannot work in the container because the TensorRT installed inside it is incomplete.
With the tensorrt_llm 0.12.0 wheel for Jetson, converting the Qwen2.5 model fails with "KeyError: 'model.layers.0.self_attn.q_proj.qweight'". Please see the details below.

# run convert command
python3 /home/chat/TensorRT-LLM/examples/qwen/convert_checkpoint.py \
  --model_dir /home/chat/Downloads/qwen-7b \
  --output_dir /home/chat/Downloads/tllm_checkpoint_1gpu_gptq \
  --dtype float16 \
  --use_weight_only \
  --weight_only_precision int4_gptq \
  --per_group

# message displayed when running the command
[TensorRT-LLM] TensorRT-LLM version: 0.12.0
0.12.0
Loading checkpoint shards: 100%|██████████| 4/4 [00:07<00:00, 1.94s/it]
loading weight in each layer...: 0%| | 0/28 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/chat/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 308, in <module>
    main()
  File "/home/chat/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 300, in main
    convert_and_save_hf(args)
  File "/home/chat/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 256, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/home/chat/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 263, in execute
    f(args, rank)
  File "/home/chat/TensorRT-LLM/examples/qwen/convert_checkpoint.py", line 246, in convert_and_save_rank
    qwen = QWenForCausalLM.from_hugging_face(
  File "/home/chat/.venv/lib/python3.10/site-packages/tensorrt_llm/models/qwen/model.py", line 313, in from_hugging_face
    weights = load_weights_from_hf_gptq_model(hf_model, config)
  File "/home/chat/.venv/lib/python3.10/site-packages/tensorrt_llm/models/qwen/convert.py", line 1365, in load_weights_from_hf_gptq_model
    comp_part = model_params[prefix + key_list[0] + comp + suf]
KeyError: 'model.layers.0.self_attn.q_proj.qweight'
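
For reference, the traceback shows load_weights_from_hf_gptq_model looking up a model.layers.0.self_attn.q_proj.qweight tensor, so the int4_gptq conversion path expects a checkpoint that already contains GPTQ tensors (*.qweight, *.qzeros, *.scales). Below is a minimal diagnostic sketch, assuming the checkpoint is stored as .safetensors shards and the safetensors package is installed, to check whether those tensors are actually present under --model_dir:

# Minimal sketch (not from the original post): list tensor names in the
# checkpoint and look for GPTQ tensors (qweight/qzeros/scales).
import glob
import os
from safetensors import safe_open

model_dir = "/home/chat/Downloads/qwen-7b"  # same directory passed as --model_dir

gptq_tensors = []
for shard in sorted(glob.glob(os.path.join(model_dir, "*.safetensors"))):
    with safe_open(shard, framework="pt") as f:
        gptq_tensors += [name for name in f.keys()
                         if name.endswith((".qweight", ".qzeros", ".scales"))]

if gptq_tensors:
    print(f"Found {len(gptq_tensors)} GPTQ tensors, e.g. {gptq_tensors[0]}")
else:
    print("No qweight/qzeros/scales tensors found; the checkpoint does not look "
          "GPTQ-quantized, which would explain the KeyError above.")

If nothing is found, the directory holds an unquantized model, and a GPTQ-quantized Qwen checkpoint (or quantizing one first, for example with AutoGPTQ as suggested later in this thread) would be needed before using --weight_only_precision int4_gptq.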

We don't have a TensorRT-LLM release available for the DevZone release.
Please see Can Drive Orin support TensorRT-LLM? - #2 by SivaRamaKrishnaNV

This forum is exclusively for developers who are part of the NVIDIA DRIVE® AGX SDK Developer Program. To post in the forum, please use an account associated with your corporate or university email address.
This helps us ensure that the forum remains a platform for verified members of the developer program.

Hi, thanks a lot for your reply.
Please also check the other ticket that mentions TensorRT-LLM on AGX Orin.

This might resolve your problem.

The MaziyarPanahi/Meta-Llama-3-8B-Instruct-GPTQ repo lists its requirements, and AutoGPTQ is the only one of those packages not already provided with TensorRT-LLM.

git clone https://github.com/AutoGPTQ/AutoGPTQ

If you aren't using conda, edit setup.py and change this line to: conda_cuda_include_dir = "/usr/local/cuda/include"

export BUILD_CUDA_EXT=1
export TORCH_CUDA_ARCH_LIST="8.7"
export COMPILE_MARLIN=1
MAX_JOBS=10 python -m pip wheel . --no-build-isolation -w dist --no-clean
pip install dist/auto_gptq-0.8.0.dev0+cu126-cp310-cp310-linux_aarch64.whl --user
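
After installing the wheel, a quick sanity check (a minimal sketch, assuming the module is importable as auto_gptq and exposes __version__, and that PyTorch with CUDA support is installed):

python3 -c "import torch, auto_gptq; print(auto_gptq.__version__, torch.cuda.is_available())"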
