TensorRT-LLM Phi-3-mini-128k-instruct error

1017948396 · October 11, 2024, 1:41am

Environment:
hardware: RTX 4090
Driver Version: 550.107.02
software: CUDA release 12.4, V12.4.131
pip environment:
absl-py 2.1.0
accelerate 0.31.0
aenum 3.1.15
aiofiles 23.2.1
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aiohttp-sse-client 0.2.1
aiosignal 1.3.1
altair 5.4.1
annotated-types 0.7.0
anyio 4.4.0
async-timeout 4.0.3
attrs 24.2.0
build 1.2.1
certifi 2024.8.30
charset-normalizer 3.3.2
click 8.1.7
click-option-group 0.5.6
cloudpickle 3.0.0
colored 2.2.4
coloredlogs 15.0.1
contourpy 1.3.0
cuda-python 12.6.0
cycler 0.12.1
datasets 2.14.5
diffusers 0.30.2
dill 0.3.7
distro 1.9.0
einops 0.7.0
evaluate 0.4.1
exceptiongroup 1.2.2
fastapi 0.112.2
ffmpy 0.4.0
filelock 3.15.4
flash-attn 2.5.8
flatbuffers 24.3.25
fonttools 4.53.1
frozenlist 1.4.1
fsspec 2023.6.0
gradio 4.36.0
gradio_client 1.0.1
h11 0.14.0
h5py 3.10.0
httpcore 1.0.5
httpx 0.27.2
huggingface-hub 0.24.6
humanfriendly 10.0
idna 3.8
importlib_metadata 8.4.0
importlib_resources 6.4.4
janus 1.0.0
Jinja2 3.1.4
jiter 0.5.0
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
lark 1.2.2
latex2mathml 3.77.0
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.2
mdtex2html 1.3.0
mdurl 0.1.2
mpi4py 4.0.0
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.15
narwhals 1.6.0
networkx 3.3
ninja 1.11.1.1
nltk 3.9.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-modelopt 0.15.1
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.6.68
nvidia-nvtx-cu12 12.1.105
onnx 1.16.2
onnx-simplifier 0.4.36
onnxruntime-gpu 1.19.2
openai 1.39.0
optimum 1.22.0
orjson 3.10.7
packaging 24.1
pandas 2.2.2
pillow 10.3.0
pip 22.0.2
polygraphy 0.49.9
protobuf 5.28.0
psutil 6.0.0
PuLP 2.9.0
pyarrow 17.0.0
pyarrow-hotfix 0.6
pydantic 2.9.0b2
pydantic_core 2.23.1
pydub 0.25.1
Pygments 2.18.0
pynvml 11.5.3
pyparsing 3.1.4
pyproject_hooks 1.1.0
python-dateutil 2.9.0.post0
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.2
referencing 0.35.1
regex 2024.7.24
requests 2.32.3
responses 0.18.0
rich 13.8.0
rouge-score 0.1.2
rpds-py 0.20.0
ruff 0.6.3
safetensors 0.4.4
scipy 1.14.1
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 59.6.0
shellingham 1.5.4
six 1.16.0
sniffio 1.3.1
sse-starlette 2.1.3
starlette 0.38.4
StrEnum 0.4.15
sympy 1.13.2
tensorrt 10.3.0
tensorrt-cu12 10.3.0
tensorrt-cu12-bindings 10.3.0
tensorrt-cu12-libs 10.3.0
tensorrt-llm 0.13.0.dev2024081300
tiktoken 0.6.0
timm 1.0.9
tokenizers 0.19.1
tomli 2.0.1
tomlkit 0.12.0
torch 2.4.0
torchao 0.5.0
torchvision 0.19.0
tqdm 4.66.5
transformers 4.41.2
transformers-stream-generator 0.0.5
triton 3.0.0
typer 0.12.5
typing_extensions 4.12.2
tzdata 2024.1
urllib3 2.2.2
uvicorn 0.30.6
websockets 11.0.3
wheel 0.37.1
xxhash 3.5.0
yarl 1.9.7
zipp 3.20.1
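
For a report like this, the few packages most likely to matter for quantization and engine building are easier to scan than the full list. A minimal sketch, assuming Python 3.8+ and using only the standard library; the package selection in `KEY_PACKAGES` is an illustrative choice and not something TensorRT-LLM prescribes:

```python
from importlib import metadata

# Illustrative selection (an assumption): packages from the list above that
# are involved in quantization and engine building.
KEY_PACKAGES = [
    "tensorrt-llm",
    "tensorrt",
    "nvidia-modelopt",
    "torch",
    "transformers",
    "flash-attn",
]


def collect_versions(names):
    """Return a dict mapping each package name to its installed version,
    or None when the package is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def format_versions(versions):
    """Render the dict in the same 'name version' style as pip list."""
    lines = []
    for name, version in versions.items():
        shown = version if version is not None else "(not installed)"
        lines.append(f"{name} {shown}")
    return "\n".join(lines)


print(format_versions(collect_versions(KEY_PACKAGES)))
```

Running it inside the same Python environment that runs quantize.py and trtllm-build prints one line per package, which can be pasted next to the full list.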
Error description:

I quantize the Phi-3-mini-128k model and then build the engine, using two commands.
1. Command 1:
python3 …/TensorRT-LLM/examples/quantization/quantize.py --model_dir ./Phi-3-mini-128k-instruct/ --output_dir ./phi_out/ --dtype float16 --qformat fp8 --kv_cache_dtype fp8
****** Terminal output:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/usr/local/lib/python3.10/dist-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by promote_options='default'.
table = cls._concat_blocks(blocks, axis=0)
Inserted 387 quantizers
/usr/local/lib/python3.10/dist-packages/modelopt/torch/quantization/model_quant.py:131: DeprecationWarning: forward_loop should take model as argument, but got forward_loop without any arguments. This usage will be deprecated in future versions.
return calibrate(model, config["algorithm"], forward_loop=forward_loop)
[10/10/2024-10:11:33] You are not running the flash-attention implementation, expect numerical differences.
current rank: 0, tp rank: 0, pp rank: 0
/usr/lib/python3.10/tempfile.py:1008: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp481ehvj0'>
_warnings.warn(warn_message, ResourceWarning)
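
Since Command 1 only prints warnings, it can help to confirm what was written to ./phi_out/ before building. A minimal sketch, assuming the checkpoint directory contains a config.json and that it uses the top-level keys `architecture`, `dtype`, and `quantization`; those key names are an assumption about the checkpoint layout, and `summarize_checkpoint` is a hypothetical helper, not part of TensorRT-LLM:

```python
import json
from pathlib import Path


def summarize_checkpoint(checkpoint_dir):
    """Return selected fields from <checkpoint_dir>/config.json,
    or None when the file does not exist."""
    config_path = Path(checkpoint_dir) / "config.json"
    if not config_path.is_file():
        return None
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # Key names are assumed; a key missing from the file is reported as None.
    wanted = ("architecture", "dtype", "quantization")
    return {key: config.get(key) for key in wanted}
```

Calling `summarize_checkpoint("./phi_out/")` and printing the result shows whether the quantization step produced a config at all and which architecture and quantization settings it recorded, which is useful context to attach to the report.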

2. Command 2:
trtllm-build --checkpoint_dir ./phi_out/ --output_dir ./phi_engine/ --gemm_plugin auto --max_batch_size 8 --max_input_len 1024 --max_seq_len 2048

****** Terminal output:
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 1223, in slice
input_ndim = input.ndim()
AttributeError: 'NoneType' object has no attribute 'ndim'

How can I solve this error?

AakankshaS · November 30, 2024, 11:21am

Hi @1017948396,
Request you to raise this issue on GitHub.
