Environment:
Hardware: RTX 4090
Driver Version: 550.107.02
Software: CUDA release 12.4, V12.4.131
pip environment:
absl-py 2.1.0
accelerate 0.31.0
aenum 3.1.15
aiofiles 23.2.1
aiohappyeyeballs 2.4.0
aiohttp 3.10.5
aiohttp-sse-client 0.2.1
aiosignal 1.3.1
altair 5.4.1
annotated-types 0.7.0
anyio 4.4.0
async-timeout 4.0.3
attrs 24.2.0
build 1.2.1
certifi 2024.8.30
charset-normalizer 3.3.2
click 8.1.7
click-option-group 0.5.6
cloudpickle 3.0.0
colored 2.2.4
coloredlogs 15.0.1
contourpy 1.3.0
cuda-python 12.6.0
cycler 0.12.1
datasets 2.14.5
diffusers 0.30.2
dill 0.3.7
distro 1.9.0
einops 0.7.0
evaluate 0.4.1
exceptiongroup 1.2.2
fastapi 0.112.2
ffmpy 0.4.0
filelock 3.15.4
flash-attn 2.5.8
flatbuffers 24.3.25
fonttools 4.53.1
frozenlist 1.4.1
fsspec 2023.6.0
gradio 4.36.0
gradio_client 1.0.1
h11 0.14.0
h5py 3.10.0
httpcore 1.0.5
httpx 0.27.2
huggingface-hub 0.24.6
humanfriendly 10.0
idna 3.8
importlib_metadata 8.4.0
importlib_resources 6.4.4
janus 1.0.0
Jinja2 3.1.4
jiter 0.5.0
joblib 1.4.2
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
lark 1.2.2
latex2mathml 3.77.0
Markdown 3.7
markdown-it-py 3.0.0
MarkupSafe 2.1.5
matplotlib 3.9.2
mdtex2html 1.3.0
mdurl 0.1.2
mpi4py 4.0.0
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.15
narwhals 1.6.0
networkx 3.3
ninja 1.11.1.1
nltk 3.9.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-modelopt 0.15.1
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.6.68
nvidia-nvtx-cu12 12.1.105
onnx 1.16.2
onnx-simplifier 0.4.36
onnxruntime-gpu 1.19.2
openai 1.39.0
optimum 1.22.0
orjson 3.10.7
packaging 24.1
pandas 2.2.2
pillow 10.3.0
pip 22.0.2
polygraphy 0.49.9
protobuf 5.28.0
psutil 6.0.0
PuLP 2.9.0
pyarrow 17.0.0
pyarrow-hotfix 0.6
pydantic 2.9.0b2
pydantic_core 2.23.1
pydub 0.25.1
Pygments 2.18.0
pynvml 11.5.3
pyparsing 3.1.4
pyproject_hooks 1.1.0
python-dateutil 2.9.0.post0
python-multipart 0.0.9
pytz 2024.1
PyYAML 6.0.2
referencing 0.35.1
regex 2024.7.24
requests 2.32.3
responses 0.18.0
rich 13.8.0
rouge-score 0.1.2
rpds-py 0.20.0
ruff 0.6.3
safetensors 0.4.4
scipy 1.14.1
semantic-version 2.10.0
sentencepiece 0.1.99
setuptools 59.6.0
shellingham 1.5.4
six 1.16.0
sniffio 1.3.1
sse-starlette 2.1.3
starlette 0.38.4
StrEnum 0.4.15
sympy 1.13.2
tensorrt 10.3.0
tensorrt-cu12 10.3.0
tensorrt-cu12-bindings 10.3.0
tensorrt-cu12-libs 10.3.0
tensorrt-llm 0.13.0.dev2024081300
tiktoken 0.6.0
timm 1.0.9
tokenizers 0.19.1
tomli 2.0.1
tomlkit 0.12.0
torch 2.4.0
torchao 0.5.0
torchvision 0.19.0
tqdm 4.66.5
transformers 4.41.2
transformers-stream-generator 0.0.5
triton 3.0.0
typer 0.12.5
typing_extensions 4.12.2
tzdata 2024.1
urllib3 2.2.2
uvicorn 0.30.6
websockets 11.0.3
wheel 0.37.1
xxhash 3.5.0
yarl 1.9.7
zipp 3.20.1
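For anyone reproducing this, the packages most relevant to the quantize and build steps can be pulled out of the long list above with a short script. This is only a minimal sketch: the selection in `KEY_PACKAGES` is my own assumption about what matters here, and `report_versions` is a hypothetical helper, not part of any library.

```python
from importlib import metadata

# Assumed selection: packages that look most relevant to quantization
# and engine building. Adjust as needed.
KEY_PACKAGES = [
    "tensorrt-llm",
    "tensorrt",
    "nvidia-modelopt",
    "torch",
    "transformers",
    "flash-attn",
]


def report_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print one "name: version" line per package.
for name, version in report_versions(KEY_PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```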
Error description:
To quantize the Phi-3-mini-128k-instruct model and build an engine, I ran two commands.
Command 1:
python3 …/TensorRT-LLM/examples/quantization/quantize.py --model_dir ./Phi-3-mini-128k-instruct/ --output_dir ./phi_out/ --dtype float16 --qformat fp8 --kv_cache_dtype fp8
Terminal output:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/usr/local/lib/python3.10/dist-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by promote_options='default'.
  table = cls._concat_blocks(blocks, axis=0)
Inserted 387 quantizers
/usr/local/lib/python3.10/dist-packages/modelopt/torch/quantization/model_quant.py:131: DeprecationWarning: forward_loop should take model as argument, but got forward_loop without any arguments. This usage will be deprecated in future versions.
  return calibrate(model, config["algorithm"], forward_loop=forward_loop)
[10/10/2024-10:11:33] You are not running the flash-attention implementation, expect numerical differences.
current rank: 0, tp rank: 0, pp rank: 0
/usr/lib/python3.10/tempfile.py:1008: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp481ehvj0'>
  _warnings.warn(warn_message, ResourceWarning)
Command 2:
trtllm-build --checkpoint_dir ./phi_out/ --output_dir ./phi_engine/ --gemm_plugin auto --max_batch_size 8 --max_input_len 1024 --max_seq_len 2048
Terminal output:
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 1223, in slice
    input_ndim = input.ndim()
AttributeError: 'NoneType' object has no attribute 'ndim'
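Since the failure is a `None` reaching `slice`, one thing that may be worth checking is whether the quantized checkpoint's `config.json` contains any null fields. The sketch below only lists them. It assumes the checkpoint directory holds a `config.json` (as `./phi_out/` should after Command 1), and a null entry is not necessarily the cause of the error. `load_checkpoint_config` and `find_none_values` are hypothetical helper names, not TensorRT-LLM APIs.

```python
import json
from pathlib import Path


def load_checkpoint_config(checkpoint_dir):
    """Read config.json from a checkpoint directory (assumed file name)."""
    with open(Path(checkpoint_dir) / "config.json", encoding="utf-8") as f:
        return json.load(f)


def find_none_values(obj, prefix=""):
    """Return dotted paths of every null (None) value in nested dicts and lists."""
    paths = []
    if obj is None:
        paths.append(prefix)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            paths.extend(find_none_values(value, child))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            paths.extend(find_none_values(value, f"{prefix}[{index}]"))
    return paths
```

For example, `find_none_values(load_checkpoint_config("./phi_out/"))` returns the paths of any null entries, which can then be compared against the fields the build step reads.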
How can I solve this error?