Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[lm_head.bias.../Cast]}.)


Hi there,

I want to convert a statically quantized transformer model to TensorRT. It is a CodeGenForCausalLM from the transformers library. I used the following command to convert the model:

 trtexec --onnx=trt/decoder_model_quantized.onnx --int8 --minShapes=input_ids:1x1,attention_mask:1x1 --maxShapes=input_ids:1x512,attention_mask:1x512 --saveEngine=model-quantized.onnx.plan --device=0 --allowGPUFallback --useCudaGraph

The error is as follows:

[03/22/2023-14:01:53] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[03/22/2023-14:02:03] [W] [TRT] Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are: 
[03/22/2023-14:02:03] [W] [TRT]  (- 0 (CAST_F_TO_I (FLOOR (DIV_F (MUL_ADD_F -1 (CAST_I_TO_F sequence_length) 0) 1))))
[03/22/2023-14:02:03] [W] [TRT]  sequence_length
[03/22/2023-14:02:04] [W] [TRT] Skipping tactic 0x0000000000000000 due to Myelin error: [canonicalize_axis] Operation /transformer/ln_f/Constant_1_output_0_QuantizeLinear has out of range axis value 0.
[03/22/2023-14:02:04] [E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[lm_head.bias.../Cast]}.)
[03/22/2023-14:02:04] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[03/22/2023-14:02:04] [E] Engine could not be created from network
[03/22/2023-14:02:04] [E] Building engine failed
[03/22/2023-14:02:04] [E] Failed to create engine from model or file.
[03/22/2023-14:02:04] [E] Engine set up failed


Docker Image:
TensorRT Version 8.501
Python 3.8.10
transformers 4.26.1
optimum 1.7.1
onnx 1.12.0
onnxruntime-gpu 1.14.1
pytorch-quantization 2.1.2
pytorch-triton 2.0.0+b8b470bc59
torch 1.13.1
torch-tensorrt 1.3.0
torchtext 0.13.0a0+fae8e8c
torchvision 0.15.0a0

Steps To Reproduce

  1. Convert the transformer model to ONNX via optimum-cli.
  2. Quantize the model (we did this exactly as in the quantization guide).
  3. Run the model as suggested by trtexec.
  4. Run the command from above.
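
Before handing the quantized model to trtexec, it can help to sanity-check that it still runs under onnxruntime with the same dynamic shapes. A minimal sketch, assuming the file name from the command above and that the graph takes `input_ids` and `attention_mask` of shape batch x sequence (both assumptions taken from the trtexec shape profile, not verified against the actual model):

```python
import numpy as np

def make_feed(batch, seq_len):
    """Build dummy int64 inputs matching the trtexec shape profile
    (minShapes=1x1 ... maxShapes=1x512)."""
    return {
        "input_ids": np.ones((batch, seq_len), dtype=np.int64),
        "attention_mask": np.ones((batch, seq_len), dtype=np.int64),
    }

# Usage (assumes onnxruntime-gpu from the environment listed below):
#   import onnxruntime as ort
#   sess = ort.InferenceSession("trt/decoder_model_quantized.onnx",
#                               providers=["CUDAExecutionProvider"])
#   outputs = sess.run(None, make_feed(1, 512))
```

If this already fails under onnxruntime, the problem is in the export or quantization step rather than in TensorRT.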

Thank you very much for any help!

Hi,
We recommend that you check the supported features at the link below.

You can refer to the link below for the full list of supported operators.
For unsupported operators, you need to create a custom plugin to support the operation.
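
To see which operators the exported graph actually uses, and compare them against the supported-operator list, here is a minimal sketch using the `onnx` Python package (already in the environment above); the model path is assumed from the trtexec command:

```python
from collections import Counter

def count_ops(graph):
    """Count how often each op type appears in an ONNX graph's node list."""
    return Counter(node.op_type for node in graph.node)

# Usage (assumes the quantized model file from the original post):
#   import onnx
#   model = onnx.load("trt/decoder_model_quantized.onnx")
#   for op, n in count_ops(model.graph).most_common():
#       print(op, n)
```

Any op type printed that is not on the supported-operator list is a candidate for a custom plugin.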


Thank you for your reply.

It seems that lm_head.bias.../Cast is not in the list of supported operators, right?