Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[lm_head.bias.../Cast]}.)

Description

Hi there,

I want to convert a statically quantized transformer model to TensorRT. It is a CodeGenForCausalLM from the transformers library. I used the following command to convert the model:

 trtexec --onnx=trt/decoder_model_quantized.onnx --int8 --minShapes=input_ids:1x1,attention_mask:1x1 --maxShapes=input_ids:1x512,attention_mask:1x512 --saveEngine=model-quantized.onnx.plan --device=0 --allowGPUFallback --useCudaGraph

The error is as follows:

[03/22/2023-14:01:53] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[03/22/2023-14:02:03] [W] [TRT] Myelin graph with multiple dynamic values may have poor performance if they differ. Dynamic values are: 
[03/22/2023-14:02:03] [W] [TRT]  (- 0 (CAST_F_TO_I (FLOOR (DIV_F (MUL_ADD_F -1 (CAST_I_TO_F sequence_length) 0) 1))))
[03/22/2023-14:02:03] [W] [TRT]  sequence_length
[03/22/2023-14:02:04] [W] [TRT] Skipping tactic 0x0000000000000000 due to Myelin error: [canonicalize_axis] Operation /transformer/ln_f/Constant_1_output_0_QuantizeLinear has out of range axis value 0.
[03/22/2023-14:02:04] [E] Error[10]: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[lm_head.bias.../Cast]}.)
[03/22/2023-14:02:04] [E] Error[2]: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[03/22/2023-14:02:04] [E] Engine could not be created from network
[03/22/2023-14:02:04] [E] Building engine failed
[03/22/2023-14:02:04] [E] Failed to create engine from model or file.
[03/22/2023-14:02:04] [E] Engine set up failed

Environment

Docker Image: nvcr.io/nvidia/pytorch:22.12-py3
TensorRT Version 8.501
Python 3.8.10
Pips:
transformers 4.26.1
optimum 1.7.1
onnx 1.12.0
onnxruntime-gpu 1.14.1
pytorch-quantization 2.1.2
pytorch-triton 2.0.0+b8b470bc59
torch 1.13.1
torch-tensorrt 1.3.0
torchtext 0.13.0a0+fae8e8c
torchvision 0.15.0a0

Steps To Reproduce

  1. Convert the transformer model to ONNX via optimum-cli.
  2. Quantize the model exactly as described in the quantization guide.
  3. Run infer_shape.py as suggested by trtexec.
  4. Run the cmd from above.

Thank you very much for any help!

Hi,
We recommend that you check the list of operators supported by your TensorRT version.

For unsupported operators, you need to create a custom plugin that implements the operation.

Thanks!

Thank you for your reply.

It seems that lm_head.bias.../Cast is not in the list of supported operators, right?