Description
I am working on converting the Whisper (small) model to TensorRT. My steps so far:
- Separated Whisper into two models: the encoder and the decoder.
- Converted both models to ONNX.
- Used trtexec to build a .engine file from each ONNX model.
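For reference, the build step might look like the following. This is a sketch, not my exact commands: file names and input tensor names are placeholders, and the decoder needs a dynamic shape profile because its token input grows every step (448 is Whisper's maximum text context; verify the actual tensor names in your ONNX graph before using these flags):

```shell
# Build the encoder engine (static shapes, FP16)
trtexec --onnx=encoder.onnx --saveEngine=encoder.engine --fp16

# Build the decoder engine with a dynamic profile on the token input
# (the input name "tokens" is an assumption -- check your exported graph)
trtexec --onnx=decoder.onnx --saveEngine=decoder.engine --fp16 \
        --minShapes=tokens:1x1 --optShapes=tokens:1x64 --maxShapes=tokens:1x448
```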
However, the exported decoder graph does not contain the beam search logic, so it cannot return token IDs directly; it only produces logits for each step. I want to build a full inference pipeline that runs both engines with TensorRT. Please help me.
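Since the exported decoder only emits per-step logits, the beam search has to run in host code around the TensorRT execution context. Below is a minimal, hedged sketch of that outer loop: `decoder_step` is a stub standing in for the actual call into the decoder engine (it would take the token prefix plus the encoder output and return next-token log-probabilities), and all token IDs here are toy values, not Whisper's real vocabulary:

```python
import numpy as np

def beam_search(decoder_step, start_token, eos_token, beam_size=5, max_len=448):
    """Beam search over a step function mapping a token prefix to
    next-token log-probabilities. decoder_step stands in for a call
    into the TensorRT decoder engine."""
    beams = [([start_token], 0.0)]          # (token list, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos_token:     # beam already ended
                finished.append((tokens, score))
                continue
            log_probs = decoder_step(tokens)            # shape: (vocab,)
            # Expand each live beam with its top-k continuations
            for tok in np.argsort(log_probs)[-beam_size:]:
                candidates.append((tokens + [int(tok)],
                                   score + float(log_probs[tok])))
        if not candidates:                  # every beam has finished
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    finished.extend(b for b in beams if b[0][-1] == eos_token)
    if not finished:                        # no beam reached EOS in time
        finished = beams
    return max(finished, key=lambda c: c[1])[0]
```

A production version would also batch the beams into one decoder invocation per step and apply length normalization when comparing finished hypotheses, but the control flow above is the part the ONNX graph is missing.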
Environment
TensorRT Version: 8.6.1
GPU Type: T4
Nvidia Driver Version: 12
CUDA Version: 12
CUDNN Version:
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.10.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.0.1
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)
Steps To Reproduce
Please include:
- Exact steps/commands to build your repro
- Exact steps/commands to run your repro
- Full traceback of errors encountered