Convert Whisper model to ONNX

Description

I am working on converting the Whisper (small) model to TensorRT. My steps are:

  1. Separate Whisper into two models: the encoder and the decoder.
  2. Convert both models to ONNX.
  3. Use trtexec to build a .engine file from each ONNX model.

However, the decoder does not contain beam search, so it cannot return token IDs directly. I want to build a full pipeline that runs them with TensorRT. Please help me.
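For reference, step 3 might look like the following trtexec invocations. This is a sketch: the ONNX file names and the decoder's dynamic-shape profile (input name and a 448-token cap, Whisper's usual max target length) are assumptions for illustration, not taken from the post.

```shell
# Build the encoder engine (fixed mel-spectrogram input; fp16 suits the T4).
trtexec --onnx=whisper_encoder.onnx \
        --saveEngine=whisper_encoder.engine \
        --fp16

# Build the decoder engine with a dynamic profile over the growing token
# sequence (the "input_ids" tensor name is assumed from the ONNX export).
trtexec --onnx=whisper_decoder.onnx \
        --saveEngine=whisper_decoder.engine \
        --fp16 \
        --minShapes=input_ids:1x1 \
        --optShapes=input_ids:1x64 \
        --maxShapes=input_ids:1x448
```

The dynamic profile matters because the decoder is called once per generated token with a sequence that grows each step.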

Environment

TensorRT Version: 8.6.1
GPU Type: T4
Nvidia Driver Version: 12
CUDA Version: 12
CUDNN Version:
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.10.10
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.0.1
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

In release/9.0, we added support for a Vision2Seq model (BLIP); the logic for running beam search with Whisper is very similar. You will need to create an object similar to HuggingFace's GenerationMixin to drive beam search, or implement the logic yourself: TensorRT generates logits from input_ids + encoder_hidden_states, and the search loop runs on top of that. See TensorRT/demo/HuggingFace/BLIP at release/9.1 · NVIDIA/TensorRT (github.com).
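To make the "implement the logic yourself" route concrete, here is a minimal beam-search sketch. The `step_fn` callable stands in for one TensorRT decoder-engine call (input_ids + encoder_hidden_states → logits); in a real pipeline it would run the decoder engine and return next-token log-probabilities, but here it is any function from a token sequence to a 1-D log-prob array. Names, defaults, and the toy vocabulary below are illustrative assumptions, not the BLIP demo's actual API.

```python
import numpy as np

def beam_search(step_fn, bos_id, eos_id, beam_size=3, max_len=20):
    """Minimal beam search over `step_fn`, a stand-in for a TensorRT
    decoder call that maps a token-id sequence to log-probs over the
    vocabulary. Returns the highest-scoring finished sequence."""
    beams = [([bos_id], 0.0)]          # (tokens, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos_id:   # beam already ended: retire it
                finished.append((tokens, score))
                continue
            log_probs = step_fn(tokens)
            # Expand each live beam with its top-k next tokens.
            for tok in np.argsort(log_probs)[-beam_size:]:
                candidates.append((tokens + [int(tok)],
                                   score + float(log_probs[tok])))
        if not candidates:             # every beam has finished
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    finished.extend(b for b in beams if b[0][-1] == eos_id)
    if not finished:                   # hit max_len with no EOS
        finished = beams
    return max(finished, key=lambda c: c[1])[0]

def toy_step(tokens):
    """Toy stand-in for the decoder engine: after BOS (0) prefer
    token 1, after token 1 prefer EOS (3)."""
    lp = np.full(4, -10.0)
    if tokens[-1] == 0:
        lp[1] = -0.1
    elif tokens[-1] == 1:
        lp[3] = -0.1
    return lp
```

With the toy step function, `beam_search(toy_step, bos_id=0, eos_id=3)` returns `[0, 1, 3]`. For Whisper you would additionally thread `encoder_hidden_states` (and ideally a KV cache) through `step_fn`, and batch the beams into a single engine execution per step instead of looping over them in Python.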