TensorRT Whisper example consumes all system memory on audio longer than 30 seconds, which kills the process

Description

Following the Whisper example in the TensorRT-LLM GitHub repository:

I was able to build the TensorRT engine successfully following the example laid out.

I was also able to run the example successfully on the dummy LibriSpeech clean dataset, as shown in the README.md.

However, when I ran run.py on a single 5-minute WAV file, system memory usage kept growing until it reached the environment's limit of 12.7 GB, at which point the process was killed. As a result, no transcription is produced.

I tried adding the --debug argument, but it produced no debugging output.
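
Since --debug gave nothing to go on, memory growth could be logged directly from Python. This is a minimal sketch using only the standard library `resource` module (Unix only, which covers Colab); the helper name `peak_rss_mib` is hypothetical and is not part of the TensorRT-LLM example.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


print(f"peak RSS: {peak_rss_mib():.1f} MiB")
```

Printing this value after each stage of the script (audio loading, feature extraction, decoding) should show which step is responsible for the growth, assuming the process survives long enough to report it.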

It appears that the TensorRT model example only deals with audio <= 30 seconds. Is there a way to extend the model to transcribe more than 30 seconds?
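
For context, the workaround I would expect to need is splitting the audio into 30-second windows and transcribing each one separately. Below is a minimal sketch of the splitting step only. It assumes 16 kHz mono samples, which is what Whisper models expect; `split_into_chunks` is a hypothetical helper, not something from the TensorRT-LLM example, and the sketch does not call any TensorRT-LLM API.

```python
from typing import Sequence

SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz mono audio
CHUNK_SECONDS = 30     # Whisper's fixed input window length


def split_into_chunks(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE,
    chunk_seconds: int = CHUNK_SECONDS,
) -> list:
    """Split mono samples into consecutive windows of at most chunk_seconds.

    Works on any sliceable sequence (a list here; a 1-D NumPy array also works).
    The final chunk may be shorter than the others.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")
    chunk_len = sample_rate * chunk_seconds
    return [
        samples[start:start + chunk_len]
        for start in range(0, len(samples), chunk_len)
    ]


# A silent 5-minute clip (300 seconds) splits into ten 30-second chunks.
five_minutes = [0.0] * (SAMPLE_RATE * 300)
chunks = split_into_chunks(five_minutes)
print(len(chunks))  # 10
```

Each chunk would then be fed through the existing decoding path one at a time. Cutting at fixed offsets like this can split a word across two chunks, so overlapping windows or silence-based cuts may be needed for good transcripts.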

Steps to reproduce:

  1. Go to the Google Colab link.
  2. Execute the cells in order.
  3. When run.py is given an audio file longer than 30 seconds, the process should use all available system memory and crash.

Environment

TensorRT Version: 9.3.0.post12.dev1
GPU Type: T4 Google Colab
Nvidia Driver Version: 535.104.05
CUDA Version: 12.2
CUDNN Version: nvidia-cudnn-cu12 8.9.2.26
Operating System + Version: Google Colab
Python Version (if applicable): 3.10.12
TensorFlow Version (if applicable): NONE
PyTorch Version (if applicable): 2.2.1+cu121
Baremetal or Container (if container which image + tag):
