GPU memory leak in NeMo asr_model.transcribe()

Hello, I’m seeing a GPU memory leak when running ASR model inference with asr_model.transcribe().

You can reproduce this issue using this notebook: Google Colab

As you can see in the image, GPU memory usage keeps increasing over time.

To reproduce, take any WAV audio file and run transcription on it repeatedly for 40 minutes to 1 hour. You should hit the same memory error I did.
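In case it helps anyone narrow this down, here is a minimal sketch of how the growth could be logged outside the notebook. It is not the code from the Colab. The helper names track_memory and growth_after_warmup are made up for illustration, and the commented usage assumes a loaded asr_model, a hypothetical wav_path, and PyTorch's torch.cuda.memory_allocated as the memory probe.

```python
from typing import Callable, List


def track_memory(step: Callable[[], object],
                 read_memory: Callable[[], int],
                 iterations: int) -> List[int]:
    """Call `step` repeatedly and record `read_memory()` after each call."""
    readings = []
    for _ in range(iterations):
        step()
        readings.append(read_memory())
    return readings


def growth_after_warmup(readings: List[int], warmup: int = 5) -> int:
    """Return last minus first reading, ignoring the first `warmup` readings.

    A value that stays positive and keeps growing as you raise the
    iteration count hints at a leak; it is only a rough heuristic.
    """
    steady = readings[warmup:]
    if len(steady) < 2:
        return 0
    return steady[-1] - steady[0]


# Hypothetical usage (assumes torch, a loaded asr_model, and wav_path):
#   import torch
#   readings = track_memory(lambda: asr_model.transcribe([wav_path]),
#                           torch.cuda.memory_allocated, 500)
#   print(growth_after_warmup(readings))
```

If the allocated bytes stay flat but nvidia-smi still shows growth, the extra memory may be held by the caching allocator or by something outside PyTorch, which would be worth mentioning when reporting this.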

Hi @mehadi.hasan,
This doesn’t look like a cuDNN issue.
The CUDA or NeMo forums would probably be a better place to ask.