Converting wav2vec2 (ONNX) using trtexec raises error: free(): double free detected in tcache 2, Aborted (core dumped)

Description

I converted wav2vec2 to ONNX and now I want to convert it to TensorRT (TRT) using the trtexec command. First I tried the NVIDIA TensorRT container (nvcr.io/nvidia/tensorrt:21.11-py3), which works correctly and converts the model successfully.
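
A typical invocation of that container looks like the following (the mount path here is only an example; it assumes the NVIDIA Container Toolkit is installed):

$ docker run --gpus all -v $PWD:/workspace nvcr.io/nvidia/tensorrt:21.11-py3 \
    trtexec --onnx=/workspace/wav2vec2.onnx --saveEngine=/workspace/test.trt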

Then I tried to convert the ONNX model to TRT on my local machine. I installed the CUDA, cuDNN, and TensorRT packages from local .deb repos, matching the versions in the tensorrt:21.11-py3 container; the resulting setup is listed below.

Environment

TensorRT Version: 8.0.3-1+cuda11.3
GPU Type: NVIDIA GeForce GTX 1650 Ti
Nvidia Driver Version: 495.29.05
CUDA Version: 11.5 (11.3 and 11.4 also installed)
CUDNN Version: 8.3.1.22-1+cuda11.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.8.12
TensorFlow Version (if applicable): 2.7.0
PyTorch Version (if applicable): 1.10.0
Baremetal or Container (if container which image + tag): -

Relevant Files

wav2vec2 Hugging Face model

Steps To Reproduce

1. Convert PyTorch to ONNX

First, convert wav2vec2 (PyTorch) to ONNX using these lines:

import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

device = torch.device('cuda')

model_path = "facebook/wav2vec2-large-960h-lv60-self"

processor = Wav2Vec2Processor.from_pretrained(model_path)
model = Wav2Vec2ForCTC.from_pretrained(model_path).to(device)

# Dummy raw-audio input: batch of 1, 3600 samples.
dummy_input = torch.rand([1, 3600]).to(device)
input_names = ["input"]
output_names = ["output"]

torch.onnx.export(model, dummy_input, "wav2vec2.onnx", verbose=True,
                  input_names=input_names, output_names=output_names)
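
Not required for the conversion, but a quick way to rule out a broken export is to compare the ONNX output against PyTorch. A minimal sketch, assuming onnxruntime is installed:

import numpy as np
import onnxruntime as ort

# Run the same dummy input through both backends and compare the logits.
dummy = torch.rand([1, 3600])
with torch.no_grad():
    torch_logits = model(dummy.to(device)).logits.cpu().numpy()

session = ort.InferenceSession("wav2vec2.onnx")
onnx_logits = session.run(None, {"input": dummy.numpy()})[0]

# Small numerical drift between backends is expected.
print("max abs diff:", np.abs(torch_logits - onnx_logits).max())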

2. Use trtexec to Convert ONNX to TensorRT

Now use the trtexec command to convert the ONNX model to TensorRT:

$ trtexec --onnx=wav2vec2.onnx --saveEngine=test.trt 

3. Traceback

The traceback shown to me is this:

$ trtexec --onnx=wav2vec2.onnx --saveEngine=test.trt
&&&& RUNNING TensorRT.trtexec [TensorRT v8003] # trtexec --onnx=wav2vec2.onnx --saveEngine=test.trt
[12/12/2021-12:56:40] [I] === Model Options ===
[12/12/2021-12:56:40] [I] Format: ONNX
[12/12/2021-12:56:40] [I] Model: wav2vec2.onnx
[12/12/2021-12:56:40] [I] Output:
[12/12/2021-12:56:40] [I] === Build Options ===
[12/12/2021-12:56:40] [I] Max batch: explicit
[12/12/2021-12:56:40] [I] Workspace: 16 MiB
[12/12/2021-12:56:40] [I] minTiming: 1
[12/12/2021-12:56:40] [I] avgTiming: 8
[12/12/2021-12:56:40] [I] Precision: FP32
[12/12/2021-12:56:40] [I] Calibration: 
[12/12/2021-12:56:40] [I] Refit: Disabled
[12/12/2021-12:56:40] [I] Sparsity: Disabled
[12/12/2021-12:56:40] [I] Safe mode: Disabled
[12/12/2021-12:56:40] [I] Restricted mode: Disabled
[12/12/2021-12:56:40] [I] Save engine: test.trt
[12/12/2021-12:56:40] [I] Load engine: 
[12/12/2021-12:56:40] [I] NVTX verbosity: 0
[12/12/2021-12:56:40] [I] Tactic sources: Using default tactic sources
[12/12/2021-12:56:40] [I] timingCacheMode: local
[12/12/2021-12:56:40] [I] timingCacheFile: 
[12/12/2021-12:56:40] [I] Input(s)s format: fp32:CHW
[12/12/2021-12:56:40] [I] Output(s)s format: fp32:CHW
[12/12/2021-12:56:40] [I] Input build shapes: model
[12/12/2021-12:56:40] [I] Input calibration shapes: model
[12/12/2021-12:56:40] [I] === System Options ===
[12/12/2021-12:56:40] [I] Device: 0
[12/12/2021-12:56:40] [I] DLACore: 
[12/12/2021-12:56:40] [I] Plugins:
[12/12/2021-12:56:40] [I] === Inference Options ===
[12/12/2021-12:56:40] [I] Batch: Explicit
[12/12/2021-12:56:40] [I] Input inference shapes: model
[12/12/2021-12:56:40] [I] Iterations: 10
[12/12/2021-12:56:40] [I] Duration: 3s (+ 200ms warm up)
[12/12/2021-12:56:40] [I] Sleep time: 0ms
[12/12/2021-12:56:40] [I] Streams: 1
[12/12/2021-12:56:40] [I] ExposeDMA: Disabled
[12/12/2021-12:56:40] [I] Data transfers: Enabled
[12/12/2021-12:56:40] [I] Spin-wait: Disabled
[12/12/2021-12:56:40] [I] Multithreading: Disabled
[12/12/2021-12:56:40] [I] CUDA Graph: Disabled
[12/12/2021-12:56:40] [I] Separate profiling: Disabled
[12/12/2021-12:56:40] [I] Time Deserialize: Disabled
[12/12/2021-12:56:40] [I] Time Refit: Disabled
[12/12/2021-12:56:40] [I] Skip inference: Disabled
[12/12/2021-12:56:40] [I] Inputs:
[12/12/2021-12:56:40] [I] === Reporting Options ===
[12/12/2021-12:56:40] [I] Verbose: Disabled
[12/12/2021-12:56:40] [I] Averages: 10 inferences
[12/12/2021-12:56:40] [I] Percentile: 99
[12/12/2021-12:56:40] [I] Dump refittable layers:Disabled
[12/12/2021-12:56:40] [I] Dump output: Disabled
[12/12/2021-12:56:40] [I] Profile: Disabled
[12/12/2021-12:56:40] [I] Export timing to JSON file: 
[12/12/2021-12:56:40] [I] Export output to JSON file: 
[12/12/2021-12:56:40] [I] Export profile to JSON file: 
[12/12/2021-12:56:40] [I] 
[12/12/2021-12:56:40] [I] === Device Information ===
[12/12/2021-12:56:40] [I] Selected Device: NVIDIA GeForce GTX 1650 Ti
[12/12/2021-12:56:40] [I] Compute Capability: 7.5
[12/12/2021-12:56:40] [I] SMs: 16
[12/12/2021-12:56:40] [I] Compute Clock Rate: 1.485 GHz
[12/12/2021-12:56:40] [I] Device Global Memory: 3903 MiB
[12/12/2021-12:56:40] [I] Shared Memory per SM: 64 KiB
[12/12/2021-12:56:40] [I] Memory Bus Width: 128 bits (ECC disabled)
[12/12/2021-12:56:40] [I] Memory Clock Rate: 6.001 GHz
[12/12/2021-12:56:40] [I] 
[12/12/2021-12:56:40] [I] TensorRT version: 8003
[12/12/2021-12:56:41] [I] [TRT] [MemUsageChange] Init CUDA: CPU +330, GPU +0, now: CPU 338, GPU 623 (MiB)
[12/12/2021-12:56:41] [I] Start parsing network model
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1264813709
[12/12/2021-12:56:41] [I] [TRT] ----------------------------------------------------------------
[12/12/2021-12:56:41] [I] [TRT] Input filename:   wav2vec2.onnx
[12/12/2021-12:56:41] [I] [TRT] ONNX IR version:  0.0.7
[12/12/2021-12:56:41] [I] [TRT] Opset version:    9
[12/12/2021-12:56:41] [I] [TRT] Producer name:    pytorch
[12/12/2021-12:56:41] [I] [TRT] Producer version: 1.10
[12/12/2021-12:56:41] [I] [TRT] Domain:           
[12/12/2021-12:56:41] [I] [TRT] Model version:    0
[12/12/2021-12:56:41] [I] [TRT] Doc string:       
[12/12/2021-12:56:41] [I] [TRT] ----------------------------------------------------------------
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 1264813709
[12/12/2021-12:56:42] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/12/2021-12:56:44] [I] Finish parsing network model
[12/12/2021-12:56:44] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1551, GPU 632 (MiB)
[12/12/2021-12:56:44] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 1551 MiB, GPU 632 MiB
[12/12/2021-12:56:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +517, GPU +224, now: CPU 2071, GPU 857 (MiB)
[12/12/2021-12:56:48] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +115, GPU +52, now: CPU 2186, GPU 909 (MiB)
[12/12/2021-12:56:48] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[12/12/2021-12:57:19] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
free(): double free detected in tcache 2
Aborted (core dumped)
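
For reference, the same build can also be driven through the TensorRT Python API. This is only a sketch, assuming the tensorrt 8.0.x Python bindings are installed; it also raises the 16 MiB default workspace that the log above warns about:

import tensorrt as trt

logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Parse the ONNX model and surface any parser errors.
with open("wav2vec2.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("ONNX parse failed")

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GiB instead of trtexec's 16 MiB default

# build_serialized_network returns an IHostMemory blob in TensorRT 8.x.
serialized = builder.build_serialized_network(network, config)
with open("test.trt", "wb") as f:
    f.write(serialized)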

But in the tensorrt:21.11-py3 container everything completes correctly. Is there any suggestion to address this issue?

Hi,
Request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Alongside, you can try a few things:
https://docs.nvidia.com/deeplearning/tensorrt/quick-start-guide/index.html#onnx-export

  1. Validate your model with the below snippet:

check_model.py

import onnx

# Path to your ONNX model, e.g. "wav2vec2.onnx".
filename = "your_model.onnx"
model = onnx.load(filename)
onnx.checker.check_model(model)
# For models larger than 2 GB, pass the path instead:
# onnx.checker.check_model(filename)
  2. Try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
In case you are still facing the issue, request you to share the trtexec "--verbose" log for further debugging.
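For example, the full log can be captured with something like:

$ trtexec --onnx=wav2vec2.onnx --verbose 2>&1 | tee trtexec_verbose.log
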
Thanks!

Thanks for your response. I ran trtexec --onnx=onnx_models/wav2vec2.onnx --verbose and these are the last lines of the terminal log:

[12/13/2021-18:29:21] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: CaskConvolution Tactic: -3263369460438823196
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning Reformat:Float(12288,12,12,1) -> Float(12288,1,12288,1024) ***************
[12/13/2021-18:29:21] [V] [TRT] --------------- Timing Runner: Optimizer Reformat (Reformat)
[12/13/2021-18:29:21] [V] [TRT] Tactic: 1002 Time: 0.007132
[12/13/2021-18:29:21] [V] [TRT] Tactic: 0 Time: 0.004288
[12/13/2021-18:29:21] [V] [TRT] Fastest Tactic: 0 Time: 0.004288
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning Reformat:Float(12288,12,12,1) -> Float(384,12:32,12,1) ***************
[12/13/2021-18:29:21] [V] [TRT] --------------- Timing Runner: Optimizer Reformat (Reformat)
[12/13/2021-18:29:21] [V] [TRT] Tactic: 1002 Time: 0.007036
[12/13/2021-18:29:21] [V] [TRT] Tactic: 0 Time: 0.004084
[12/13/2021-18:29:21] [V] [TRT] Fastest Tactic: 0 Time: 0.004084
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning Reformat:Float(12288,1,12288,1024) -> Float(12288,12,12,1) ***************
[12/13/2021-18:29:21] [V] [TRT] --------------- Timing Runner: Optimizer Reformat (Reformat)
[12/13/2021-18:29:21] [V] [TRT] Tactic: 1002 Time: 0.00684
[12/13/2021-18:29:21] [V] [TRT] Tactic: 0 Time: 0.004084
[12/13/2021-18:29:21] [V] [TRT] Fastest Tactic: 0 Time: 0.004084
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning Reformat:Float(12288,1,12288,1024) -> Float(384,12:32,12,1) ***************
[12/13/2021-18:29:21] [V] [TRT] --------------- Timing Runner: Optimizer Reformat (Reformat)
[12/13/2021-18:29:21] [V] [TRT] Tactic: 1002 Time: 0.007072
[12/13/2021-18:29:21] [V] [TRT] Tactic: 0 Time: 0.004192
[12/13/2021-18:29:21] [V] [TRT] Fastest Tactic: 0 Time: 0.004192
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning Reformat:Float(384,12:32,12,1) -> Float(12288,12,12,1) ***************
[12/13/2021-18:29:21] [V] [TRT] --------------- Timing Runner: Optimizer Reformat (Reformat)
[12/13/2021-18:29:21] [V] [TRT] Tactic: 1002 Time: 0.0069
[12/13/2021-18:29:21] [V] [TRT] Tactic: 0 Time: 0.004072
[12/13/2021-18:29:21] [V] [TRT] Fastest Tactic: 0 Time: 0.004072
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning Reformat:Float(384,12:32,12,1) -> Float(12288,1,12288,1024) ***************
[12/13/2021-18:29:21] [V] [TRT] --------------- Timing Runner: Optimizer Reformat (Reformat)
[12/13/2021-18:29:21] [V] [TRT] Tactic: 1002 Time: 0.00704
[12/13/2021-18:29:21] [V] [TRT] Tactic: 0 Time: 0.003836
[12/13/2021-18:29:21] [V] [TRT] Fastest Tactic: 0 Time: 0.003836
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning format combination: Float(12288,12,12,1) -> Float(12288,12,1,1) ***************
[12/13/2021-18:29:21] [V] [TRT] --------------- Timing Runner: shuffle_after_(Unnamed Layer* 266) [Convolution]_output (Shuffle)
[12/13/2021-18:29:21] [V] [TRT] Tactic: 0 Time: 0.00286
[12/13/2021-18:29:21] [V] [TRT] Tactic: 1 Time: 0.04788
[12/13/2021-18:29:21] [V] [TRT] Fastest Tactic: 0 Time: 0.00286
[12/13/2021-18:29:21] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Shuffle Tactic: 0
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning format combination: Float(12288,1,12288,1024) -> Float(12288,1,1024,1024) ***************
[12/13/2021-18:29:21] [V] [TRT] --------------- Timing Runner: shuffle_after_(Unnamed Layer* 266) [Convolution]_output (Shuffle)
[12/13/2021-18:29:21] [V] [TRT] Tactic: 0 Time: 0.004256
[12/13/2021-18:29:21] [V] [TRT] Tactic: 1 Time: 0.011472
[12/13/2021-18:29:21] [V] [TRT] Fastest Tactic: 0 Time: 0.004256
[12/13/2021-18:29:21] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Shuffle Tactic: 0
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning format combination: Float(384,12:32,12,1) -> Float(384,12:32,1,1) ***************
[12/13/2021-18:29:21] [V] [TRT] --------------- Timing Runner: shuffle_after_(Unnamed Layer* 266) [Convolution]_output (Shuffle)
[12/13/2021-18:29:21] [V] [TRT] Tactic: 0 Time: 0.003028
[12/13/2021-18:29:21] [V] [TRT] Tactic: 1 Time: 0.011636
[12/13/2021-18:29:21] [V] [TRT] Fastest Tactic: 0 Time: 0.003028
[12/13/2021-18:29:21] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: Shuffle Tactic: 0
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning Reformat:Float(12288,1,1024,1024) -> Float(12288,12,1,1) ***************
[12/13/2021-18:29:21] [V] [TRT] --------------- Timing Runner: Optimizer Reformat (Reformat)
[12/13/2021-18:29:21] [V] [TRT] Tactic: 1002 Time: 0.007
[12/13/2021-18:29:21] [V] [TRT] Tactic: 0 Time: 0.004072
[12/13/2021-18:29:21] [V] [TRT] Fastest Tactic: 0 Time: 0.004072
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning Reformat:Float(384,12:32,1,1) -> Float(12288,12,1,1) ***************
[12/13/2021-18:29:21] [V] [TRT] --------------- Timing Runner: Optimizer Reformat (Reformat)
[12/13/2021-18:29:21] [V] [TRT] Tactic: 1002 Time: 0.007244
[12/13/2021-18:29:21] [V] [TRT] Tactic: 0 Time: 0.004088
[12/13/2021-18:29:21] [V] [TRT] Fastest Tactic: 0 Time: 0.004088
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning Reformat:Float(1,1024,1) -> Float(11264,1024,1) ***************
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning Reformat:Float(11264:32,1024,1) -> Float(11264,1024,1) ***************
[12/13/2021-18:29:21] [V] [TRT] *************** Autotuning format combination: Float(12288,12,1,1), Float(11264,1024,1) -> Float(352,32,1) ***************
[12/13/2021-18:29:21] [V] [TRT] --------------- Timing Runner: {ForeignNode[(Unnamed Layer* 267) [Shuffle]...Add_1739]} (Myelin)
free(): double free detected in tcache 2
Aborted (core dumped)

I think this issue is specific to the trtexec binary on my local machine, because I tried Polygraphy and converted my ONNX model to TensorRT successfully, and the model's output matched that of the engine converted in the tensorrt:21.11-py3 Docker container.
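
For reference, a Polygraphy conversion and cross-check can be run along these lines (a sketch of the Polygraphy CLI; exact flags may vary between versions):

$ polygraphy run wav2vec2.onnx --trt --onnxrt --save-engine=test_polygraphy.trt

This builds a TensorRT engine from the ONNX model, runs it alongside ONNX Runtime, and compares the outputs of the two backends.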