Description
I have a model that converted successfully with trtexec under TensorRT 7.1.3. After updating to TensorRT 8.5.2, the conversion fails with a segmentation fault.
After two days of debugging, I finally managed to create a minimal example that triggers the segmentation fault.
Removing any single layer from the example makes the model convert successfully.
Code to obtain the minimal ONNX model:
```python
import os

import torch.nn
import torch.nn.functional as F


class ModelSegFault(torch.nn.Module):
    def __call__(self, x):
        x = F.softmax(x, 2)
        x = torch.cat(x.split(1, 2), 2)
        x = x.reshape(6400)
        _, topk_inds = x.topk(1000, 0)
        x = x[topk_inds].reshape(1, 1000)
        return x


model = ModelSegFault()
inp = torch.rand((1, 800, 8), dtype=torch.float32)
path = os.path.expanduser('~/bug.onnx')
torch.onnx.export(
    model, inp, path, input_names=['input'], output_names=['output']
)
```
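For completeness, the conversion step that segfaults is a plain trtexec invocation along these lines (the model path is an assumption; no extra flags are needed to trigger the crash on our setups):

```shell
# Hypothetical repro command: convert the exported ONNX model with trtexec.
# The segfault occurs during this conversion step.
trtexec --onnx=$HOME/bug.onnx
```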
We found that the segfault can be avoided by adding a small epsilon to the output of softmax and subtracting it again:
```python
class ModelSegFault(torch.nn.Module):
    def __call__(self, x):
        x = F.softmax(x, 2) + 1e-5
        x -= 1e-5
        x = torch.cat(x.split(1, 2), 2)
        x = x.reshape(6400)
        _, topk_inds = x.topk(1000, 0)
        x = x[topk_inds].reshape(1, 1000)
        return x
```
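The add/subtract pair cancels out up to float32 rounding, so the workaround should not change the network's behavior in practice. A quick sanity check along these lines (not part of the original report; the function names are ours) confirms the two variants produce numerically matching outputs:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand((1, 800, 8), dtype=torch.float32)


def original(x):
    # Graph that segfaults during TensorRT conversion.
    x = F.softmax(x, 2)
    x = torch.cat(x.split(1, 2), 2)
    x = x.reshape(6400)
    _, topk_inds = x.topk(1000, 0)
    return x[topk_inds].reshape(1, 1000)


def workaround(x):
    # Same graph with the epsilon add/subtract inserted after softmax.
    x = F.softmax(x, 2) + 1e-5
    x -= 1e-5
    x = torch.cat(x.split(1, 2), 2)
    x = x.reshape(6400)
    _, topk_inds = x.topk(1000, 0)
    return x[topk_inds].reshape(1, 1000)


# Outputs agree up to float32 rounding introduced by the add/subtract.
assert torch.allclose(original(x), workaround(x), atol=1e-6)
```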
Environment
We reproduced the issue on several machines, including a Jetson Orin NX.
TensorRT Version: 8.5.2
GPU Type: Nvidia GeForce Titan X (Maxwell) / 2080Ti / Jetson Orin NX
Nvidia Driver Version: 510.85.02-0ubuntu0.20.04.1 / ? / ?
CUDA Version: 10.2.89-1 / 11.5 / 11.4.19
CUDNN Version: 8.3.1.22-1+cuda10.2 / ? / 8.6.0
Operating System + Version: Ubuntu 20.04 / Ubuntu 22.04 / NVIDIA Jetson Linux 35.4.1
Python Version (if applicable): 3.8 / - / -
PyTorch Version (if applicable): 1.10.2 / - / -
Baremetal or Container (if container which image + tag): Baremetal
Relevant Files
model.onnx (558 Bytes)