Segmentation fault while building an engine from an ONNX model

Description

Hi, while running trtexec with an ONNX model, I got a segmentation fault.
According to the log, the segmentation fault occurred in the Timing Runner for a Myelin-fused foreign node.

[08/28/2023-12:01:48] [V] [TRT] --------------- Timing Runner: /features/0/0/Conv (CaskFlattenConvolution)
[08/28/2023-12:01:48] [V] [TRT] CaskFlattenConvolution has no valid tactics for this config, skipping
[08/28/2023-12:01:48] [V] [TRT] >>>>>>>>>>>>>>> Chose Runner Type: CaskConvolution Tactic: 0x7121ec1db3f80c67
[08/28/2023-12:01:48] [V] [TRT] =============== Computing costs for 
[08/28/2023-12:01:48] [V] [TRT] *************** Autotuning format combination: Float(401408,3136,56,1) -> Float(1000,1) ***************
[08/28/2023-12:01:48] [V] [TRT] --------------- Timing Runner: {ForeignNode[/features/0/2/Constant_output_0...(Unnamed Layer* 1783) [ElementWise]]} (Myelin)
Segmentation fault (core dumped)

As the internal code for building an engine is not open-sourced, I’m struggling to debug this problem.
I'm attaching the input ONNX model and the log file.
ONNX model link
simple_swin_b.engine.build.log (1.1 MB)

Environment

TensorRT Version: 8.5.2-1+cuda11.8
GPU Type: RTX A6000
Nvidia Driver Version: 510.108.03
CUDA Version: 12.0
CUDNN Version: 8
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt 23.01-py3

Steps To Reproduce

  1. Build trtexec on the above docker image
  2. trtexec --onnx=/workspace/models/simple_swin_b.onnx --saveEngine=simple_swin_b.engine --buildOnly --verbose

Hi,
Request you to share the ONNX model and the script if not shared already so that we can assist you better.
Meanwhile, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import onnx

# Replace with the path to your ONNX model, e.g. "simple_swin_b.onnx"
filename = "your_model.onnx"
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid
  2. Try running your model with the trtexec command.

In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

Hi, Thank you for the reply.

I attached the ONNX model, and the log was generated with --verbose.
The model also passed the ONNX checker.
Is there any information about what the "Timing Runner" does exactly and how to debug the error?
It seems almost impossible to debug from the user side because the code is not open.
Any suggestion will be a great help!

Thank you

Hi,

We recommend you use the latest TensorRT version 8.6.1 - nvcr.io/nvidia/tensorrt:23.07-py3
Using the latest TensorRT version, we could successfully build the TRT engine.

[08/29/2023-06:22:58] [I] Engine deserialized in 0.103842 sec.
[08/29/2023-06:22:58] [I] Skipped inference phase since --skipInference is added.
&&&& PASSED TensorRT.trtexec [TensorRT v8601] # trtexec --onnx=simple_swin_b.onnx --buildOnly --verbose
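Since the fix here was simply upgrading from TensorRT 8.5.2 to 8.6.1, a quick sanity check before rebuilding is to compare the installed version string (available as `tensorrt.__version__` inside the container) against the suggested minimum. Below is a minimal stdlib sketch, assuming the version string has the dotted form seen in this thread (e.g. `8.5.2-1+cuda11.8`); `meets_minimum` is a hypothetical helper, not part of TensorRT:

```python
def meets_minimum(version: str, minimum: str = "8.6.1") -> bool:
    """Compare dotted version strings numerically, ignoring non-numeric suffixes."""
    def parse(v: str):
        # Keep only the leading numeric components,
        # e.g. "8.5.2-1+cuda11.8" -> (8, 5, 2, 1)
        parts = []
        for piece in v.replace("-", ".").replace("+", ".").split("."):
            if piece.isdigit():
                parts.append(int(piece))
            else:
                break
        return tuple(parts)
    return parse(version) >= parse(minimum)

print(meets_minimum("8.5.2-1+cuda11.8"))  # the failing setup -> False
print(meets_minimum("8.6.1"))             # the suggested version -> True
```

Tuple comparison handles version strings of different lengths correctly, so a longer string like `8.5.2-1` still compares as older than `8.6.1`.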

Thank you.

Hi,
As you suggested, it worked with the latest TRT version.
Thanks a lot!
