TensorRT 10.0 enqueueV3 segmentation fault

Description

TensorRT crashes with a segmentation fault when I try to use a Resize or Pad layer whose parameters are supplied at runtime.
If I bake the fixed parameters into the ONNX model it works normally, but when I pass them as a separate input it gives a segmentation fault and I can't figure out why.
I've been stuck on this problem for a few days now.

Environment

TensorRT Version: 10.0.1.6
GPU Type: NVIDIA GeForce RTX 3080
Nvidia Driver Version: 550.163.01
CUDA Version: 12.4.1
CUDNN Version: 9.1.0.70-1
Operating System + Version: Ubuntu 22.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag): Container nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04

Relevant Files

Link to GitHub files:

Steps To Reproduce

Compile the test with CMake. Then:

root@3297547ee6e0:/home/tensorrt-test# python3 create_trt.py

ONNX file created and saved: ./pad_layer.onnx
Converting Model pad_layer.onnx to TensorRT - Version: 10.0.1
[06/03/2025-20:36:52] [TRT] [I] [MemUsageChange] Init CUDA: CPU +17, GPU +0, now: CPU 27, GPU 225 (MiB)
[06/03/2025-20:36:54] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1762, GPU +312, now: CPU 1925, GPU 537 (MiB)
parsing model
[06/03/2025-20:36:54] [TRT] [W] ModelImporter.cpp:420: Make sure input pad_sizes has Int64 binding.
model parsed
Network Description
Input ‘input_image’ with shape (1, -1, -1, -1) and dtype DataType.FLOAT
Input ‘pad_sizes’ with shape (8,) and dtype DataType.INT64
Output ‘paded_image’ with shape (-1, -1, -1, -1) and dtype DataType.FLOAT
Profile for Input ‘input_image’ shape (1, -1, -1, -1)
Profile for Input ‘pad_sizes’ shape (8,)
Serializing engine to file: /home/tensorrt-test/pad_layer.trt
[06/03/2025-20:36:54] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[06/03/2025-20:36:54] [TRT] [I] Detected 2 inputs and 1 output network tensors.
[06/03/2025-20:36:54] [TRT] [I] Total Host Persistent Memory: 0
[06/03/2025-20:36:54] [TRT] [I] Total Device Persistent Memory: 0
[06/03/2025-20:36:54] [TRT] [I] Total Scratch Memory: 0
[06/03/2025-20:36:54] [TRT] [I] Total Activation Memory: 0
[06/03/2025-20:36:54] [TRT] [I] Total Weights Memory: 0
[06/03/2025-20:36:54] [TRT] [I] Engine generation completed in 0.0057974 seconds.
[06/03/2025-20:36:54] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 610 MiB
[06/03/2025-20:36:54] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3107 MiB

root@3297547ee6e0:/home/tensorrt-test# ./build/tensorrt_test pad_layer.trt input_image.jpg

Load modelEngine file path: pad_layer.trt
Loaded engine size: 0 MiB
[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
Number of IO Tensors: 3
Tensor name: input_image
Tensor Dims: 1 -1 -1 -1
Tensor name: pad_sizes
Tensor Dims: 8
Tensor name: paded_image
Tensor Dims: 1 -1 -1 -1
Segmentation fault (core dumped)

The segmentation fault you're seeing when using a Resize or Pad layer with runtime-supplied parameters is most likely related to how TensorRT handles dynamic shapes and shape tensor inputs.

When the parameters are fixed in the ONNX model, TensorRT folds them into the engine and resolves all shapes at build time. When they arrive as an input instead, pad_sizes becomes a shape tensor: its values determine the output shape, so TensorRT must read them during shape inference, and the execution context has to be set up accordingly. Getting that setup wrong commonly crashes rather than producing an error message.

There are a few potential reasons for this segmentation fault:

  1. Missing tensor addresses: with enqueueV3, every I/O tensor (inputs and outputs) must have an address set via setTensorAddress before enqueue; a missing or invalid address typically segfaults rather than returning an error.
  2. Shape tensor binding: pad_sizes is an input shape tensor, and the address you give the execution context for it must point to host memory holding int64 values (note the parser warning "Make sure input pad_sizes has Int64 binding" in your build log); passing a device pointer or int32 data here can crash.
  3. Dynamic shape setup: setInputShape must be called for the dynamic input_image before enqueue, and the engine must have been built with an optimization profile covering both the input_image dimensions and the range of pad_sizes values (setShapeValues in the C++ builder API).
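A minimal sketch of that setup with the TensorRT 10 Python API (execute_async_v3, the Python counterpart of enqueueV3), assuming pycuda for device memory and using the tensor names from your log; the key detail is that the shape tensor pad_sizes gets a host address while input_image gets a device address:

```python
import numpy as np

def run_pad_engine(engine_path, image):
    # Sketch only: assumes TensorRT 10 and pycuda are installed.
    import tensorrt as trt
    import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
    import pycuda.driver as cuda

    logger = trt.Logger(trt.Logger.VERBOSE)
    with open(engine_path, "rb") as f:
        engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    # Dynamic execution tensor: set its concrete shape, then a *device* address.
    context.set_input_shape("input_image", image.shape)
    d_input = cuda.mem_alloc(image.nbytes)
    cuda.memcpy_htod(d_input, np.ascontiguousarray(image))
    context.set_tensor_address("input_image", int(d_input))

    # Input shape tensor: int64 values in *host* memory. Passing a device
    # pointer here is a common cause of a segfault.
    pad_sizes = np.array([0, 0, 2, 2, 0, 0, 2, 2], dtype=np.int64)
    context.set_tensor_address("pad_sizes", pad_sizes.ctypes.data)

    # The output shape only becomes concrete once the shape tensor values
    # are available to the context.
    output = np.empty(tuple(context.get_tensor_shape("paded_image")),
                      dtype=np.float32)
    d_output = cuda.mem_alloc(output.nbytes)
    context.set_tensor_address("paded_image", int(d_output))

    stream = cuda.Stream()
    context.execute_async_v3(stream.handle)
    stream.synchronize()
    cuda.memcpy_dtoh(output, d_output)
    return output
```

If your C++ test binary skips any of these steps for one of the three I/O tensors, that alone can explain the crash.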

To troubleshoot this issue, you can try the following:

  1. Check the TensorRT logs: run with a verbose ILogger severity to see whether any error or warning is reported before the crash.
  2. Get a backtrace: run the test binary under gdb or cuda-gdb to see where the segmentation fault occurs; with enqueueV3 it is typically inside the enqueue call when a tensor address is missing or invalid.
  3. Try a different version of TensorRT to see whether the issue is specific to 10.0.1.
  4. Check the ONNX model (e.g. with onnx.checker or Netron) to confirm that the inputs are declared correctly, in particular that pad_sizes is int64 with shape (8,).
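When checking the model, keep in mind that ONNX Pad expects its pads input as int64 values laid out as all begin-pads followed by all end-pads, so an 8-element pad_sizes maps onto a 4-D input. Plain NumPy is enough to illustrate the convention:

```python
import numpy as np

# ONNX Pad layout: [x1_begin, x2_begin, x3_begin, x4_begin,
#                   x1_end,   x2_end,   x3_end,   x4_end]
pad_sizes = np.array([0, 0, 2, 2, 0, 0, 2, 2], dtype=np.int64)

x = np.ones((1, 3, 4, 4), dtype=np.float32)
begins, ends = pad_sizes[:4], pad_sizes[4:]
padded = np.pad(x, list(zip(begins, ends)))  # per-axis (begin, end) pairs
print(padded.shape)  # (1, 3, 8, 8)
```

Interleaving begin/end pairs instead of splitting them into two halves is an easy mistake that produces wrong shapes even when nothing crashes.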

This is unlikely to be specific to the RTX 3080; a crash like this is usually reproducible on any GPU running the same TensorRT version, which points to either an API-usage issue in the inference code or a TensorRT bug rather than a hardware problem.

To resolve this issue, you can try the following:

  1. Update TensorRT: newer releases regularly fix shape-tensor bugs, so retest on the latest version.
  2. Modify the ONNX model to use fixed pad values if dynamic ones are not strictly required; that avoids shape tensor handling entirely.
  3. Implement the padding as a custom plugin if the built-in Pad path keeps failing.
  4. Review the build configuration: the workspace limit is set at build time via IBuilderConfig::setMemoryPoolLimit(MemoryPoolType::kWORKSPACE, ...); there is no workspaceSize property on IExecutionContext.
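On the build side, a dynamic-shape engine with a shape tensor input also needs an optimization profile covering both the image dimensions and the range of pad values. A sketch with the TensorRT 10 Python builder API, using the tensor names from your log (the min/opt/max ranges are placeholder values, not recommendations):

```python
def build_config(builder, network):
    # Sketch only: assumes TensorRT 10 is installed.
    import tensorrt as trt

    config = builder.create_builder_config()
    # Workspace is a build-time limit on IBuilderConfig, not a property
    # of the execution context.
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

    profile = builder.create_optimization_profile()
    # Dynamic execution tensor: min/opt/max *dimensions*.
    profile.set_shape("input_image",
                      (1, 1, 32, 32), (1, 3, 512, 512), (1, 3, 2048, 2048))
    # Input shape tensor: min/opt/max *values*, not dimensions.
    profile.set_shape_input("pad_sizes",
                            [0] * 8, [2] * 8, [64] * 8)
    config.add_optimization_profile(profile)
    return config
```

If create_trt.py only calls set_shape for input_image and never set_shape_input for pad_sizes, the runtime values of the shape tensor are unconstrained, which is worth ruling out.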

By trying these troubleshooting steps and potential solutions, you should be able to resolve the segmentation fault issue and get TensorRT working correctly with dynamic input tensors.

I really appreciate your willingness to help.
But I had already tried everything before posting the problem here. The only part I didn't try was setting up an environment to debug TensorRT itself and finding the exact point of failure. That would be a lot of work and very time-consuming, and I believe it is the responsibility of the TensorRT dev team.

As a workaround for this problem I abandoned the use of TensorRT plugins and developed my own CUDA layer for TensorRT.

We have been using TensorRT a lot in production, but the process of converting ONNX to TRT is very buggy, and it is always frustrating that it produces segmentation faults in several situations without ever indicating the possible cause so that we can fix it.
Many times the same code, with the same model, works fine on one of two identical machines and always fails on the other, probably due to a difference in some specific library between the installations; without any clue about the error, it is extremely difficult to find the cause.

Your suggestion to update the version would be good, if it weren't for the fact that TensorRT version updates on Jetson are extremely fragmented: any slight mismatch in the versions of the driver, CUDA, Python, ONNX, and other libraries causes TensorRT to break and stop running the model.

As a developer with over 40 years of working in software, I consider this a serious flaw for an Nvidia product.

TensorRT is a key part of ensuring that the Jetson line can work well in production, and giving errors without any indication of the cause makes life impossible for users.

I hope they can give more attention to the issue.

Sorry for the long rant.