Error Code 1: Cuda Runtime (invalid argument) Segmentation fault (core dumped)

Description

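Inference on a TensorRT engine completes and prints the output, but the process then fails with a CUDA runtime error (invalid argument) and a segmentation fault.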

Environment

TensorRT Version: 8.5.3.1
GPU Type: NVIDIA GeForce GTX 1080 Ti
NVIDIA Driver Version: 555.42.02
CUDA Version: 12.5
PyCUDA Version: (2022, 2, 2)
Operating System + Version: Ubuntu 20.04
Python Version: 3.8.10
PyTorch Version: 2.3.1+cu121
Docker Container: nvcr.io/nvidia/tensorrt:23.03-py3

I generated an engine file from an ONNX model that was originally exported from PyTorch. The model performs semantic segmentation. Every time I run inference on the engine, the following error occurs, and my debugging attempts so far have not resolved it:
[07/09/2024-09:38:32] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::42] Error Code 1: Cuda Runtime (invalid argument)
Segmentation fault (core dumped)

The code I’m using for inference with TensorRT is as follows:

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np

# Set up TensorRT logger and runtime
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)

# Load the TensorRT engine from the file
engine_file_path = 'det_model.engine'  # Replace with the path to your TensorRT engine file
with open(engine_file_path, 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Create a context for executing the engine
context = engine.create_execution_context()

# Get the number of bindings (input and output tensors)
num_bindings = engine.num_bindings

# Allocate memory for input and output tensors
host_inputs = []
cuda_inputs = []
host_outputs = []
cuda_outputs = []
bindings = [None] * num_bindings  # Preallocate bindings list with the correct size

for i in range(num_bindings):
    binding_shape = engine.get_tensor_shape(engine.get_binding_name(i))
    dtype = trt.nptype(engine.get_tensor_dtype(engine.get_binding_name(i)))
    size = trt.volume(binding_shape)
    if engine.binding_is_input(i):
        host_input = np.zeros(size, dtype=dtype)
        cuda_input = cuda.mem_alloc(host_input.nbytes)
        host_inputs.append(host_input)
        cuda_inputs.append(cuda_input)
        bindings[i] = int(cuda_input)
        print(f"Input Binding {i} - Shape: {binding_shape}, Size: {size}, Dtype: {dtype}")
    else:
        host_output = np.zeros(size, dtype=dtype)
        cuda_output = cuda.mem_alloc(host_output.nbytes)
        host_outputs.append(host_output)
        cuda_outputs.append(cuda_output)
        bindings[i] = int(cuda_output)
        print(f"Output Binding {i} - Shape: {binding_shape}, Size: {size}, Dtype: {dtype}")

# Create a dummy input tensor with the correct shape
dummy_input = np.random.rand(*engine.get_tensor_shape(engine.get_binding_name(0))).astype(np.float32)
print(f"Dummy Input Shape: {dummy_input.shape}")

# Copy the dummy input to the device memory
cuda.memcpy_htod(cuda_inputs[0], dummy_input)

# Execute the engine
print("Executing the engine...")
context.execute_v2(bindings=bindings)
print("Execution completed.")

# Copy the output from the device memory to the host memory
for i in range(num_bindings):
    if not engine.binding_is_input(i):
        cuda.memcpy_dtoh(host_outputs[i - len(host_inputs)], cuda_outputs[i - len(host_inputs)])  # Adjust index for host_outputs and cuda_outputs
        print(f"Output {i}: {host_outputs[i - len(host_inputs)]}")

# Deallocate memory for input and output tensors
for cuda_input in cuda_inputs:
    cuda_input.free()

for cuda_output in cuda_outputs:
    cuda_output.free()
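
A side note on the copy-back loop in the script above: the `i - len(host_inputs)` arithmetic lands on the right slot here only because the single input is binding 0 and the single output is binding 1. As far as I can tell this is unrelated to the crash, but it would pick the wrong buffer if an output binding came before an input. A minimal sketch of a more explicit mapping follows; `split_bindings` is a hypothetical helper name, and the list of flags stands in for the per-binding result of `engine.binding_is_input(i)`.

```python
def split_bindings(is_input_flags):
    """Map each binding index to its slot in the input or output list.

    is_input_flags[i] is True when binding i is an input, False otherwise.
    Returns two dicts: {binding_index: input_slot}, {binding_index: output_slot}.
    """
    input_slots, output_slots = {}, {}
    for binding_index, is_input in enumerate(is_input_flags):
        target = input_slots if is_input else output_slots
        # The next free slot is simply how many entries were recorded so far.
        target[binding_index] = len(target)
    return input_slots, output_slots


# The engine in this post: binding 0 is the input, binding 1 is the output.
input_slots, output_slots = split_bindings([True, False])
print(input_slots, output_slots)
```

With such a mapping, the copy-back step could look up `output_slots[i]` instead of subtracting the number of inputs from the binding index.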

The full output is as follows:

dummyinference.py:29: DeprecationWarning: Use get_tensor_name instead.
  binding_shape = engine.get_tensor_shape(engine.get_binding_name(i))
dummyinference.py:30: DeprecationWarning: Use get_tensor_name instead.
  dtype = trt.nptype(engine.get_tensor_dtype(engine.get_binding_name(i)))
dummyinference.py:32: DeprecationWarning: Use get_tensor_mode instead.
  if engine.binding_is_input(i):
Input Binding 0 - Shape: (1, 360, 640), Size: 230400, Dtype: <class 'numpy.float32'>
Output Binding 1 - Shape: (1, 180, 320), Size: 57600, Dtype: <class 'numpy.float32'>
dummyinference.py:48: DeprecationWarning: Use get_tensor_name instead.
  dummy_input = np.random.rand(*engine.get_tensor_shape(engine.get_binding_name(0))).astype(np.float32)
Dummy Input Shape: (1, 360, 640)
Executing the engine...
Execution completed.
dummyinference.py:61: DeprecationWarning: Use get_tensor_mode instead.
  if not engine.binding_is_input(i):
Output 1: [4.0739775e-05 0.0000000e+00 0.0000000e+00 ... 4.8309565e-05 1.7109513e-04
 1.0510683e-03]
[07/09/2024-09:38:32] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::42] Error Code 1: Cuda Runtime (invalid argument)
Segmentation fault (core dumped)
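
Since the log shows "Execution completed." and the output values before the error, the failure seems to happen during teardown rather than during inference. One possible explanation, which is an assumption and not something confirmed in this thread, is that the CUDA context created by `pycuda.autoinit` is destroyed at interpreter exit before the TensorRT objects release their device memory. The sketch below keeps all TensorRT objects local to a function and deletes them explicitly before returning, so they are released while the context still exists. The function name `run_inference` is hypothetical, and the sketch assumes an engine with static shapes and a single optimization profile, as in the log above.

```python
def run_inference(engine_path):
    """Run one inference pass, then release everything in a fixed order."""
    # Imported lazily so the module can be loaded on a machine without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context on import)
    import pycuda.driver as cuda
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)
    with open(engine_path, "rb") as f:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()

    host_bufs, device_bufs = [], []
    input_ids, output_ids = [], []
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        shape = tuple(engine.get_tensor_shape(name))
        dtype = trt.nptype(engine.get_tensor_dtype(name))
        host = np.zeros(shape, dtype=dtype)
        if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
            host[...] = np.random.rand(*shape)  # dummy input data
            input_ids.append(i)
        else:
            output_ids.append(i)
        host_bufs.append(host)
        device_bufs.append(cuda.mem_alloc(host.nbytes))

    # Upload inputs, run the engine, download outputs.
    for i in input_ids:
        cuda.memcpy_htod(device_bufs[i], host_bufs[i])
    context.execute_v2(bindings=[int(dev) for dev in device_bufs])
    outputs = []
    for i in output_ids:
        cuda.memcpy_dtoh(host_bufs[i], device_bufs[i])
        outputs.append(host_bufs[i])

    # Teardown while the CUDA context is still alive: device buffers first,
    # then the TensorRT objects, in reverse order of creation.
    for dev in device_bufs:
        dev.free()
    del context, engine, runtime
    return outputs
```

Calling `run_inference('det_model.engine')` from the script's entry point would replace the module-level code. Whether this removes the error on this particular setup is untested.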

I would really appreciate it if someone could help me resolve this problem.

Hi @muhammad.fasih1,
Could you please share your model and the steps to reproduce the issue?
Thanks