Segmentation fault when running build_serialized_network or deserialize_cuda_engine for both trt and onnx

Description

A clear and concise description of the bug or issue.
Using build_serialized_network or deserialize_cuda_engine throws segmentation fault for both onnx and trt models

Environment

**TensorRT Version8.6.1:
**GPU TypeRTX 3080:
**Nvidia Driver Version545:
**CUDA Version12.3:
**Operating System + VersionUbuntu 20 x86:
**Python Version (if applicable)3.8:

Program

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np

#import pdb; pdb.set_trace()

print(“Tensorrt version:”, trt.version)

onnx_model_path = ‘/home/ws/getting_started_v5.0.0/notebooks/tao_launcher_starter_kit/detectnet_v2/detectnet_v2/experiment_dir_final/resnet18_detector.onnx’
#engine_file_path = ‘/home/ws/engine2.trt’

builder = trt.Builder(trt.Logger(trt.Logger.INFO))

max_batch_size = 1
builder.max_batch_size = max_batch_size

platform_has_tf32 = builder.platform_has_tf32
print(“Platform has TF32 support:”, platform_has_tf32)

platform_has_fast_fp16 = builder.platform_has_fast_fp16
print(“Platform has fast native FP16 support:”, platform_has_fast_fp16)

platform_has_fast_int8 = builder.platform_has_fast_int8
print(“Platform has fast native INT8 support:”, platform_has_fast_int8)

network = builder.create_network()
with trt.OnnxParser(network, builder.logger) as parser:
with open(onnx_model_path, ‘rb’) as model_file:
model_data = model_file.read()
parser.parse(model_data)

config = builder.create_builder_config()

try:
serialized_engine = builder.build_serialized_network(network, config)
print(“Network built and serialized successfully!”)
except Exception as e:
print(“Error during network building:”, e)
raise

runtime = trt.Runtime(trt.Logger(trt.Logger.INFO))
engine = runtime.deserialize_cuda_engine(serialized_engine)

bindings = [cuda.mem_alloc(size) for size in [trt.volume(shape) * max_batch_size for shape in network.get_binding_shape(0)]]

stream = cuda.Stream()

input_data = np.random.randn(trt.volume(network.get_binding_shape(0)) * max_batch_size).astype(np.float32)

cuda.memcpy_htod_async(bindings[0], input_data, stream)

with engine.create_execution_context() as context:
context.execute_async(bindings=[int(b) for b in bindings], stream_handle=stream.handle)

output_data = np.empty(trt.volume(network.get_binding_shape(1)) * max_batch_size, dtype=np.float32)
cuda.memcpy_dtoh_async(output_data, bindings[1], stream)

stream.synchronize()

print(“Inference completed successfully!”)

Debugging

[Thread debugging using libthread_db enabled]
Using host libthread_db library “/lib/x86_64-linux-gnu/libthread_db.so.1”.
[New Thread 0x7fffa7b07700 (LWP 50490)]
[New Thread 0x7fffa7306700 (LWP 50491)]
[New Thread 0x7fffa4b05700 (LWP 50492)]
[New Thread 0x7fffa2304700 (LWP 50493)]
[New Thread 0x7fff9fb03700 (LWP 50494)]
[New Thread 0x7fff9d302700 (LWP 50495)]
[New Thread 0x7fff98b01700 (LWP 50496)]
[New Thread 0x7fff96300700 (LWP 50497)]
[New Thread 0x7fff95aff700 (LWP 50498)]
[New Thread 0x7fff912fe700 (LWP 50499)]
[New Thread 0x7fff8eafd700 (LWP 50500)]
[New Thread 0x7fff8c2fc700 (LWP 50501)]
[New Thread 0x7fff89afb700 (LWP 50502)]
[New Thread 0x7fff872fa700 (LWP 50503)]
[New Thread 0x7fff86af9700 (LWP 50504)]
[New Thread 0x7fff842f8700 (LWP 50505)]
[New Thread 0x7fff83af7700 (LWP 50506)]
[New Thread 0x7fff7f2f6700 (LWP 50507)]
[New Thread 0x7fff7caf5700 (LWP 50508)]
[New Thread 0x7fff782f4700 (LWP 50509)]
[New Thread 0x7fff75af3700 (LWP 50510)]
[New Thread 0x7fff752f2700 (LWP 50511)]
[New Thread 0x7fff70af1700 (LWP 50512)]
[New Thread 0x7fff67174700 (LWP 50513)]
[New Thread 0x7fff65647700 (LWP 50517)]
[New Thread 0x7fff64cc3700 (LWP 50519)]
Tensorrt version: 8.6.1
[02/06/2024-18:55:47] [TRT] [I] [MemUsageChange] Init CUDA: CPU +563, GPU +0, now: CPU 593, GPU 1092 (MiB)
[02/06/2024-18:55:48] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +433, GPU +102, now: CPU 1045, GPU 1186 (MiB)
test.py:19: DeprecationWarning: Use network created with NetworkDefinitionCreationFlag::EXPLICIT_BATCH flag instead.
builder.max_batch_size = max_batch_size
Platform has TF32 support: True
Platform has fast native FP16 support: True
Platform has fast native INT8 support: True
[02/06/2024-18:55:48] [TRT] [W] The implicit batch dimension mode has been deprecated. Please create the network with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag whenever possible.

Thread 1 “python3” received signal SIGSEGV, Segmentation fault.
0x00007fffda4fe328 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1

Any idea on how to fix this?

@AakankshaS
@AastaLLL