Description
Inference on a TensorRT engine (a semantic segmentation model converted from PyTorch via ONNX) completes and produces output, but the process then crashes at deallocation with a CUDA Runtime "invalid argument" error followed by a segmentation fault.
Environment
TensorRT Version: 8.5.3.1
GPU Type: Nvidia GeForce GTX 1080 Ti
Nvidia Driver Version: 555.42.02
CUDA Version: 12.5
pyCuda Version: (2022, 2, 2)
Operating System + Version: Ubuntu 20.04
Python Version: 3.8.10
PyTorch Version: 2.3.1+cu121
Docker Container: nvcr.io/nvidia/tensorrt:23.03-py3
I generated an engine file from an ONNX model that was originally exported from PyTorch; the engine is a semantic segmentation model. Inference itself appears to complete, but the following error occurs on every run despite my attempts at debugging:
[07/09/2024-09:38:32] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::42] Error Code 1: Cuda Runtime (invalid argument) Segmentation fault (core dumped)
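For context, the engine was built with the usual PyTorch → ONNX → TensorRT flow. The sketch below only illustrates that flow, not my exact commands; the stand-in model, file names, opset version, and builder options are placeholders:

import torch
import tensorrt as trt

# ONNX export (torch.nn.Identity is a stand-in for my actual segmentation network)
model = torch.nn.Identity()
dummy = torch.randn(1, 360, 640)  # matches the engine's input shape
torch.onnx.export(model, dummy, 'det_model.onnx', opset_version=13)

# Engine build with the TensorRT builder API
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open('det_model.onnx', 'rb') as f:
    assert parser.parse(f.read()), parser.get_error(0)
config = builder.create_builder_config()
serialized = builder.build_serialized_network(network, config)
with open('det_model.engine', 'wb') as f:
    f.write(serialized)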
The code I’m using for inference with TensorRT is as follows:
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np

# Set up TensorRT logger and runtime
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(TRT_LOGGER)

# Load the TensorRT engine from the file
engine_file_path = 'det_model.engine'  # Replace with the path to your TensorRT engine file
with open(engine_file_path, 'rb') as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Create a context for executing the engine
context = engine.create_execution_context()

# Get the number of bindings (input and output tensors)
num_bindings = engine.num_bindings

# Allocate memory for input and output tensors
host_inputs = []
cuda_inputs = []
host_outputs = []
cuda_outputs = []
bindings = [None] * num_bindings  # Preallocate bindings list with the correct size

for i in range(num_bindings):
    binding_shape = engine.get_tensor_shape(engine.get_binding_name(i))
    dtype = trt.nptype(engine.get_tensor_dtype(engine.get_binding_name(i)))
    size = trt.volume(binding_shape)
    if engine.binding_is_input(i):
        host_input = np.zeros(size, dtype=dtype)
        cuda_input = cuda.mem_alloc(host_input.nbytes)
        host_inputs.append(host_input)
        cuda_inputs.append(cuda_input)
        bindings[i] = int(cuda_input)
        print(f"Input Binding {i} - Shape: {binding_shape}, Size: {size}, Dtype: {dtype}")
    else:
        host_output = np.zeros(size, dtype=dtype)
        cuda_output = cuda.mem_alloc(host_output.nbytes)
        host_outputs.append(host_output)
        cuda_outputs.append(cuda_output)
        bindings[i] = int(cuda_output)
        print(f"Output Binding {i} - Shape: {binding_shape}, Size: {size}, Dtype: {dtype}")

# Create a dummy input tensor with the correct shape
dummy_input = np.random.rand(*engine.get_tensor_shape(engine.get_binding_name(0))).astype(np.float32)
print(f"Dummy Input Shape: {dummy_input.shape}")

# Copy the dummy input to the device memory
cuda.memcpy_htod(cuda_inputs[0], dummy_input)

# Execute the engine
print("Executing the engine...")
context.execute_v2(bindings=bindings)
print("Execution completed.")

# Copy the output from the device memory to the host memory
for i in range(num_bindings):
    if not engine.binding_is_input(i):
        # Adjust index for host_outputs and cuda_outputs
        cuda.memcpy_dtoh(host_outputs[i - len(host_inputs)], cuda_outputs[i - len(host_inputs)])
        print(f"Output {i}: {host_outputs[i - len(host_inputs)]}")

# Deallocate memory for input and output tensors
for cuda_input in cuda_inputs:
    cuda_input.free()
for cuda_output in cuda_outputs:
    cuda_output.free()
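As the log below shows, this script triggers several deprecation warnings. For reference, I believe the allocation loop could be rewritten with the newer tensor API along these lines (a sketch based on the TensorRT 8.5 Python API, which I have not verified against this engine):

# Same allocation loop using the non-deprecated tensor API (TensorRT >= 8.5)
for i in range(engine.num_io_tensors):
    name = engine.get_tensor_name(i)
    shape = engine.get_tensor_shape(name)
    dtype = trt.nptype(engine.get_tensor_dtype(name))
    size = trt.volume(shape)
    if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
        host_inputs.append(np.zeros(size, dtype=dtype))
        cuda_inputs.append(cuda.mem_alloc(host_inputs[-1].nbytes))
        bindings[i] = int(cuda_inputs[-1])
    else:
        host_outputs.append(np.zeros(size, dtype=dtype))
        cuda_outputs.append(cuda.mem_alloc(host_outputs[-1].nbytes))
        bindings[i] = int(cuda_outputs[-1])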
The full output, including the error, is as follows:
dummyinference.py:29: DeprecationWarning: Use get_tensor_name instead.
binding_shape = engine.get_tensor_shape(engine.get_binding_name(i))
dummyinference.py:30: DeprecationWarning: Use get_tensor_name instead.
dtype = trt.nptype(engine.get_tensor_dtype(engine.get_binding_name(i)))
dummyinference.py:32: DeprecationWarning: Use get_tensor_mode instead.
if engine.binding_is_input(i):
Input Binding 0 - Shape: (1, 360, 640), Size: 230400, Dtype: <class 'numpy.float32'>
Output Binding 1 - Shape: (1, 180, 320), Size: 57600, Dtype: <class 'numpy.float32'>
dummyinference.py:48: DeprecationWarning: Use get_tensor_name instead.
dummy_input = np.random.rand(*engine.get_tensor_shape(engine.get_binding_name(0))).astype(np.float32)
Dummy Input Shape: (1, 360, 640)
Executing the engine...
Execution completed.
dummyinference.py:61: DeprecationWarning: Use get_tensor_mode instead.
if not engine.binding_is_input(i):
Output 1: [4.0739775e-05 0.0000000e+00 0.0000000e+00 ... 4.8309565e-05 1.7109513e-04
1.0510683e-03]
[07/09/2024-09:38:32] [TRT] [E] 1: [defaultAllocator.cpp::deallocate::42] Error Code 1: Cuda Runtime (invalid argument)
Segmentation fault (core dumped)
I would really appreciate it if someone could help me resolve this problem.
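Since the log shows inference completing and the error only appearing at deallocation, my guess (and it is only a guess) is that the crash is related to the order in which the TensorRT objects, the PyCUDA buffers, and the context created by pycuda.autoinit are torn down at exit. An explicit-teardown variant I could compare against would look roughly like this:

# Untested assumption: release the TensorRT objects first, then the device
# buffers, before pycuda.autoinit destroys the CUDA context at interpreter exit
del context
del engine
del runtime
for buf in cuda_inputs + cuda_outputs:
    buf.free()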