Hello Community,
TensorRT: 10.3.0
NVIDIA GPU: NVIDIA Jetson Orin Nano 8GB - AVerMedia 131S SuperL4T
Nvidia driver version: L4T R36.4.4
cuDNN Version: 9.3.0.75
Operating System: Ubuntu 22.04.5 LTS
Python Version: Python 3.10.12
I am trying to run a segmentation-based computer vision model on a Jetson Orin Nano industrial kit with 4 RealSense cameras attached. I am using the TensorRT Execution Provider with an input of 1088Hx1920W at FP16. While running the image segmentation and detection project on site overnight, I observed that the time taken per detection increases linearly (see attached graph). The total detection time was 1.2s at the beginning and rose to 4.0s after 16 hours of operation (3567 detections). What is causing this, and which system properties of the Jetson board do I need to keep track of to investigate it further? Also, are there any changes that can be made to keep the detection time at 1.2s?
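One way to start narrowing this down could be to log a few system readings next to every detection time, so the latency curve can be compared against temperature and free memory afterwards. The following is only a minimal sketch under assumptions: the function names are hypothetical, it relies on the standard Linux files `/proc/meminfo` and `/sys/class/thermal/thermal_zone*`, and which thermal zones exist depends on the board. Tools such as `tegrastats` report similar values.

```python
import glob
import json
import time
from pathlib import Path


def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (kB for sized fields)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def read_thermal_zones(root="/sys/class/thermal"):
    """Return {zone_type: degrees_celsius} for every readable thermal zone."""
    temps = {}
    for zone in sorted(glob.glob(f"{root}/thermal_zone*")):
        try:
            name = Path(zone, "type").read_text().strip()
            # sysfs reports the temperature in millidegrees Celsius
            temps[name] = int(Path(zone, "temp").read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # some zones cannot be read on every board
    return temps


def snapshot(detection_seconds, meminfo_path="/proc/meminfo"):
    """Collect one log row: timestamp, detection time, memory, temperatures."""
    try:
        mem = parse_meminfo(Path(meminfo_path).read_text())
    except OSError:
        mem = {}
    row = {
        "unix_time": time.time(),
        "detection_s": detection_seconds,
        "mem_available_kb": mem.get("MemAvailable"),
        "swap_free_kb": mem.get("SwapFree"),
    }
    for name, celsius in read_thermal_zones().items():
        row[f"temp_{name}"] = celsius
    return row


def append_jsonl(path, row):
    """Append one row to a JSON-lines log file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(row) + "\n")
```

Calling `append_jsonl("detections.jsonl", snapshot(elapsed))` after each detection (both the file name and `elapsed` are placeholders) would give one row per detection. Steadily shrinking `mem_available_kb` or rising temperatures alongside the latency would point at memory growth or thermal throttling respectively; if neither moves, the slowdown is more likely inside the application code.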
Hi,
Have you tried to run the model with TensorRT directly?
To check whether the issue comes from TensorRT or ONNXRuntime, could you try deploying the model with trtexec and share the results with us?
Thanks.
Hello Aastha LLL,
I ran the model with TensorRT directly, by using the .engine file compiled on that specific Jetson device.
Hi,
Did you run it with /usr/src/tensorrt/bin/trtexec?
Thanks
No. How can I do that?
This is how I load the TensorRT engine file in my vision project:
def create_tensorRT_session(self, engine_path):
    engine_path = str(engine_path)
    if not os.path.isfile(engine_path):
        raise FileNotFoundError(f"TensorRT engine not found: {engine_path}")
    self.trt_logger = trt.Logger(trt.Logger.WARNING)
    self.trt_runtime = trt.Runtime(self.trt_logger)
    with open(engine_path, "rb") as f:
        engine_data = f.read()
    self.engine = self.trt_runtime.deserialize_cuda_engine(engine_data)
    if self.engine is None:
        raise RuntimeError(f"Failed to deserialize TensorRT engine: {engine_path}")
    self.context = self.engine.create_execution_context()
    if self.context is None:
        raise RuntimeError("Failed to create TensorRT execution context")
    self.input_name = None
    self.output_name = None
    for i in range(self.engine.num_io_tensors):
        name = self.engine.get_tensor_name(i)
        mode = self.engine.get_tensor_mode(name)
        if mode == trt.TensorIOMode.INPUT:
            self.input_name = name
        elif mode == trt.TensorIOMode.OUTPUT:
            self.output_name = name
    if self.input_name is None or self.output_name is None:
        raise RuntimeError("Could not find TensorRT input/output tensors")
    # Fixed-shape engine
    self.context.set_input_shape(self.input_name, (1, 3, self.height, self.width))
    self.input_shape = tuple(self.context.get_tensor_shape(self.input_name))
    self.output_shape = tuple(self.context.get_tensor_shape(self.output_name))
    ep_name = self.ep.upper() if self.ep else "ORT"
    logging.info(f"[{ep_name}] input tensor: {self.input_name}, shape={self.input_shape}")
    logging.info(f"[{ep_name}] output tensor: {self.output_name}, shape={self.output_shape}")
    # Host buffers
    trt_in_dtype = trt.nptype(self.engine.get_tensor_dtype(self.input_name))
    self.trt_input_host = np.empty(self.input_shape, dtype=trt_in_dtype)
    trt_out_dtype = trt.nptype(self.engine.get_tensor_dtype(self.output_name))
    self.trt_output_host = np.empty(self.output_shape, dtype=trt_out_dtype)
    # Device buffers
    self.trt_input_device = cuda.mem_alloc(self.trt_input_host.nbytes)
    self.trt_output_device = cuda.mem_alloc(self.trt_output_host.nbytes)
    # Bind addresses
    self.context.set_tensor_address(self.input_name, int(self.trt_input_device))
    self.context.set_tensor_address(self.output_name, int(self.trt_output_device))
    self.stream = cuda.Stream()
    self.providers = ["TensorRT_Engine"]
    # unified inference hook
    self.run_inference = self.run_trt_inference
Hi,
After serialization, you will have a TensorRT engine file.
You can then deserialize and run it with trtexec via the --loadEngine argument.
Thanks.
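As a rough illustration of the trtexec suggestion, the sketch below builds and runs such a command from Python. It is not an official recipe: `model.engine`, `times.json`, and the helper names are placeholders, and the duration should be chosen long enough to see whether latency drifts. `--loadEngine`, `--duration`, and `--exportTimes` are trtexec options; the binary path is the one quoted earlier in this thread.

```python
import shlex
import subprocess

# trtexec location mentioned earlier in this thread
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def build_trtexec_command(engine_path, duration_s=600, times_json=None):
    """Build a trtexec command that benchmarks an already serialized engine."""
    cmd = [
        TRTEXEC,
        f"--loadEngine={engine_path}",  # deserialize this engine file
        f"--duration={duration_s}",     # measure for at least this many seconds
    ]
    if times_json:
        # write per-iteration timings to a JSON file for later plotting
        cmd.append(f"--exportTimes={times_json}")
    return cmd


def run_trtexec(engine_path, **kwargs):
    """Run trtexec and return its exit code; output goes to the terminal."""
    cmd = build_trtexec_command(engine_path, **kwargs)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


# Example (placeholder file names):
# run_trtexec("model.engine", duration_s=3600, times_json="times.json")
```

If the latency reported by trtexec stays flat over a long run while the application still slows down, the growth is probably outside the TensorRT engine itself, for example in pre-processing, post-processing, or camera handling.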