Performance Bottleneck in TensorRT Inference on Jetson with Semantic Segmentation Model (DWConv)

Description

I am currently working on deploying a semantic segmentation model from PyTorch to TensorRT for inference on Jetson. After exporting the model to .onnx and optimizing it with TensorRT, I have run into a performance bottleneck in a specific block of the model's backbone. There are two instances of this block, and each takes roughly 8 ms during inference.
The full network (574 TensorRT layers) has an average latency of 76 ms, so these two blocks alone account for about 21% of the total inference time.
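
In case it helps reproduce the measurement, here is a minimal sketch (engine deserialization and buffer setup omitted, all names are placeholders) of how per-layer latency can be collected with the TensorRT Python API; `trtexec --dumpProfile` gives a similar breakdown.

```python
# Minimal per-layer profiling sketch, assuming an already-deserialized TensorRT engine.
import tensorrt as trt

class LayerProfiler(trt.IProfiler):
    """Accumulates the per-layer times TensorRT reports after each execution."""
    def __init__(self):
        trt.IProfiler.__init__(self)
        self.timings = {}

    def report_layer_time(self, layer_name, ms):
        # Called by TensorRT once per layer per inference.
        self.timings[layer_name] = self.timings.get(layer_name, 0.0) + ms

# Usage (engine creation and binding allocation omitted):
# profiler = LayerProfiler()
# context = engine.create_execution_context()
# context.profiler = profiler
# for _ in range(100):
#     context.execute_v2(bindings)   # synchronous execution triggers the profiler callback
# for name, total in sorted(profiler.timings.items(), key=lambda kv: kv[1], reverse=True)[:10]:
#     print(f"{name}: {total / 100:.3f} ms per run")
```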

Does anyone have any hints?

Environment

TensorRT Version: 7.1.3
GPU Type: Jetson AGX Xavier
Nvidia Driver Version:
CUDA Version: 10.2.89
CUDNN Version: 8.0.0.180
Operating System + Version: Ubuntu 18.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.13.1
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/l4t-ml:r32.7.1-py3

Relevant Files

This is the block with the related I/O shapes:
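
For reference, below is a PyTorch sketch of a typical depthwise-separable (DWConv) block of the kind named in the title; the channel counts, stride, and shapes are placeholders rather than the actual values from the backbone.

```python
# Illustrative DWConv block; all dimensions are placeholders.
import torch
import torch.nn as nn

class DWConvBlock(nn.Module):
    def __init__(self, in_ch=256, out_ch=256, stride=1):
        super().__init__()
        # Depthwise 3x3 conv: groups == in_ch, i.e. one filter per input channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, padding=1,
                                   groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # Pointwise 1x1 conv mixes channels after the spatial filtering.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

# x = torch.randn(1, 256, 64, 128)   # placeholder input
# DWConvBlock()(x).shape             # -> torch.Size([1, 256, 64, 128])
```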

Hi @user3618,
Apologies for the delayed response. Could you please share the model and a repro script so we can try to reproduce the issue?
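
For the repro, a minimal script that exports just the suspect block to ONNX and profiles it with trtexec is usually enough; a sketch, with the block definition and input shape as placeholders:

```python
# Hypothetical repro sketch: export the suspect block in isolation to ONNX so it
# can be timed on its own. The block and input shape below are placeholders.
import torch
import torch.nn as nn

block = nn.Sequential(                        # stand-in for the real DWConv block
    nn.Conv2d(256, 256, 3, padding=1, groups=256, bias=False),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 256, 1, bias=False),
    nn.BatchNorm2d(256),
    nn.ReLU(inplace=True),
).eval()

dummy = torch.randn(1, 256, 64, 128)          # placeholder (N, C, H, W)

torch.onnx.export(block, dummy, "dwconv_block.onnx", opset_version=11,
                  input_names=["input"], output_names=["output"])

# Then on the Jetson, for example:
#   trtexec --onnx=dwconv_block.onnx --fp16 --dumpProfile
```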