Description
Inference with the TensorRT engine is extremely slow (~1000 ms per run), far slower than the same model in PyTorch.
Environment
TensorRT Version: 8.2
GPU Type: NVIDIA A10G
Nvidia Driver Version: 515.65.01
CUDA Version: 11.7
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Nsight Systems profile attached: report-nsys (78.0 KB)
The attached Nsight Systems profile shows two operations accounting for ~98% of the inference time: TensorRT:ExecutionContext::enqueue and TensorRT:ConvTranspose_114. End-to-end inference takes about 1000 ms, which is dramatically slower than running the same model in PyTorch.
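For reference, the latency number comes from a host-side timing loop roughly like the sketch below (a minimal sketch, assuming a static-shape, single-stream engine; `ENGINE_PATH`, the buffer setup, and the iteration counts are placeholders rather than the actual script). The loop synchronizes the stream before stopping the clock, so the ~1000 ms figure includes the full GPU execution, not just the enqueue call.

```python
import time

import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Placeholder path -- substitute the real engine file.
ENGINE_PATH = "model.engine"

with open(ENGINE_PATH, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()

# Allocate pinned host buffers and device buffers for every binding.
bindings, host_bufs, dev_bufs = [], [], []
for i in range(engine.num_bindings):
    shape = context.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

stream = cuda.Stream()

def infer():
    # Copy inputs to the device, enqueue, copy outputs back.
    for i in range(engine.num_bindings):
        if engine.binding_is_input(i):
            cuda.memcpy_htod_async(dev_bufs[i], host_bufs[i], stream)
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    for i in range(engine.num_bindings):
        if not engine.binding_is_input(i):
            cuda.memcpy_dtoh_async(host_bufs[i], dev_bufs[i], stream)
    stream.synchronize()  # wait for the GPU before stopping the clock

# Warm up, then time.
for _ in range(10):
    infer()
start = time.perf_counter()
for _ in range(100):
    infer()
print("mean latency: %.1f ms" % ((time.perf_counter() - start) * 1000 / 100))
```

The warm-up iterations are excluded from the measurement so one-time CUDA/cuDNN initialization does not inflate the average.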