Better inference performance with PyTorch than with TensorRT

pruizb · October 21, 2022, 9:00am

Hello,

I am running inference with ResNet50 using TensorRT from Python with JetPack 5.0.2-b231 on a Jetson AGX Xavier. I process a variable number of detections to extract features, so the engine was generated with a dynamic batch dimension from an ONNX model with variable input and output sizes. The problem is that, with dynamic batch, inference is much slower with TensorRT than with the original PyTorch model. You can find the original model here: GitHub - HobbitLong/SupContrast: PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally)

I would like to know if there is any way to get higher performance using dynamic shape with TensorRT.

Thanks, Paula.

AastaLLL · October 24, 2022, 2:47am

Hi,

Could you share the performance data you got with TensorRT and PyTorch?
Please also share the detailed steps to reproduce the numbers so we can check them in our environment as well.

Thanks.

pruizb · October 24, 2022, 9:39am

Hello,

The inference time per frame is about 63 ms with PyTorch and about 686 ms with TensorRT.
Every frame has about 10 detections, so I created the engine with the minShapes, maxShapes and optShapes parameters:

/usr/src/tensorrt/bin/./trtexec --onnx=/path/supcon_batch_variable.onnx --saveEngine=/path/supcon_batch_variable_fp32_opt8.trt --workspace=6000 --minShapes=input:1x3x224x224 --maxShapes=input:16x3x224x224 --optShapes=input:8x3x224x224
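
As a rough illustration of how the three shape flags in the command above fit together, here is a minimal Python sketch that assembles them for a given batch range. The helper names, the file names `model.onnx` and `model.trt`, and the choice of `opt_batch=10` (picked only because the post mentions about 10 detections per frame) are assumptions for illustration, not something taken from the thread.

```python
def shape_flag(name, input_name, batch, chw=(3, 224, 224)):
    """Format one trtexec shape flag, e.g. --minShapes=input:1x3x224x224."""
    dims = "x".join(str(d) for d in (batch, *chw))
    return f"--{name}={input_name}:{dims}"


def trtexec_args(onnx_path, engine_path, min_batch, opt_batch, max_batch,
                 input_name="input"):
    """Build the trtexec argument list for a dynamic-batch engine."""
    if not (min_batch <= opt_batch <= max_batch):
        raise ValueError("expected min_batch <= opt_batch <= max_batch")
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        shape_flag("minShapes", input_name, min_batch),
        shape_flag("optShapes", input_name, opt_batch),
        shape_flag("maxShapes", input_name, max_batch),
    ]


# Hypothetical paths and batch sizes, for illustration only.
args = trtexec_args("model.onnx", "model.trt",
                    min_batch=1, opt_batch=10, max_batch=16)
print(" ".join(args))
```

The opt shape has to lie between the min and max shapes, which is what the range check enforces; whether moving it closer to the typical batch size changes the timings here would need to be measured.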

I also created the ONNX model using dynamic axes:

import torch

# `model` is the SupConResNet instance with the checkpoint weights loaded
batch_size = 1
dummy_input = torch.randn(batch_size, 3, 224, 224)
torch.onnx.export(model, dummy_input, '/home/path/supcon_batch_variable.onnx',
                  input_names=['input'],    # the model's input names
                  output_names=['output'],  # the model's output names
                  dynamic_axes={'input': {0: 'batch_size'},    # variable length axes
                                'output': {0: 'batch_size'}})

Finally, inference with PyTorch is carried out as in the linked repository: GitHub - HobbitLong/SupContrast: PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally). For TensorRT, I created an execution context in Python.

AastaLLL · October 25, 2022, 9:28am

Hi,

We hit an issue when building the TensorRT engine:

...
[10/25/2022-09:25:38] [I] [TRT] ----------------------------------------------------------------
[10/25/2022-09:25:38] [E] [TRT] ModelImporter.cpp:773: While parsing node number 49 [Conv -> "onnx::Relu_510"]:
[10/25/2022-09:25:38] [E] [TRT] ModelImporter.cpp:774: --- Begin node ---
[10/25/2022-09:25:38] [E] [TRT] ModelImporter.cpp:775: input: "input.4"
input: "onnx::Conv_511"
input: "onnx::Conv_512"
output: "onnx::Relu_510"
name: "Conv_49"
op_type: "Conv"
attribute {
  name: "dilations"
  ints: 1
  ints: 1
  type: INTS
}
attribute {
  name: "group"
  i: 1
  type: INT
}
attribute {
  name: "kernel_shape"
  ints: 1
  ints: 1
  type: INTS
}
attribute {
  name: "pads"
  ints: 0
  ints: 0
  ints: 0
  ints: 0
  type: INTS
}
attribute {
  name: "strides"
  ints: 1
  ints: 1
  type: INTS
}

[10/25/2022-09:25:38] [E] [TRT] ModelImporter.cpp:776: --- End node ---
[10/25/2022-09:25:38] [E] [TRT] ModelImporter.cpp:778: ERROR: ModelImporter.cpp:163 In function parseGraph:
[6] Invalid Node - Conv_49
The bias tensor is required to be an initializer for the Conv operator. Try applying constant folding on the model using Polygraphy: https://github.com/NVIDIA/TensorRT/tree/master/tools/Polygraphy/examples/cli/surgeon/02_folding_constants
[10/25/2022-09:25:38] [E] Failed to parse onnx file
[10/25/2022-09:25:38] [I] Finish parsing network model
[10/25/2022-09:25:38] [E] Parsing model failed
[10/25/2022-09:25:38] [E] Failed to create engine from model or file.
[10/25/2022-09:25:38] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8401] # /usr/src/tensorrt/bin/trtexec --onnx=./supcon_batch_variable.onnx --saveEngine=./supcon_batch_variable_fp32_opt8.trt --workspace=6000 --minShapes=input:1x3x224x224 --maxShapes=input:16x3x224x224 --optShapes=input:8x3x224x224

Below is our conversion script. Could you please check whether it differs from yours?

from resnet_big import SupConResNet
import torch

model = SupConResNet(name='resnet50')

batch_size = 1
dummy_input = torch.randn(batch_size, 3, 224, 224)
torch.onnx.export(model, dummy_input, './supcon_batch_variable.onnx',
  input_names=['input'],
  output_names=['output'],
  dynamic_axes={'input' : {0 : 'batch_size'}, 'output': {0 : 'batch_size'}})

Thanks.

pruizb · October 25, 2022, 9:39am

Hello,

I found some differences between your code and mine:

from collections import OrderedDict

import torch

from utils import resnet_big

batch_size = 1
dummy_input = torch.randn(batch_size, 3, 224, 224)
state_dict = torch.load('./supcon.pth')['model']

model = resnet_big.SupConResNet()

new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:]  # remove the 'module.' prefix added by DataParallel
    new_state_dict[name] = v

model.load_state_dict(new_state_dict)
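
The prefix-stripping step above slices the first seven characters off every key, which assumes every key starts with `module.`. A slightly more defensive variant, sketched here with plain strings standing in for tensors, removes the prefix only where it is present. The function name `strip_prefix` and the example keys are made up for illustration.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from keys that have it."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in for a checkpoint; in a real checkpoint the values would be tensors.
example = OrderedDict([
    ("module.encoder.conv1.weight", "w0"),
    ("module.head.0.bias", "b0"),
    ("encoder.bn1.weight", "w1"),  # no prefix, so the key is kept unchanged
])
print(list(strip_prefix(example)))
```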

I don’t know if this code solves your problem. Is the ONNX model correctly transformed?

Thanks.

AastaLLL · October 26, 2022, 5:57am

Hi,

Did you use the pre-trained model shared in the repository?
We tested it, but the weights are not compatible, and we hit the same error as in the issue below:

github.com/HobbitLong/SupContrast

Provided pre-trained load issue

opened 07:22PM - 19 Jul 22 UTC

Rasoul77

I tried to use the provided pre-trained PTH file presented in the Update section… using the following code snippet, but it returns errors of missing keys,

```
import torch
from networks.resnet_big import SupConResNet
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"device to use: {device}")
model = SupConResNet()
model.to(device=device)
model.load_state_dict(torch.load("./supcon.pth", map_location=device))
model.eval()
```

which results,

```
Error(s) in loading state_dict for SupConResNet: Missing key(s) in state_dict: "encoder.conv1.weight", "encoder.bn1.weight", ...
```

Could you clarify for which model the pre-trained "supcon.pth" is provided?

If you trained the model yourself, could you share the network architecture or the model with us?
Thanks.

pruizb · October 26, 2022, 7:01am

Hello,

Yes, I am using the model shared on the repository. You must follow exactly the same steps that I showed in my previous post, loading the model using the repository network definition.

I have attached the ONNX model in a zipped folder; I think you can convert it to TensorRT directly.

Thanks.
supcon_batch_variable.zip (99.3 MB)

pruizb · November 2, 2022, 1:39pm

Hello,

Is there any news about this topic?

Thanks.

AastaLLL · November 3, 2022, 5:04am

Hi,

Thanks for your patience.
Could you verify if the PyTorch inference is also using batch size=8?

We tested the ONNX model with TensorRT and ONNXRuntime on Xavier.

With TensorRT, we got 84.6757 ms for batch size=1 and 652.653 ms for batch size=8.
With ONNXRuntime, batch size=1 takes 94.296 ms while batch size=8 takes 684.891 ms.

So it looks like the performance difference comes from the different batch sizes used.
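
To make the batch-size comparison concrete, the short sketch below divides the quoted latencies by the batch size and reports how the batch-8 time relates to the batch-1 time. The numbers are copied from the measurements above; the helper names are only illustrative.

```python
# Latencies in milliseconds, copied from the measurements quoted above.
measurements = {
    "TensorRT": {1: 84.6757, 8: 652.653},
    "ONNXRuntime": {1: 94.296, 8: 684.891},
}


def per_image_ms(latency_ms, batch_size):
    """Average time spent on each image in the batch."""
    return latency_ms / batch_size


def scaling_factor(latencies, small=1, large=8):
    """How many times slower the large batch is than the small one."""
    return latencies[large] / latencies[small]


for runtime, latencies in measurements.items():
    print(
        f"{runtime}: "
        f"{per_image_ms(latencies[8], 8):.1f} ms/image at batch 8, "
        f"{scaling_factor(latencies):.2f}x the batch-1 latency"
    )
```

In both runtimes the batch-8 latency is close to eight times the batch-1 latency, so a batch-1 timing and a batch-8 timing cannot be compared directly.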

Thanks.

pruizb · November 3, 2022, 5:04pm

Hello,

Yes, PyTorch inference is also using batch size=8.

I agree with your TensorRT numbers: I get the same times as you for batch size=1 and batch size=8. But the question is why inference with TensorRT is so much slower than with PyTorch.

And finally, as I asked in my first post: is there any way to get higher performance using dynamic shape with TensorRT?

Thanks.

AastaLLL · November 4, 2022, 2:20am

Hi,

Could you share the inference source so we can reproduce the PyTorch result?

We tested the ONNX model with ONNXRuntime.
The elapsed time for batch size=8 is 684.891 ms, which is longer than with TensorRT.

It would be good if we could reproduce the PyTorch result first.
Thanks.

pruizb · November 4, 2022, 11:58am

Hello,

You can find the original model and the original code here: GitHub - HobbitLong/SupContrast: PyTorch implementation of "Supervised Contrastive Learning" (and SimCLR incidentally)

In general, the inference process that I run looks like this:

from collections import OrderedDict

import torch

import resnet_big

state_dict = torch.load('/path/supcon.pth')['model']
model = resnet_big.SupConResNet()

new_state_dict = OrderedDict()
for k, v in state_dict.items():
    name = k[7:]  # remove the 'module.' prefix added by DataParallel
    new_state_dict[name] = v

model.load_state_dict(new_state_dict)

model = model.float().eval().cuda()

# crops -> images (the detection crops, in N x H x W x C layout)
crops = torch.Tensor(crops).permute(0, 3, 1, 2)  # to N x C x H x W
descriptors = model(crops.cuda())  # written as self.model(...) in my code
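
Since the thread turns on comparing timings, here is a minimal timing harness with warm-up runs and an optional synchronization hook. It is a generic sketch, not the script used by either poster; the `benchmark` function and its parameters are hypothetical, and a `time.sleep` call stands in for the model so that it runs without a GPU.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Return the median wall-clock time of fn() in milliseconds.

    sync, if given, is called before each clock read so that queued
    asynchronous work (for example CUDA kernels) is included in the timing.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Stand-in workload so the sketch runs without a GPU.
elapsed_ms = benchmark(lambda: time.sleep(0.01))
print(f"median: {elapsed_ms:.1f} ms")
```

With PyTorch on a GPU, one would pass the model call as `fn` and `torch.cuda.synchronize` as `sync`, because CUDA kernels are launched asynchronously and an unsynchronized timer can stop before the work has finished.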

Thanks.

AastaLLL · November 7, 2022, 8:56am

Hi,

Have you tried the latest model and the latest source shared in the repository?
They do not work together because of the error mentioned on Oct 26:

size mismatch for encoder.conv1.weight: copying a param with shape torch.Size([64, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 3, 3, 3]).
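
A mismatch like the one above can be located before calling `load_state_dict` by comparing parameter names and shapes on both sides. The sketch below works on plain `{name: shape}` dictionaries; the function name `diff_shapes` is hypothetical, the `conv1` entry mirrors the error message, and the other keys are made up for illustration.

```python
def diff_shapes(checkpoint_shapes, model_shapes):
    """Compare two {parameter name: shape} mappings.

    Returns (missing, unexpected, mismatched), where mismatched maps a
    name to (checkpoint shape, model shape).
    """
    missing = sorted(set(model_shapes) - set(checkpoint_shapes))
    unexpected = sorted(set(checkpoint_shapes) - set(model_shapes))
    mismatched = {
        name: (tuple(checkpoint_shapes[name]), tuple(model_shapes[name]))
        for name in sorted(set(checkpoint_shapes) & set(model_shapes))
        if tuple(checkpoint_shapes[name]) != tuple(model_shapes[name])
    }
    return missing, unexpected, mismatched


# Toy shapes: conv1 mirrors the error above, the other keys are made up.
checkpoint = {
    "encoder.conv1.weight": (64, 3, 7, 7),
    "encoder.block.extra.weight": (256, 64, 1, 1),
}
model = {
    "encoder.conv1.weight": (64, 3, 3, 3),
    "encoder.block.other.weight": (256, 64, 1, 1),
}

missing, unexpected, mismatched = diff_shapes(checkpoint, model)
print("missing from checkpoint:", missing)
print("unexpected in checkpoint:", unexpected)
print("shape mismatches:", mismatched)
```

With PyTorch, the two dictionaries could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}` from the checkpoint and from `model.state_dict()`.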

Based on your implementation, does images contain 8 images, each of size 3x224x224?

Thanks.

pruizb · November 9, 2022, 7:58am

Hello,

Yes, I have tried the latest model shared in the repository and the code too.

It seems to be a code problem; you must follow exactly the same steps that I showed you on the 4th of November. The following import comes from this directory of the repository: https://github.com/HobbitLong/SupContrast/tree/master/networks

import resnet_big

Regarding the second question, my input size is 8x3x224x224.

Thanks.

pruizb · November 11, 2022, 12:02pm

Hello,

Sorry, I was wrong about the code. I did not remember that I had changed the code to make it work. You must replace self.shortcut with self.downsample in the resnet_big.py file and change kernel_size on line 80 to 7. I hope it works.

Sorry again, and thank you.

AastaLLL · November 17, 2022, 9:41am

Hi,

Thanks for the hint.
We can run the model with PyTorch after the change you mentioned.

Below is the performance data that we measured for batch=1 and batch=8.
It seems that TensorRT gives better performance than ONNXRuntime or PyTorch.

TensorRT

Batch=1: 84.6757ms
Batch=8: 652.653ms

PyTorch (tested by inference.py (979 Bytes))

Batch=1: 123.793ms
Batch=8: 920.4219ms

ONNXRuntime

Batch=1: 94.296ms
Batch=8: 684.891ms

Could you help to confirm it?

Thanks.

pruizb · December 1, 2022, 12:01pm

Hello,

Sorry for the delay.

Checking your PyTorch code, I discovered that I was doing layer fusion, and that is why my PyTorch model was faster.
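
The post does not say which layers were fused or how. One common form of layer fusion is folding a BatchNorm layer into the preceding convolution or linear layer, so the sketch below shows that case only as an illustration, in pure Python with made-up parameters. A 1x1 convolution at a single pixel has the same form as the linear layer used here.

```python
import math


def fold_batchnorm(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BatchNorm parameters into a linear layer.

    weights[o][i] and biases[o] describe
    y[o] = sum_i weights[o][i] * x[i] + biases[o].
    """
    fused_w, fused_b = [], []
    for o, row in enumerate(weights):
        scale = gamma[o] / math.sqrt(var[o] + eps)
        fused_w.append([w * scale for w in row])
        fused_b.append((biases[o] - mean[o]) * scale + beta[o])
    return fused_w, fused_b


def linear(weights, biases, x):
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm using fixed running statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


# Made-up parameters for two output channels and three inputs.
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
B = [0.1, -0.2]
gamma, beta, mean, var = [1.2, 0.8], [0.05, -0.1], [0.3, -0.4], [0.9, 1.6]
x = [1.0, 2.0, -1.0]

reference = batchnorm(linear(W, B, x), gamma, beta, mean, var)
fused = linear(*fold_batchnorm(W, B, gamma, beta, mean, var), x)
print(reference, fused)
```

The fused layer produces the same output with one operation instead of two, which is why a fused model can run faster than an unfused one with identical weights.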

Thanks for your time.

system · December 15, 2022, 12:01pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
