Low performance of fp32 model on Xavier

We expected to get higher inference performance on Xavier. However, when I run inference with a model converted from PyTorch to ONNX to an fp32 TensorRT engine, I get a poor running speed.

I ran a PyTorch-based ResNet-50 inference program on Xavier, following the official torch2onnx2trt example. Running the PyTorch model directly, the speed is 43.34 ms per sample. After converting the model to ONNX and then to an fp32 TensorRT engine, the inference time becomes 703.43 ms per sample, which seems strange. But when I convert the PyTorch model to ONNX and then to an fp16 TensorRT engine, the inference time drops to 5.44 ms per sample, close to the official demo's 6.28 ms.

Can you explain why the fp32 TensorRT engine converted from PyTorch via ONNX becomes so much slower at inference?

I'm using a Jetson AGX Xavier with 16 GB of memory.
NVIDIA SDK Manager 1.8, JetPack 5.0.1, Ubuntu 20.04 on the Xavier.
CUDA is 11.4.
TensorRT is 8.4.0.11.
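
For anyone reproducing the per-sample numbers above, here is a minimal timing sketch (an assumption of the setup, not the exact script: it uses the same ResNet-50 checkpoint, a CUDA device, warm-up runs, and torch.cuda.synchronize so that only model execution is timed):

import time
import torch

model = torch.load('./resnet50_torch_model.pth').eval().cuda()
x = torch.randn(1, 3, 224, 224, device='cuda')

with torch.no_grad():
    for _ in range(10):          # warm-up, excludes one-time setup costs
        model(x)
    torch.cuda.synchronize()     # drain pending GPU work before timing
    start = time.perf_counter()
    runs = 100
    for _ in range(runs):
        model(x)
    torch.cuda.synchronize()     # CUDA kernels launch asynchronously
    print(f"{(time.perf_counter() - start) / runs * 1000:.2f} ms per sample")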

Dear @648976749,
Could you share the ONNX model? I would like to verify it with the trtexec tool.

I used this code to convert the Torch model to an ONNX model:

import torch
import torch.onnx

# Load the trained model and switch to inference mode.
load_torch_model = torch.load('./resnet50_torch_model.pth').eval()

# Dummy input matching the expected NCHW input shape.
BATCH_SIZE = 1
dummy_input = torch.randn(BATCH_SIZE, 3, 224, 224)

torch.onnx.export(load_torch_model, dummy_input, "./resnet50_onnx_model.onnx", verbose=False)
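
Before handing the file to trtexec, the export can be sanity-checked with the onnx checker (a minimal sketch; the path matches the export above):

import onnx

# Validate the exported graph before building a TensorRT engine.
model = onnx.load("./resnet50_onnx_model.onnx")
onnx.checker.check_model(model)  # raises an exception if the model is malformed
print("ONNX model is well-formed")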

and used this command to convert the ONNX model to a TensorRT engine:

trtexec --onnx=resnet50_onnx_model.onnx --saveEngine=resnet_engine.trt
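
(The fp16 engine would be built the same way, only adding the --fp16 flag; the output filename below is illustrative.)

trtexec --onnx=resnet50_onnx_model.onnx --saveEngine=resnet_engine_fp16.trt --fp16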

Here is the ONNX model:
resnet50_onnx_model.onnx (97.4 MB)

Dear @648976749,
I see the numbers below using trtexec:

/usr/src/tensorrt/bin/trtexec --onnx=/home/nvidia/resnet50_onnx_model.onnx
[06/16/2022-08:35:15] [I] === Performance summary ===
[06/16/2022-08:27:54] [I] Throughput: 124.961 qps
[06/16/2022-08:27:54] [I] Latency: min = 7.98013 ms, max = 8.12762 ms, mean = 8.03174 ms, median = 8.03023 ms, percentile(99%) = 8.10669 ms
[06/16/2022-08:27:54] [I] Enqueue Time: min = 0.344971 ms, max = 0.9151 ms, mean = 0.528215 ms, median = 0.470154 ms, percentile(99%) = 0.849792 ms
[06/16/2022-08:27:54] [I] H2D Latency: min = 0.0270996 ms, max = 0.133423 ms, mean = 0.0482375 ms, median = 0.0380859 ms, percentile(99%) = 0.0996094 ms
[06/16/2022-08:27:54] [I] GPU Compute Time: min = 7.94235 ms, max = 8.02905 ms, mean = 7.98089 ms, median = 7.97864 ms, percentile(99%) = 8.0257 ms
[06/16/2022-08:27:54] [I] D2H Latency: min = 0.00146484 ms, max = 0.00341797 ms, mean = 0.00261398 ms, median = 0.00268555 ms, percentile(99%) = 0.00326538 ms


/usr/src/tensorrt/bin/trtexec --onnx=/home/nvidia/resnet50_onnx_model.onnx --fp16
[06/16/2022-08:35:15] [I] === Performance summary ===
[06/16/2022-08:35:15] [I] Throughput: 362.688 qps
[06/16/2022-08:35:15] [I] Latency: min = 2.77734 ms, max = 2.87854 ms, mean = 2.80078 ms, median = 2.79565 ms, percentile(99%) = 2.84412 ms
[06/16/2022-08:35:15] [I] Enqueue Time: min = 0.399048 ms, max = 0.932114 ms, mean = 0.604471 ms, median = 0.681885 ms, percentile(99%) = 0.824463 ms
[06/16/2022-08:35:15] [I] H2D Latency: min = 0.029541 ms, max = 0.119522 ms, mean = 0.0431197 ms, median = 0.0369873 ms, percentile(99%) = 0.0864258 ms
[06/16/2022-08:35:15] [I] GPU Compute Time: min = 2.73926 ms, max = 2.77002 ms, mean = 2.75398 ms, median = 2.75391 ms, percentile(99%) = 2.76648 ms
[06/16/2022-08:35:15] [I] D2H Latency: min = 0.0012207 ms, max = 0.00585938 ms, mean = 0.00367615 ms, median = 0.00366211 ms, percentile(99%) = 0.00537109 ms


/usr/src/tensorrt/bin/trtexec --onnx=/home/nvidia/resnet50_onnx_model.onnx --int8
[06/16/2022-08:37:07] [I] === Performance summary ===
[06/16/2022-08:37:07] [I] Throughput: 611.933 qps
[06/16/2022-08:37:07] [I] Latency: min = 1.60986 ms, max = 1.64307 ms, mean = 1.62563 ms, median = 1.62549 ms, percentile(99%) = 1.63812 ms
[06/16/2022-08:37:07] [I] Enqueue Time: min = 0.361084 ms, max = 1.14978 ms, mean = 0.496531 ms, median = 0.473755 ms, percentile(99%) = 0.669434 ms
[06/16/2022-08:37:07] [I] H2D Latency: min = 0.0187988 ms, max = 0.0214844 ms, mean = 0.0196352 ms, median = 0.0195312 ms, percentile(99%) = 0.0209961 ms
[06/16/2022-08:37:07] [I] GPU Compute Time: min = 1.58887 ms, max = 1.62158 ms, mean = 1.60435 ms, median = 1.6041 ms, percentile(99%) = 1.61694 ms
[06/16/2022-08:37:07] [I] D2H Latency: min = 0.0012207 ms, max = 0.00244141 ms, mean = 0.00164729 ms, median = 0.00167847 ms, percentile(99%) = 0.0020752 ms
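
For completeness, here is a minimal sketch of deserializing the saved fp32 engine and timing only execution on the Python side (assumptions: the resnet_engine.trt built above, the TensorRT 8.4 Python API with pycuda installed, and binding 0 being the input):

import time
import numpy as np
import tensorrt as trt
import pycuda.autoinit  # creates and holds a CUDA context
import pycuda.driver as cuda

logger = trt.Logger(trt.Logger.WARNING)
with open("resnet_engine.trt", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate one device buffer per binding (input and output).
bindings, device_bufs = [], []
for i in range(engine.num_bindings):
    shape = tuple(engine.get_binding_shape(i))
    dtype = trt.nptype(engine.get_binding_dtype(i))
    buf = cuda.mem_alloc(int(np.prod(shape)) * np.dtype(dtype).itemsize)
    device_bufs.append(buf)  # keep a reference so the allocation stays alive
    bindings.append(int(buf))

# Copy a dummy input once (binding 0 assumed to be the input).
cuda.memcpy_htod(device_bufs[0], np.random.rand(1, 3, 224, 224).astype(np.float32))

# execute_v2 is synchronous, so wall-clock timing around it measures
# engine execution plus launch overhead, not host-side Python work.
for _ in range(10):  # warm-up runs
    context.execute_v2(bindings)
start = time.perf_counter()
runs = 100
for _ in range(runs):
    context.execute_v2(bindings)
print(f"{(time.perf_counter() - start) / runs * 1000:.2f} ms per sample")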
