TensorRT with pruned model

resnet_model_best-depth164-sp.onnx (6.7 MB)
resnet_pruned0.5-depth164-sp.onnx (5.1 MB)
resnet_pruned0.9-depth164-sp.onnx (1.3 MB)

I just found that a pruned model sometimes does not achieve better latency.

As the table below shows, the original model (6.7 MB) is faster than the 0.5-pruned model (5.1 MB).

Why does pruning sometimes increase latency?

My test models are attached; the test command is simply trtexec --onnx=xxxxx.onnx

model \ device: 253-station 3090

resnet_model_best-depth164-sp.onnx
    Throughput: 615.107 qps
    Latency: min = 1.60278 ms, max = 3.25769 ms, mean = 1.63109 ms, median = 1.608 ms, percentile(99%) = 2.21143 ms

resnet_pruned0.5-depth164-sp-finetuned.onnx
    Throughput: 428.871 qps
    Latency: min = 2.10408 ms, max = 3.13867 ms, mean = 2.31287 ms, median = 2.2218 ms, percentile(99%) = 2.85095 ms

resnet_pruned0.9-depth164-sp.onnx
    Throughput: 689.394 qps
    Latency: min = 1.33484 ms, max = 1.53223 ms, mean = 1.43313 ms, median = 1.43161 ms, percentile(99%) = 1.46887 ms

Hi,
Could you share the ONNX model and the script, if you have not already, so that we can assist you better?
In the meantime, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import onnx

model = onnx.load("yourONNXmodel.onnx")  # replace with the path to your model
onnx.checker.check_model(model)
  2. Try running your model with the trtexec command.

In case you are still facing the issue, please share the trtexec "--verbose" log for further debugging.
Thanks!

The models were checked with your script and all passed, and trtexec did not show any errors or warnings with the --verbose flag.
I still wonder why trtexec performance drops after pruning. Thanks.

Hi,

What does prune mean here? Reducing the number of channels for some convolution layers? If so, then this behavior is sometimes expected because TensorCore requires padding the channel dimensions to multiples of 8 (for FP16) or 32 (for INT8). If the pruned model does not have these “nice numbers” of channels, additional padding may be required and perf may drop.

The guideline is: when doing channel pruning, please prune the channel counts to multiples of 32.
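As a rough illustration (this is not TensorRT's actual kernel logic), the padding effect and the round-to-32 guideline can be sketched in Python; the helper names below are hypothetical:

```python
def pad_to(c: int, multiple: int) -> int:
    """Round a channel count up to the next multiple (TensorCore alignment)."""
    return ((c + multiple - 1) // multiple) * multiple

# A pruned layer that ends up with e.g. 28 channels still pays for 32
# under INT8 TensorCore alignment, so 4 of the 32 channels are wasted padding.
assert pad_to(28, 32) == 32
assert pad_to(32, 32) == 32   # already aligned: no padding, no waste

def prune_to_multiple(c: int, ratio: float, multiple: int = 32) -> int:
    """Hypothetical helper: keep ~c * (1 - ratio) channels, snapped to a multiple of 32."""
    kept = round(c * (1 - ratio) / multiple) * multiple
    return max(multiple, kept)  # never prune below one aligned group

assert prune_to_multiple(256, 0.5) == 128  # 128 is a multiple of 32: no padding
assert prune_to_multiple(100, 0.9) == 32
```

With aligned channel counts, the padded size equals the real size, so no TensorCore compute is wasted on zero-filled channels.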

Thank you.


Yes, I'm reducing channels for convolution layers. I will try changing the channel numbers to multiples of 32.

Many thanks.
