TensorRT with pruned model

resnet_model_best-depth164-sp.onnx (6.7 MB)
resnet_pruned0.5-depth164-sp.onnx (5.1 MB)
resnet_pruned0.9-depth164-sp.onnx (1.3 MB)

I just found that a pruned model sometimes does not have lower latency.

As the table below shows, the original model (6.7 MB) is faster than the 0.5-pruned model (5.1 MB).

Why does pruning sometimes increase latency?

My test models are attached; the test command is simply trtexec --onnx=xxxxx.onnx

Device: 253-station 3090

| Model | Throughput (qps) | Latency min (ms) | Latency max (ms) | Latency mean (ms) | Latency median (ms) | Latency p99 (ms) |
|---|---|---|---|---|---|---|
| resnet_model_best-depth164-sp.onnx | 615.107 | 1.60278 | 3.25769 | 1.63109 | 1.608 | 2.21143 |
| resnet_pruned0.5-depth164-sp-finetuned.onnx | 428.871 | 2.10408 | 3.13867 | 2.31287 | 2.2218 | 2.85095 |
| resnet_pruned0.9-depth164-sp.onnx | 689.394 | 1.33484 | 1.53223 | 1.43313 | 1.43161 | 1.46887 |
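To quantify the gap, here is a minimal sketch that parses the trtexec summary lines from the table above and reports each model's throughput and mean-latency change relative to the unpruned model. The regular expressions assume the "Throughput: X qps" and "mean = X ms" formats shown in this thread; other trtexec versions may format their summary differently.

```python
import re

# trtexec summary lines for each model, copied from the results above.
RESULTS = {
    "resnet_model_best-depth164-sp.onnx": (
        "Throughput: 615.107 qps",
        "Latency: min = 1.60278 ms, max = 3.25769 ms, mean = 1.63109 ms, "
        "median = 1.608 ms, percentile(99%) = 2.21143 ms",
    ),
    "resnet_pruned0.5-depth164-sp-finetuned.onnx": (
        "Throughput: 428.871 qps",
        "Latency: min = 2.10408 ms, max = 3.13867 ms, mean = 2.31287 ms, "
        "median = 2.2218 ms, percentile(99%) = 2.85095 ms",
    ),
    "resnet_pruned0.9-depth164-sp.onnx": (
        "Throughput: 689.394 qps",
        "Latency: min = 1.33484 ms, max = 1.53223 ms, mean = 1.43313 ms, "
        "median = 1.43161 ms, percentile(99%) = 1.46887 ms",
    ),
}
BASELINE = "resnet_model_best-depth164-sp.onnx"

THROUGHPUT_RE = re.compile(r"Throughput:\s*([\d.]+)\s*qps")
MEAN_LATENCY_RE = re.compile(r"\bmean\s*=\s*([\d.]+)\s*ms")


def parse_summary(throughput_line, latency_line):
    """Extract (throughput in qps, mean latency in ms) from trtexec summary lines."""
    qps = float(THROUGHPUT_RE.search(throughput_line).group(1))
    mean_ms = float(MEAN_LATENCY_RE.search(latency_line).group(1))
    return qps, mean_ms


def relative_change(results, baseline):
    """Map each model to (throughput change, mean-latency change) as fractions of the baseline."""
    base_qps, base_ms = parse_summary(*results[baseline])
    changes = {}
    for name, lines in results.items():
        qps, mean_ms = parse_summary(*lines)
        changes[name] = (qps / base_qps - 1.0, mean_ms / base_ms - 1.0)
    return changes


if __name__ == "__main__":
    for name, (d_qps, d_lat) in relative_change(RESULTS, BASELINE).items():
        print(f"{name}: throughput {d_qps:+.1%}, mean latency {d_lat:+.1%}")
```

On these numbers, the 0.5-pruned model has roughly 30% lower throughput and 42% higher mean latency than the baseline, while the 0.9-pruned model has about 12% higher throughput.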

Please share the ONNX model and the script, if you have not already, so that we can assist you better.
Meanwhile, you can try a few things:

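1. Validate your model with the snippet below:

import sys
import onnx

# Path to your ONNX model, passed on the command line
filename = sys.argv[1]
model = onnx.load(filename)
# Raises onnx.checker.ValidationError if the model is invalid
onnx.checker.check_model(model)

2. Try running your model with the trtexec command.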

If you are still facing the issue, please share the trtexec --verbose log for further debugging.
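A minimal sketch for capturing that log, assuming trtexec is on PATH. It uses only the --onnx and --verbose flags mentioned in this thread and writes the combined stdout/stderr to a file you name; the script name in the usage comment is hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def trtexec_command(onnx_path, verbose=True):
    """Build the trtexec argument list: --onnx=<model>, plus --verbose if requested."""
    cmd = ["trtexec", f"--onnx={onnx_path}"]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_and_save_log(onnx_path, log_path):
    """Run trtexec on one model, save its combined output to log_path, return the exit code."""
    if shutil.which("trtexec") is None:
        raise FileNotFoundError("trtexec was not found on PATH")
    result = subprocess.run(
        trtexec_command(onnx_path),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr into the same log
        text=True,
        check=False,
    )
    Path(log_path).write_text(result.stdout)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python save_trtexec_log.py model.onnx model_verbose.log
    sys.exit(run_and_save_log(sys.argv[1], sys.argv[2]))
```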

I checked the models with your script and all of them passed; trtexec did not show any error or warning with the --verbose flag.
I wonder why trtexec performance gets worse after pruning. Thanks.


What does "prune" mean here? Reducing the number of channels in some convolution layers? If so, this behavior is sometimes expected, because Tensor Cores require the channel dimensions to be padded to multiples of 8 (for FP16) or 32 (for INT8). If the pruned model does not have these "nice" channel counts, additional padding may be required and performance may drop.
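The padding cost can be estimated with simple arithmetic. This is a sketch, assuming the alignment rule stated above (multiples of 8 for FP16, 32 for INT8) and a hypothetical pruned layer with 37 output channels; it rounds the channel count up and reports what fraction of the padded dimension is padding.

```python
def padded_channels(channels, multiple):
    """Round channels up to the next multiple: ceil(channels / multiple) * multiple."""
    return -(-channels // multiple) * multiple


def padding_overhead(channels, multiple):
    """Fraction of the padded channel dimension that holds padding instead of real channels."""
    padded = padded_channels(channels, multiple)
    return (padded - channels) / padded


if __name__ == "__main__":
    # 37 is a hypothetical pruned channel count; 64 is already aligned for both precisions.
    for channels in (37, 64):
        for precision, multiple in (("FP16", 8), ("INT8", 32)):
            padded = padded_channels(channels, multiple)
            overhead = padding_overhead(channels, multiple)
            print(f"{channels} channels, {precision}: padded to {padded} ({overhead:.1%} padding)")
```

For 37 channels, INT8 alignment pads to 64, so 27 of the 64 channels (about 42%) would be padding, whereas a layer pruned to exactly 32 or 64 channels needs none.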

As a guideline, when doing channel pruning, prune the channel counts to multiples of 32.
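One way to apply this guideline is to round each layer's keep count to a multiple of 32 before choosing which channels to keep. The sketch below assumes you already have a per-channel importance score (for example, the magnitude of each BatchNorm scale factor, a common channel-pruning criterion, though not necessarily the one used here). The function names, the ties-to-even rounding, and the rule of leaving layers with 32 or fewer channels unpruned are illustrative choices, not part of TensorRT.

```python
def aligned_keep_count(num_channels, keep_ratio, multiple=32):
    """How many channels to keep: num_channels * keep_ratio rounded to a multiple.

    Layers with at most `multiple` channels are left unpruned; otherwise the result is
    clamped to [multiple, largest multiple <= num_channels].
    """
    if num_channels <= multiple:
        return num_channels
    target = round(num_channels * keep_ratio / multiple) * multiple  # round() ties to even
    upper = (num_channels // multiple) * multiple
    return max(multiple, min(target, upper))


def select_channels(scores, keep_ratio, multiple=32):
    """Return the sorted indices of the highest-scoring channels, keeping an aligned count."""
    k = aligned_keep_count(len(scores), keep_ratio, multiple)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    for channels, ratio in ((256, 0.5), (256, 0.4), (64, 0.3), (16, 0.5)):
        print(f"{channels} channels, keep ratio {ratio}: keep {aligned_keep_count(channels, ratio)}")
```

With this rule, a 256-channel layer with a keep ratio of 0.4 keeps 96 channels rather than 102, trading a little accuracy headroom for an aligned shape.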

Thank you.

Yes, I'm doing channel reduction for convolution layers. I will try changing the channel counts to multiples of 32.

Many thanks.
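Before re-running trtexec, a quick way to confirm that the re-pruned model really has aligned channel counts is to read each Conv weight's first dimension (output channels) from the ONNX graph. This is a sketch, assuming Conv weights are stored as graph initializers, which is usual for exported models. Only output channels are checked, and layers whose weights come from other nodes (for example, quantization ops) are skipped.

```python
import sys


def misaligned_convs(conv_out_channels, multiple=32):
    """Return {conv name: channels} for convolutions whose output channels are not a multiple."""
    return {name: c for name, c in conv_out_channels.items() if c % multiple != 0}


def conv_out_channels_from_onnx(path):
    """Map each Conv node to its output-channel count (dim 0 of its weight initializer)."""
    import onnx  # imported lazily so misaligned_convs() is usable without onnx installed

    model = onnx.load(path)
    weights = {init.name: init for init in model.graph.initializer}
    channels = {}
    for index, node in enumerate(model.graph.node):
        # Conv inputs are (X, W[, B]); W has shape (out_channels, in_channels / group, kH, kW).
        if node.op_type == "Conv" and len(node.input) > 1 and node.input[1] in weights:
            name = node.name or f"Conv_{index}"
            channels[name] = int(weights[node.input[1]].dims[0])
    return channels


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, c in sorted(misaligned_convs(conv_out_channels_from_onnx(sys.argv[1])).items()):
        print(f"{name}: {c} output channels (not a multiple of 32)")
```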
