TensorRT optimization for pruning

Hi,

When converting a pruned Caffe model to a TensorRT engine, are the pruned weights removed or excluded from computation?

As far as I can tell, TensorRT does not automatically remove pruned weights.

With a heavily pruned TF model (the frozen graph shrinks by about 80% when zipped), I see no increase in inference speed after converting it to a TensorRT engine with the Python API.
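For what it’s worth, here’s a minimal sketch (plain TF/numpy; the model.pb path is a placeholder) that counts how many weights in a frozen graph are actually zero, which is a more direct check than zip compression:

```python
import numpy as np
import tensorflow as tf

# "model.pb" is a placeholder path -- point it at your own frozen graph.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

total = zeros = 0
for node in graph_def.node:
    if node.op == "Const":
        tensor = tf.make_ndarray(node.attr["value"].tensor)
        # Only count floating-point weight tensors.
        if tensor.dtype in (np.float16, np.float32, np.float64):
            total += tensor.size
            zeros += int(np.count_nonzero(tensor == 0))

print(f"zero-valued weights: {zeros}/{total} "
      f"({100.0 * zeros / max(total, 1):.1f}%)")
```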

But I REALLY hope I’m doing something wrong and that TensorRT is capable of this.

Bump, would be nice to have an “official” answer.

Is there an existing feature for this? Or planned?

Could you please elaborate on what form of pruning you are using in this case?

Also, if possible, could you please share the model and a sample script to reproduce the issue so we can help better?

Thanks

Hi,

I used https://www.tensorflow.org/model_optimization/guide/pruning, which basically makes the model sparse. But if I understand correctly, TensorRT does not have any optimizations for sparse matrix multiplications.
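For reference, this is roughly the API I used (a minimal sketch of tfmot magnitude pruning; the toy model and schedule values are illustrative). Note that it only zeroes individual weights, so the tensors keep their original dense shapes:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Illustrative toy model -- substitute your own architecture.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

# Magnitude pruning zeroes individual weights; tensor shapes are unchanged,
# so the resulting sparsity is unstructured.
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.8,
    begin_step=0, end_step=1000)
pruned = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)

pruned.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# Training requires the pruning callback, e.g.:
# pruned.fit(x, y, callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# strip_pruning removes the pruning wrappers but keeps the zeroed weights:
# the layers are still dense tensors of the original shape.
final = tfmot.sparsity.keras.strip_pruning(pruned)
```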

Or do you mean that you expect an increase in inference speed for sparse models?

I’ve seen many references to pruned models running with TensorRT, but no mention of which pruning techniques were used with TensorRT.

If you channel-prune models in the right way (and then actually compress them, i.e. remove the pruned channels rather than just zeroing them), you will get an increase in speed in TensorRT.
But you should contact the people who created those models for more information on how they were pruned, since it wasn’t anything done in TensorRT itself.
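To illustrate the difference, here’s a minimal numpy sketch (the helper name and keep ratio are made up for illustration) of what “channel pruning in the right way” means: whole output channels are physically removed, so every layer stays dense but shrinks, and any engine, TensorRT included, simply has less math to do:

```python
import numpy as np

def prune_conv_channels(kernel, bias, keep_ratio=0.5):
    """Structured pruning: drop whole output channels by L1 norm.

    kernel: (kh, kw, c_in, c_out) conv weights; bias: (c_out,).
    Returns physically smaller tensors, so downstream dense math shrinks.
    """
    c_out = kernel.shape[-1]
    norms = np.abs(kernel).reshape(-1, c_out).sum(axis=0)  # L1 per channel
    keep = np.sort(np.argsort(norms)[-int(c_out * keep_ratio):])
    # 'keep' is also needed to slice the NEXT layer's input channels.
    return kernel[..., keep], bias[keep], keep

# Toy example: a 3x3 conv with 64 output channels shrinks to 32.
k = np.random.randn(3, 3, 16, 64).astype(np.float32)
b = np.zeros(64, dtype=np.float32)
k2, b2, kept = prune_conv_channels(k, b, keep_ratio=0.5)
print(k2.shape)  # (3, 3, 16, 32)
```

Note that the returned channel indices also have to be applied to the next layer’s input channels, which is why this is a whole-network transformation rather than a per-layer one.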

GPUs are simply very good at dense math, so unless sparsity is appropriately structured or weights are very sparse, sparse computations are unlikely to improve performance.
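As a rough CPU-only illustration (numpy/scipy; the sizes and the 80% sparsity level are arbitrary, and this is not a GPU benchmark), a dense BLAS matmul over a zero-filled matrix is often still faster than a CSR sparse matmul at this sparsity level, because the dense kernel’s memory access is far more regular:

```python
import time
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n, sparsity = 2048, 0.8

dense = rng.standard_normal((n, n)).astype(np.float32)
mask = rng.random((n, n)) >= sparsity      # keep ~20% of the weights
zero_filled = dense * mask                 # zeroed but stored densely
csr = sp.csr_matrix(zero_filled)           # actual sparse storage
x = rng.standard_normal((n, n)).astype(np.float32)

def bench(fn, reps=10):
    fn()                                   # warm up
    t0 = time.perf_counter()
    for _ in range(reps):
        fn()
    return (time.perf_counter() - t0) / reps

print("dense matmul:      %.4fs" % bench(lambda: zero_filled @ x))
print("CSR sparse matmul: %.4fs" % bench(lambda: csr @ x))
```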

Thanks