Does network pruning speed up inference speed?

GURUGURU · November 16, 2021, 6:15am

Description


Environment

**TensorRT Version**: 8.0
**GPU Type**: Jetson os [Maxwell]
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version: Jetson Nano

Question:
I pruned my deep learning model, and I will convert it to TensorRT.
Will the pruned network run inference faster in TensorRT? (My current platform is a Jetson Nano.)

If yes, is there a GitHub repo or blog post that I can refer to?

NVES · November 16, 2021, 6:38am

Hi,
Please share the model, script, profiler output, and performance output, if you have not already, so that we can help you better.
Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

When measuring model performance, make sure you measure the latency and throughput of the network inference only, excluding the data pre- and post-processing overhead.
Please refer to the links below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#model-accuracy
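
As a rough illustration of measuring latency and throughput for the inference call alone, here is a minimal Python sketch. The names `measure_inference` and `dummy_infer` are hypothetical and not part of TensorRT or trtexec. It assumes the callable blocks until inference has finished (for asynchronous GPU execution you would need to synchronize inside it), and that pre- and post-processing happen outside the timed region.

```python
import statistics
import time


def measure_inference(infer_fn, batch, n_warmup=10, n_iters=100):
    """Time only the inference call; the batch is already preprocessed."""
    # Warm-up runs are discarded so one-time setup costs do not skew results.
    for _ in range(n_warmup):
        infer_fn(batch)

    latencies_ms = []
    start = time.perf_counter()
    for _ in range(n_iters):
        t0 = time.perf_counter()
        infer_fn(batch)  # assumed to block until the result is ready
        latencies_ms.append((time.perf_counter() - t0) * 1e3)
    total_s = time.perf_counter() - start

    return {
        "mean_ms": statistics.mean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        # Samples processed per second over the whole timed loop.
        "throughput_per_s": n_iters * len(batch) / total_s,
    }


def dummy_infer(batch):
    """Hypothetical stand-in for a real engine's execute call."""
    return [sum(sample) for sample in batch]


# Preprocessing (building the batch) happens before timing starts.
batch = [[float(i)] * 1000 for i in range(8)]
stats = measure_inference(dummy_infer, batch)
print(stats)
```

Running the same harness on the unpruned and pruned models, with identical inputs and iteration counts, gives a like-for-like comparison.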

Thanks!

GURUGURU · November 16, 2021, 7:06am

Thanks for the reply.

Before I try measuring performance, I would like some advice on TensorRT optimization.
I used the linked method to prune my model, and I think this pruning method relies on sparse layers to accelerate model inference.

So, does TensorRT support sparse computation for the convolution layers, so that my pruning work will speed up inference in TensorRT? Or is what I'm doing already done automatically by TensorRT, so that my work will not improve speed?

nvidia03 · November 29, 2021, 2:19pm

Hi.
I have an RTX 3050 laptop.
I use Pop!_OS 21.04, and I don't know which driver Pop!_OS uses to control the NVIDIA GPU. For example, Ubuntu uses nouveau to control NVIDIA GPUs, but I don't think that gets the best performance out of the RTX card. Can you help me find another option?

spolisetty · January 4, 2022, 12:40pm

Hi,

If you channel prune models in the right way (and then compress them), you should get an increase in speed in TensorRT, because the compressed model is a smaller dense network. GPUs are simply very good at dense math, so unless the sparsity is appropriately structured or the weights are very sparse, sparse computations are unlikely to improve performance.
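
To show why the compression step matters, here is a minimal NumPy sketch, not tied to any particular pruning library. The helper names and the L1-norm criterion are assumptions chosen for illustration. It compares the multiply-accumulate count a dense convolution kernel performs when pruned channels are only zeroed with the count when they are physically removed.

```python
import numpy as np


def conv_macs(weight_shape, out_h, out_w):
    """MACs a dense conv performs for weights of shape (out_ch, in_ch, kh, kw)."""
    out_ch, in_ch, kh, kw = weight_shape
    return out_ch * in_ch * kh * kw * out_h * out_w


def mask_channels(weight, keep):
    """Zero the pruned output channels but keep the tensor shape unchanged."""
    masked = np.zeros_like(weight)
    masked[keep] = weight[keep]
    return masked


def compress_channels(weight, keep):
    """Physically drop the pruned output channels (smaller dense tensor)."""
    return weight[keep]


rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 32, 3, 3)).astype(np.float32)

# Hypothetical criterion: keep the 32 output channels with the largest L1 norm.
l1_norm = np.abs(weight).sum(axis=(1, 2, 3))
keep = np.sort(np.argsort(l1_norm)[-32:])

masked = mask_channels(weight, keep)
compact = compress_channels(weight, keep)

# Assumed 56x56 output feature map, for illustration only.
dense_macs = conv_macs(weight.shape, 56, 56)
masked_macs = conv_macs(masked.shape, 56, 56)    # same shape, same dense work
compact_macs = conv_macs(compact.shape, 56, 56)  # half the output channels

print(dense_macs, masked_macs, compact_macs)
```

In a real network the next layer's input channels have to be trimmed to match, which this sketch leaves out. The point is only that a dense kernel does the same work on a zero-filled tensor as on the original, while the compacted tensor is a genuinely smaller dense problem.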

Thanks

spolisetty · January 4, 2022, 12:45pm

Also, the following may help you,

GURUGURU · January 7, 2022, 3:24am

Thanks for the reply, @spolisetty. Your answer helps me a lot.

