Does network pruning speed up inference?


Environment

**TensorRT Version**: 8.0
**GPU Type**: Maxwell (Jetson Nano)
**Nvidia Driver Version**:
**CUDA Version**:
**CUDNN Version**:
**Operating System + Version**: Jetson Nano

Question:
I pruned my deep learning model, and I will convert this model to TensorRT. Will the pruned network speed up inference in the TensorRT framework? (My current device is a Jetson Nano.)

If yes, is there a GitHub repo or blog post that I can refer to?

Hi,
Please share the model, script, profiler, and performance output (if not shared already) so that we can help you better.
Alternatively, you can try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
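
For example, a typical benchmarking invocation looks like the following (this assumes you have exported your pruned model to ONNX; `model.onnx` is just a placeholder file name):

```
trtexec --onnx=model.onnx --fp16
```

trtexec reports per-iteration latency and throughput, which you can compare between the pruned and unpruned models.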

While measuring the model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead (see the timing sketch after the links below).
Please refer to the links below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#model-accuracy
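
As a rough illustration, here is a minimal timing sketch. It assumes a hypothetical `infer_fn` that wraps only the deployed inference call (e.g. your TensorRT execution context) and already includes any device synchronization; pre- and post-processing stay outside the timed region:

```python
import time
import numpy as np

def benchmark(infer_fn, batch, n_warmup=10, n_iters=100):
    """Time only the inference call; pre/post-processing is excluded."""
    for _ in range(n_warmup):
        infer_fn(batch)                      # warm-up runs are not timed
    latencies = []
    for _ in range(n_iters):
        start = time.perf_counter()
        infer_fn(batch)                      # the only call inside the timed region
        latencies.append(time.perf_counter() - start)
    lat = np.array(latencies)
    print(f"mean latency : {lat.mean() * 1e3:.2f} ms")
    print(f"p99 latency  : {np.percentile(lat, 99) * 1e3:.2f} ms")
    print(f"throughput   : {1.0 / lat.mean():.1f} inferences/s")
```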

Thanks!

Thanks for the reply.

Before I try measuring performance, I want to ask for advice on TensorRT optimization.
I used the method below to prune my model, and I think this pruning method produces sparse layers to accelerate model inference.

So, does TensorRT support computation for sparse convolution layers, so that my pruning work will accelerate inference in TensorRT? Or is what I am doing already done automatically in TensorRT, so my work will not improve speed?

Hi.
RTX 3050 laptop.
I use Pop!_OS 21.04 and I don't know which driver Pop!_OS uses to control the NVIDIA GPU. For example, Ubuntu uses nouveau to control NVIDIA GPUs, but I don't think that gives the best performance on an RTX card. Can you help me with another choice?

Hi,

If you channel prune models in the right way (and then compress them), you can get an increase in speed in TensorRT. GPUs are simply very good at dense math, so unless the sparsity is appropriately structured or the weights are very sparse, sparse computations are unlikely to improve performance.
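
To make the distinction concrete, here is an illustrative sketch using PyTorch's `torch.nn.utils.prune` (this assumes your model is a PyTorch model; the layer and variable names are made up for the example). Masked-out weights keep the original tensor shape and are still run as dense math; only when pruned channels are physically removed does the convolution actually get smaller:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Unstructured pruning: zeroes individual weights. The tensor keeps its
# shape, so TensorRT still runs the same dense convolution -> no speedup.
conv = nn.Conv2d(64, 128, kernel_size=3)
prune.l1_unstructured(conv, name="weight", amount=0.5)

# Structured (channel) pruning: zeroes whole output channels (dim=0).
conv2 = nn.Conv2d(64, 128, kernel_size=3)
prune.ln_structured(conv2, name="weight", amount=0.25, n=2, dim=0)
prune.remove(conv2, "weight")  # bake the mask into the weight tensor

# Even now the zeroed channels still cost memory and FLOPs. To see a
# speedup in TensorRT, rebuild the layer with fewer output channels
# (and shrink the next layer's input channels accordingly):
kept = conv2.weight.abs().sum(dim=(1, 2, 3)) != 0
smaller = nn.Conv2d(64, int(kept.sum()), kernel_size=3)
smaller.weight.data = conv2.weight.data[kept]
smaller.bias.data = conv2.bias.data[kept]
```

The compressed, smaller dense layers are what TensorRT can actually execute faster.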

Thanks

Also, the following may help you:

Thanks for the reply @spolisetty, your answer helps me a lot.