Channel pruning does not give a speedup on TensorRT

Description

Environment

TensorRT Version: 7.1.3.0
GPU Type: Jetson NX

I tried pruning ResNet-50 at 10% sparsity. The FLOPs are reduced and inference is faster in PyTorch, but there is no speedup in TensorRT.

Does TensorRT have any criteria for the number of channels? For example, does it optimize better when the number of channels is a multiple of 8?

Hi,
Could you share the model, script, profiler, and performance output, if you have not already, so that we can help you better?
Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

When measuring model performance, make sure you measure the latency and throughput of the network inference only, excluding the data pre- and post-processing overhead.
Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
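For reference, here is a minimal sketch of timing only the inference call, with warm-up runs and with pre- and post-processing kept outside the timed region. The `benchmark` helper and the `infer` callable are hypothetical names used for illustration, not part of TensorRT or PyTorch. It assumes `infer` blocks until the GPU work has finished (for example by synchronizing the stream inside it); otherwise the numbers only reflect launch overhead.

```python
import statistics
import time


def benchmark(infer, warmup=10, iters=100):
    """Time a zero-argument inference callable (hypothetical helper).

    Assumes `infer` returns only after the GPU work has completed.
    """
    # Warm-up runs are not timed, so one-time setup cost is excluded.
    for _ in range(warmup):
        infer()

    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        infer()  # only the inference call sits inside the timed region
        latencies_ms.append((time.perf_counter() - start) * 1e3)

    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        # Inferences per second, derived from the mean latency.
        "throughput_per_s": 1e3 / mean_ms if mean_ms > 0 else float("inf"),
    }
```

Prepare the input tensors before calling `benchmark` and decode the outputs afterwards, then run it once with the original engine and once with the pruned engine to compare them on equal terms.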

Thanks!

@OnePieceOfDeepLearning,

Yes, it’s usually better if the number of channels is a multiple of 8 (for FP16) or 32 (for INT8).
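
One way to act on this when choosing how many channels to keep per layer is to round each pruned channel count to the nearest multiple of 8 or 32. The sketch below is only an illustration: `align_channels` is a hypothetical helper, not a TensorRT or PyTorch API, and whether aligned counts actually recover the speedup on a given device still has to be measured.

```python
def align_channels(channels, multiple=8):
    """Round a channel count to the nearest multiple (hypothetical helper).

    Never returns fewer than `multiple` channels.
    """
    if channels <= 0 or multiple <= 0:
        raise ValueError("channels and multiple must be positive")
    # Integer rounding to the nearest multiple (ties round up).
    nearest = (channels + multiple // 2) // multiple * multiple
    return max(multiple, nearest)


# Example: a 256-channel layer pruned by 10% keeps 230 channels.
kept = int(256 * 0.9)
print(align_channels(kept, 8))   # FP16 alignment
print(align_channels(kept, 32))  # INT8 alignment
```

For the 256-channel example this gives 232 channels for FP16 and 224 for INT8, so the effective sparsity ends up slightly different from the 10% target.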