Is TensorRT slow with group convolution?

I tried running both regnet_1.6g ([2003.13678] Designing Network Design Spaces) and resnet50 on a Jetson Xavier with JetPack 4.3.

Using thop (GitHub - Lyken17/pytorch-OpCounter: Count the MACs / FLOPs of your PyTorch model),

the FLOPs of regnet1.6g come to 1.6G, while resnet50 comes to 6G.

So I expected regnet1.6g to be much faster than resnet50.

However, when running on the Jetson Xavier, regnet1.6g is much slower than resnet50 and nearly as slow as resnet100.

With a 112x112 input, resnet50 runs at 125 FPS, while regnet1.6g only reaches 84 FPS.

The major architectural difference is that regnet1.6g replaces dense 3x3 convolutions with group convolutions. Does TensorRT optimize group convolution?
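To make the FLOP gap concrete, here is a back-of-the-envelope MAC count for a dense versus a grouped 3x3 convolution. The layer sizes below are illustrative, not RegNet's actual dimensions:

```python
def conv_macs(h, w, c_in, c_out, k=3, groups=1):
    """MAC count of a k x k 2D convolution producing an h x w output map.
    Each output element multiplies against only c_in/groups input channels."""
    return h * w * c_out * (c_in // groups) * k * k

# Illustrative layer: 14x14 feature map, 256 -> 256 channels
dense = conv_macs(14, 14, 256, 256, k=3, groups=1)
grouped = conv_macs(14, 14, 256, 256, k=3, groups=8)
print(dense // grouped)  # groups=8 cuts MACs by exactly 8x
```

The MAC count drops linearly with the group count, which is exactly why regnet1.6g looks so cheap on paper, yet the question remains whether the hardware kernels realize that saving.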

Please share the model, script, profiler and performance output (if not shared already) so that we can help you better.
Alternatively, you can try running your model with trtexec command.
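For reference, a typical trtexec invocation on an ONNX export looks like this (`model.onnx` is a placeholder filename; the flags shown are standard trtexec options):

```shell
# Benchmark an ONNX model with trtexec.
# --fp16 enables half-precision kernels on Xavier,
# --dumpProfile prints per-layer timings, useful for spotting slow group-conv layers.
trtexec --onnx=model.onnx --fp16 --warmUp=500 --iterations=100 --dumpProfile
```

The per-layer profile from --dumpProfile is usually the quickest way to confirm which layers dominate the runtime.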

While measuring model performance, make sure you measure the latency and throughput of the network inference only, excluding the data pre- and post-processing overhead.
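A minimal timing harness along those lines might look like this; `infer` is a placeholder for your actual TensorRT execution call, and the warm-up/iteration counts are arbitrary defaults:

```python
import time

def benchmark(infer, n_warmup=10, n_iters=100):
    """Time only the inference call, excluding pre/post-processing.

    infer: zero-argument callable wrapping a single inference pass.
    Returns (average latency in ms, throughput in FPS).
    """
    for _ in range(n_warmup):       # warm-up passes let clocks and caches settle
        infer()
    start = time.perf_counter()
    for _ in range(n_iters):        # timed region covers inference only
        infer()
    elapsed = time.perf_counter() - start
    return elapsed / n_iters * 1e3, n_iters / elapsed
```

Preparing the input tensor before the timed loop (and decoding results after it) keeps the reported FPS comparable across models.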
Please refer to the link below for more details:


Here are links to the model and the trtexec log.

As I said, the result is not what I expected: regnet is much slower than resnet50 even though its FLOPs are much lower.

Hi @OnePieceOfDeepLearning,

4x fewer MACs is not guaranteed to translate into 4x higher FPS. Also, it looks like you are using an old version, TensorRT 6.0 (JetPack 4.3). We recommend updating your JetPack version and trying the latest TensorRT.

Thank you.

I know FPS does not scale linearly with MACs, but why is the FPS of the model with 4x fewer MACs so much lower?

I am using TensorRT 7, not TensorRT 6.

Hi @OnePieceOfDeepLearning,

It is hard to say, because group convolution is sometimes bound by the CPU rather than the GPU.
Please let us know if this is a blocking issue for you.
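One way to build intuition for why fewer MACs need not mean higher FPS: grouping shrinks the compute far more than it shrinks the data that must be moved, so the layer does fewer MACs per byte and is more easily bound by something other than raw arithmetic. A rough sketch, using illustrative layer sizes (not RegNet's actual shapes) and a simple activations-plus-weights traffic model:

```python
def macs_per_byte(h, w, c_in, c_out, k=3, groups=1, bytes_per_el=4):
    """Rough MACs-per-byte estimate for a k x k conv layer.

    Traffic model (a simplification): read the input activations and
    weights once, write the output activations once.
    """
    macs = h * w * c_out * (c_in // groups) * k * k
    weights = c_out * (c_in // groups) * k * k
    elements_moved = h * w * (c_in + c_out) + weights
    return macs / (elements_moved * bytes_per_el)

dense = macs_per_byte(14, 14, 256, 256, groups=1)
grouped = macs_per_byte(14, 14, 256, 256, groups=8)
print(dense, grouped)  # the grouped layer does fewer MACs per byte moved
```

Under this simplified model, the grouped layer's MACs-per-byte ratio is roughly half the dense layer's, so an 8x MAC reduction can easily fail to show up as an 8x speedup.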

Thank you