Is TensorRT slow with group convolution?

OnePieceOfDeepLearning · June 1, 2021, 1:16pm

I ran both regnet_1.6g ([2003.13678] Designing Network Design Spaces) and resnet50 on a Jetson Xavier with JetPack 4.3.

According to thop (GitHub - Lyken17/pytorch-OpCounter: Count the MACs / FLOPs of your PyTorch model.), regnet1.6g has 1.6 GFLOPs and resnet50 has 6 GFLOPs.

So I expected regnet1.6g to be much faster than resnet50.

However, on the Jetson Xavier, regnet1.6g is much slower than resnet50 and nearly the same speed as resnet100.

With a 112x112 input, resnet50 runs at 125 FPS while regnet1.6g only reaches 84 FPS.

The main difference is that regnet1.6g replaces the 3x3 convolutions with group convolutions. Does TensorRT optimize group convolution?
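For reference, the MAC saving from grouping can be worked out by hand. The sketch below is plain Python and is not the thop implementation; the layer shape (a 3x3 convolution with 256 input and output channels on a 14x14 output map, split into 16 groups) is a made-up example, not a layer taken from regnet1.6g.

```python
def conv2d_macs(c_in, c_out, kernel, h_out, w_out, groups=1):
    """Multiply-accumulates for one 2D convolution, ignoring bias."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    # Each output element reads kernel*kernel values from c_in/groups input channels.
    macs_per_output = (c_in // groups) * kernel * kernel
    return c_out * h_out * w_out * macs_per_output


# Hypothetical layer shape, for illustration only.
dense = conv2d_macs(256, 256, 3, 14, 14)
grouped = conv2d_macs(256, 256, 3, 14, 14, groups=16)
print(dense, grouped, dense // grouped)
```

With the output shape held fixed, the MAC count falls by exactly the number of groups, which is why a FLOP counter reports a much smaller number for the grouped network.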

NVES · June 1, 2021, 2:37pm

Hi,
Please share the model, script, profiler output, and performance output, if you have not already, so that we can help you better.
Alternatively, you can try running your model with the trtexec command.
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec

When measuring model performance, make sure you consider the latency and throughput of the network inference alone, excluding the data pre- and post-processing overhead.
Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-722/best-practices/index.html#measure-performance
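A minimal sketch of that kind of measurement in plain Python is shown here. The `benchmark` helper and the `fake_infer` stand-in are hypothetical names used for illustration; in a real run, `infer` would wrap only the engine execution and would need to block until the GPU work has finished, otherwise only the launch time is measured.

```python
import statistics
import time


def benchmark(infer, warmup=10, iters=100):
    """Time only the inference callable; returns (mean latency in ms, FPS)."""
    for _ in range(warmup):
        infer()  # warm-up runs are executed but not timed
    latencies_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        infer()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(latencies_ms)
    return mean_ms, 1000.0 / mean_ms


def fake_infer():
    # Stand-in for the real inference call (assumption: about 2 ms per frame).
    time.sleep(0.002)


mean_ms, fps = benchmark(fake_infer, warmup=2, iters=20)
print(f"{mean_ms:.2f} ms per inference, {fps:.1f} FPS")
```

Pre-processing and post-processing stay outside `infer`, so they do not contribute to the reported latency or FPS.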

Thanks!

OnePieceOfDeepLearning · June 2, 2021, 4:25am

Here are links to the model and the trtexec log:

https://drive.google.com/drive/folders/1KWQGLzDPanMmRjUWG2Q3jQL9y-UWD1qq?usp=sharing

As I said, the result is unexpected: regnet is much slower than resnet50, even though it has far fewer FLOPs.

spolisetty · June 3, 2021, 11:48am

Hi @OnePieceOfDeepLearning,

4X fewer MACs is not guaranteed to result in 4X better FPS. Also, it looks like you’re using an old version of TensorRT, 6.0 (JetPack 4.3). We recommend updating JetPack and trying the latest TensorRT version.
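One common way to reason about this is arithmetic intensity, i.e. MACs per byte moved. Grouping cuts the MACs and the weights but leaves the activation traffic unchanged, so a grouped layer does less compute per byte and is more likely to be limited by memory bandwidth than by math throughput. The rough sketch below assumes FP16 tensors, no caching or layer fusion, and a hypothetical layer shape; it is not a measurement of regnet or of TensorRT's kernels.

```python
def conv_intensity(c_in, c_out, k, h, w, groups=1, bytes_per_elem=2):
    """Return (MACs, bytes moved, MACs per byte) for a stride-1, same-padded conv."""
    macs = c_out * h * w * (c_in // groups) * k * k
    weights = c_out * (c_in // groups) * k * k
    activations = c_in * h * w + c_out * h * w  # input read + output write
    bytes_moved = (weights + activations) * bytes_per_elem
    return macs, bytes_moved, macs / bytes_moved


# Hypothetical 3x3 layer: 256 channels in and out, 14x14 feature map.
for g in (1, 16):
    macs, nbytes, intensity = conv_intensity(256, 256, 3, 14, 14, groups=g)
    print(f"groups={g}: {macs} MACs, {nbytes} bytes, {intensity:.1f} MACs/byte")
```

Under these assumptions the grouped layer has 16X fewer MACs but noticeably lower MACs per byte, which is one plausible reason the FPS gain is smaller than the MAC reduction suggests.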

Thank you.

OnePieceOfDeepLearning · June 4, 2021, 1:19am

I know that FPS does not scale linearly with MACs, but why is the FPS so much lower with 4X fewer MACs?

I am using TensorRT 7, not TensorRT 6.

spolisetty · June 7, 2021, 1:39pm

Hi @OnePieceOfDeepLearning,

It is hard to say, because group convolution is sometimes bound by the CPU rather than the GPU.
Please let us know if this is a blocking issue for you.

Thank you
