I tried to run both regnet1.6g ([2003.13678] Designing Network Design Spaces) and resnet50 on a Jetson Xavier with JetPack 4.3.
regnet1.6g needs 1.6 GFLOPs, while resnet50 needs 6 GFLOPs.
So I expected regnet1.6g to be much faster than resnet50.
However, on the Jetson Xavier, regnet1.6g is much slower than resnet50 and about the same speed as resnet100.
With a 112x112 input, resnet50 runs at 125 FPS while regnet1.6g only reaches 84 FPS.
The main difference is that regnet1.6g replaces the 3x3 convolutions with group convolutions. Does TensorRT optimize group convolution?
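
For reference, here is a minimal sketch of how grouping lowers the theoretical cost of a 3x3 convolution. The helper `conv_macs`, the layer shape, and the group count are hypothetical values chosen for illustration, not taken from either network. A lower count on paper does not by itself guarantee lower latency on the GPU.

```python
def conv_macs(c_in, c_out, kernel, out_h, out_w, groups=1):
    """Multiply-accumulates for one 2D convolution (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    # Each output channel only reads c_in / groups input channels.
    return out_h * out_w * c_out * (c_in // groups) * kernel * kernel


# Hypothetical layer: 256 -> 256 channels, 3x3 kernel, 14x14 output map.
dense = conv_macs(256, 256, 3, 14, 14, groups=1)
grouped = conv_macs(256, 256, 3, 14, 14, groups=16)

print(f"dense 3x3:   {dense} MACs")
print(f"grouped 3x3: {grouped} MACs")
print(f"reduction:   {dense // grouped}x")
```

With 16 groups the count drops by exactly 16x, so the measured FPS gap cannot be explained by FLOPs alone.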