ONNX -> TensorRT: no speed difference between models of different sizes

Description

I have two YOLOv5 models of different sizes: one has 35.9 M parameters, the other 12.7 M.
When I convert the models to TensorRT with trtexec --onnx=model.onnx --batch=5 --fp16, the resulting engines run at roughly the same inference speed (21 fps), even though I would expect the speeds to differ substantially.

Environment

TensorRT Version: 7.1.3
GPU Type: Jetson Xavier AGX
CUDA Version: 10.2.89
CUDNN Version: 8.0
Operating System + Version: Jetpack 4.5.1

Hi,

Could you share the detailed output of trtexec with us first?

Thanks.

Here are the outputs from trtexec
yolov5s6_trtexec.txt (10.5 KB)
yolov5m6_trtexec.txt (11.1 KB)

Could it be due to the "Some tactics do not have sufficient workspace memory to run" warning?
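One thing worth trying is rebuilding with a larger builder workspace so those tactics become eligible. A sketch of the invocation (the ONNX paths are placeholders; in TensorRT 7, trtexec takes the workspace size in MB):

```shell
# Rebuild both engines with a larger builder workspace (size in MB for TensorRT 7).
# The model paths below are placeholders for the actual ONNX files.
trtexec --onnx=yolov5s6.onnx --batch=5 --fp16 --workspace=2048 --saveEngine=yolov5s6.engine
trtexec --onnx=yolov5m6.onnx --batch=5 --fp16 --workspace=2048 --saveEngine=yolov5m6.engine
```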

Thanks!

Hi,

Based on the trtexec logs, the difference between these two models is not large.

yolov5s6

204 layers, with elapsed time = 180ms.

Slice_4, Slice_9, Slice_14, Slice_19, Slice_24, Slice_29, Slice_34, Slice_39, Conv_41 + Relu_42, Conv_43 + Relu_44, Conv_52 + Relu_53 || Conv_45 + Relu_46, Conv_47 + Relu_48, Conv_49 + Relu_50, Add_51, 226 copy, Conv_55 + Relu_56, Conv_57 + Relu_58, Conv_81 + Relu_82 || Conv_59 + Relu_60, Conv_61 + Relu_62, Conv_63 + Relu_64, Add_65, Conv_66 + Relu_67, Conv_68 + Relu_69, Add_70, Conv_71 + Relu_72, Conv_73 + Relu_74, Add_75, Conv_76 + Relu_77, Conv_78 + Relu_79, Add_80, 255 copy, Conv_84 + Relu_85, Conv_86 + Relu_87, Conv_110 + Relu_111 || Conv_88 + Relu_89, Conv_90 + Relu_91, Conv_92 + Relu_93, Add_94, Conv_95 + Relu_96, Conv_97 + Relu_98, Add_99, Conv_100 + Relu_101, Conv_102 + Relu_103, Add_104, Conv_105 + Relu_106, Conv_107 + Relu_108, Add_109, 284 copy, Conv_113 + Relu_114, Conv_115 + Relu_116, Conv_124 + Relu_125 || Conv_117 + Relu_118, Conv_119 + Relu_120, Conv_121 + Relu_122, Add_123, 298 copy, Conv_127 + Relu_128, Conv_129 + Relu_130, Conv_131 + Relu_132, MaxPool_135, MaxPool_134, MaxPool_133, 305 copy, Conv_137 + Relu_138, Conv_145 + Relu_146 || Conv_139 + Relu_140, Conv_141 + Relu_142, Conv_143 + Relu_144, 319 copy, Conv_148 + Relu_149, Conv_150 + Relu_151, Resize_153, 329 copy, Conv_161 + Relu_162 || Conv_155 + Relu_156, Conv_157 + Relu_158, Conv_159 + Relu_160, 338 copy, Conv_164 + Relu_165, Conv_166 + Relu_167, Resize_169, 348 copy, Conv_177 + Relu_178 || Conv_171 + Relu_172, Conv_173 + Relu_174, Conv_175 + Relu_176, 357 copy, Conv_180 + Relu_181, Conv_182 + Relu_183, Resize_185, 367 copy, Conv_193 + Relu_194 || Conv_187 + Relu_188, Conv_189 + Relu_190, Conv_191 + Relu_192, 376 copy, Conv_196 + Relu_197, Conv_240, Reshape_254 + Transpose_255, Sigmoid_256, Slice_284, Slice_274, Slice_261, Conv_198 + Relu_199, 362 copy, Conv_207 + Relu_208 || Conv_201 + Relu_202, Conv_203 + Relu_204, (Unnamed Layer* 255) [Constant], (Unnamed Layer* 291) [Constant], Conv_205 + Relu_206, PWN(PWN((Unnamed Layer* 288) [Constant] + (Unnamed Layer* 289) [Shuffle], PWN((Unnamed 
Layer* 285) [Constant] + (Unnamed Layer* 286) [Shuffle] + Mul_276, Pow_278)), Mul_279), 390 copy, Conv_210 + Relu_211, PWN(PWN(PWN((Unnamed Layer* 252) [Constant] + (Unnamed Layer* 253) [Shuffle], PWN((Unnamed Layer* 249) [Constant] + (Unnamed Layer* 250) [Shuffle] + Mul_263, Sub_265)), Add_267), (Unnamed Layer* 257) [Constant] + (Unnamed Layer* 258) [Shuffle] + Mul_269), 455 copy, 469 copy, 474 copy, Reshape_288, Conv_289, Reshape_303 + Transpose_304, Sigmoid_305, Slice_333, Slice_323, Slice_310, Conv_212 + Relu_213, 395 copy, 343 copy, Conv_221 + Relu_222 || Conv_215 + Relu_216, Conv_217 + Relu_218, (Unnamed Layer* 374) [Constant], (Unnamed Layer* 410) [Constant], Conv_219 + Relu_220, PWN(PWN((Unnamed Layer* 407) [Constant] + (Unnamed Layer* 408) [Shuffle], PWN((Unnamed Layer* 404) [Constant] + (Unnamed Layer* 405) [Shuffle] + Mul_325, Pow_327)), Mul_328), 402 copy, 404 copy, Conv_224 + Relu_225, PWN(PWN(PWN((Unnamed Layer* 371) [Constant] + (Unnamed Layer* 372) [Shuffle], PWN((Unnamed Layer* 368) [Constant] + (Unnamed Layer* 369) [Shuffle] + Mul_312, Sub_314)), Add_316), (Unnamed Layer* 376) [Constant] + (Unnamed Layer* 377) [Shuffle] + Mul_318), 516 copy, 530 copy, 535 copy, Reshape_337, Conv_338, Reshape_352 + Transpose_353, Sigmoid_354, Slice_382, Slice_372, Slice_359, Conv_226 + Relu_227, 409 copy, 324 copy, Conv_235 + Relu_236 || Conv_229 + Relu_230, Conv_231 + Relu_232, (Unnamed Layer* 493) [Constant], (Unnamed Layer* 529) [Constant], Conv_233 + Relu_234, PWN(PWN((Unnamed Layer* 526) [Constant] + (Unnamed Layer* 527) [Shuffle], PWN((Unnamed Layer* 523) [Constant] + (Unnamed Layer* 524) [Shuffle] + Mul_374, Pow_376)), Mul_377), 416 copy, 418 copy, Conv_238 + Relu_239, PWN(PWN(PWN((Unnamed Layer* 490) [Constant] + (Unnamed Layer* 491) [Shuffle], PWN((Unnamed Layer* 487) [Constant] + (Unnamed Layer* 488) [Shuffle] + Mul_361, Sub_363)), Add_365), (Unnamed Layer* 495) [Constant] + (Unnamed Layer* 496) [Shuffle] + Mul_367), 577 copy, 591 copy, 596 copy, 
Reshape_386, Conv_387, Reshape_401 + Transpose_402, Sigmoid_403, Slice_431, Slice_421, Slice_408, (Unnamed Layer* 612) [Constant], (Unnamed Layer* 648) [Constant], PWN(PWN((Unnamed Layer* 645) [Constant] + (Unnamed Layer* 646) [Shuffle], PWN((Unnamed Layer* 642) [Constant] + (Unnamed Layer* 643) [Shuffle] + Mul_423, Pow_425)), Mul_426), PWN(PWN(PWN((Unnamed Layer* 609) [Constant] + (Unnamed Layer* 610) [Shuffle], PWN((Unnamed Layer* 606) [Constant] + (Unnamed Layer* 607) [Shuffle] + Mul_410, Sub_412)), Add_414), (Unnamed Layer* 614) [Constant] + (Unnamed Layer* 615) [Shuffle] + Mul_416), 638 copy, 652 copy, 657 copy, Reshape_435, 482 copy, 543 copy, 604 copy, 665 copy, 
[06/25/2021-10:01:35] [I] min: 179.721 ms (end to end 179.731 ms)
[06/25/2021-10:01:35] [I] max: 180.024 ms (end to end 180.035 ms)
[06/25/2021-10:01:35] [I] mean: 179.868 ms (end to end 179.878 ms)
[06/25/2021-10:01:35] [I] median: 179.868 ms (end to end 179.88 ms)

yolov5m6

237 layers, with elapsed time = 222ms.

Slice_4, Slice_9, Slice_14, Slice_19, Slice_24, Slice_29, Slice_34, Slice_39, Conv_41 + Relu_42, Conv_43 + Relu_44, Conv_57 + Relu_58 || Conv_45 + Relu_46, Conv_47 + Relu_48, Conv_49 + Relu_50, Add_51, Conv_52 + Relu_53, Conv_54 + Relu_55, Add_56, 283 copy, Conv_60 + Relu_61, Conv_62 + Relu_63, Conv_96 + Relu_97 || Conv_64 + Relu_65, Conv_66 + Relu_67, Conv_68 + Relu_69, Add_70, Conv_71 + Relu_72, Conv_73 + Relu_74, Add_75, Conv_76 + Relu_77, Conv_78 + Relu_79, Add_80, Conv_81 + Relu_82, Conv_83 + Relu_84, Add_85, Conv_86 + Relu_87, Conv_88 + Relu_89, Add_90, Conv_91 + Relu_92, Conv_93 + Relu_94, Add_95, 322 copy, Conv_99 + Relu_100, Conv_101 + Relu_102, Conv_135 + Relu_136 || Conv_103 + Relu_104, Conv_105 + Relu_106, Conv_107 + Relu_108, Add_109, Conv_110 + Relu_111, Conv_112 + Relu_113, Add_114, Conv_115 + Relu_116, Conv_117 + Relu_118, Add_119, Conv_120 + Relu_121, Conv_122 + Relu_123, Add_124, Conv_125 + Relu_126, Conv_127 + Relu_128, Add_129, Conv_130 + Relu_131, Conv_132 + Relu_133, Add_134, 361 copy, Conv_138 + Relu_139, Conv_140 + Relu_141, Conv_154 + Relu_155 || Conv_142 + Relu_143, Conv_144 + Relu_145, Conv_146 + Relu_147, Add_148, Conv_149 + Relu_150, Conv_151 + Relu_152, Add_153, 380 copy, Conv_157 + Relu_158, Conv_159 + Relu_160, Conv_161 + Relu_162, MaxPool_165, MaxPool_164, MaxPool_163, 387 copy, Conv_167 + Relu_168, Conv_179 + Relu_180 || Conv_169 + Relu_170, Conv_171 + Relu_172, Conv_173 + Relu_174, Conv_175 + Relu_176, Conv_177 + Relu_178, 405 copy, Conv_182 + Relu_183, Conv_184 + Relu_185, Resize_187, 415 copy, Conv_199 + Relu_200 || Conv_189 + Relu_190, Conv_191 + Relu_192, Conv_193 + Relu_194, Conv_195 + Relu_196, Conv_197 + Relu_198, 428 copy, Conv_202 + Relu_203, Conv_204 + Relu_205, Resize_207, 438 copy, Conv_219 + Relu_220 || Conv_209 + Relu_210, Conv_211 + Relu_212, Conv_213 + Relu_214, Conv_215 + Relu_216, Conv_217 + Relu_218, 451 copy, Conv_222 + Relu_223, Conv_224 + Relu_225, Resize_227, 461 copy, Conv_239 + Relu_240 || Conv_229 + 
Relu_230, Conv_231 + Relu_232, Conv_233 + Relu_234, Conv_235 + Relu_236, Conv_237 + Relu_238, 474 copy, Conv_242 + Relu_243, Conv_298, Reshape_312 + Transpose_313, Sigmoid_314, Slice_342, Slice_332, Slice_319, Conv_244 + Relu_245, 456 copy, Conv_257 + Relu_258 || Conv_247 + Relu_248, Conv_249 + Relu_250, (Unnamed Layer* 313) [Constant], (Unnamed Layer* 349) [Constant], Conv_251 + Relu_252, PWN(PWN((Unnamed Layer* 346) [Constant] + (Unnamed Layer* 347) [Shuffle], PWN((Unnamed Layer* 343) [Constant] + (Unnamed Layer* 344) [Shuffle] + Mul_334, Pow_336)), Mul_337), Conv_253 + Relu_254, PWN(PWN(PWN((Unnamed Layer* 310) [Constant] + (Unnamed Layer* 311) [Shuffle], PWN((Unnamed Layer* 307) [Constant] + (Unnamed Layer* 308) [Shuffle] + Mul_321, Sub_323)), Add_325), (Unnamed Layer* 315) [Constant] + (Unnamed Layer* 316) [Shuffle] + Mul_327), 565 copy, 579 copy, 584 copy, Reshape_346, Conv_255 + Relu_256, 490 copy, 492 copy, Conv_260 + Relu_261, Conv_347, Reshape_361 + Transpose_362, Sigmoid_363, Slice_391, Slice_381, Slice_368, Conv_262 + Relu_263, 497 copy, 433 copy, Conv_275 + Relu_276 || Conv_265 + Relu_266, Conv_267 + Relu_268, (Unnamed Layer* 432) [Constant], (Unnamed Layer* 468) [Constant], Conv_269 + Relu_270, PWN(PWN((Unnamed Layer* 465) [Constant] + (Unnamed Layer* 466) [Shuffle], PWN((Unnamed Layer* 462) [Constant] + (Unnamed Layer* 463) [Shuffle] + Mul_383, Pow_385)), Mul_386), Conv_271 + Relu_272, PWN(PWN(PWN((Unnamed Layer* 429) [Constant] + (Unnamed Layer* 430) [Shuffle], PWN((Unnamed Layer* 426) [Constant] + (Unnamed Layer* 427) [Shuffle] + Mul_370, Sub_372)), Add_374), (Unnamed Layer* 434) [Constant] + (Unnamed Layer* 435) [Shuffle] + Mul_376), 626 copy, 640 copy, 645 copy, Reshape_395, Conv_273 + Relu_274, 508 copy, 510 copy, Conv_278 + Relu_279, Conv_396, Reshape_410 + Transpose_411, Sigmoid_412, Slice_440, Slice_430, Slice_417, Conv_280 + Relu_281, 515 copy, 410 copy, Conv_293 + Relu_294 || Conv_283 + Relu_284, Conv_285 + Relu_286, (Unnamed Layer* 551) 
[Constant], (Unnamed Layer* 587) [Constant], Conv_287 + Relu_288, PWN(PWN((Unnamed Layer* 584) [Constant] + (Unnamed Layer* 585) [Shuffle], PWN((Unnamed Layer* 581) [Constant] + (Unnamed Layer* 582) [Shuffle] + Mul_432, Pow_434)), Mul_435), Conv_289 + Relu_290, PWN(PWN(PWN((Unnamed Layer* 548) [Constant] + (Unnamed Layer* 549) [Shuffle], PWN((Unnamed Layer* 545) [Constant] + (Unnamed Layer* 546) [Shuffle] + Mul_419, Sub_421)), Add_423), (Unnamed Layer* 553) [Constant] + (Unnamed Layer* 554) [Shuffle] + Mul_425), 687 copy, 701 copy, 706 copy, Reshape_444, Conv_291 + Relu_292, 526 copy, 528 copy, Conv_296 + Relu_297, Conv_445, Reshape_459 + Transpose_460, Sigmoid_461, Slice_489, Slice_479, Slice_466, (Unnamed Layer* 670) [Constant], (Unnamed Layer* 706) [Constant], PWN(PWN((Unnamed Layer* 703) [Constant] + (Unnamed Layer* 704) [Shuffle], PWN((Unnamed Layer* 700) [Constant] + (Unnamed Layer* 701) [Shuffle] + Mul_481, Pow_483)), Mul_484), PWN(PWN(PWN((Unnamed Layer* 667) [Constant] + (Unnamed Layer* 668) [Shuffle], PWN((Unnamed Layer* 664) [Constant] + (Unnamed Layer* 665) [Shuffle] + Mul_468, Sub_470)), Add_472), (Unnamed Layer* 672) [Constant] + (Unnamed Layer* 673) [Shuffle] + Mul_474), 748 copy, 762 copy, 767 copy, Reshape_493, 592 copy, 653 copy, 714 copy, 775 copy, 
[06/25/2021-10:31:44] [I] min: 221.72 ms (end to end 221.735 ms)
[06/25/2021-10:31:44] [I] max: 222.158 ms (end to end 222.175 ms)
[06/25/2021-10:31:44] [I] mean: 221.995 ms (end to end 222.004 ms)
[06/25/2021-10:31:44] [I] median: 221.991 ms (end to end 221.996 ms)

The layer count increases ~16%, and the inference time increases ~23%.
This looks reasonable to me.
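As a quick check, these growth figures and the engine-only throughput follow directly from the layer counts and mean latencies in the trtexec logs above (batch size 5):

```python
# Sanity-check the numbers reported by trtexec (values from the logs above).
s6_layers, m6_layers = 204, 237
s6_ms, m6_ms = 179.868, 221.995  # mean GPU latency per batch of 5

layer_growth = (m6_layers - s6_layers) / s6_layers   # ~0.16
time_growth = (m6_ms - s6_ms) / s6_ms                # ~0.23

# Engine-only throughput at batch 5, in images per second.
s6_ips = 5 / (s6_ms / 1000)
m6_ips = 5 / (m6_ms / 1000)
print(round(layer_growth, 2), round(time_growth, 2))  # 0.16 0.23
print(round(s6_ips, 1), round(m6_ips, 1))             # 27.8 22.5
```

Note that the engine-only throughput is above the 21 fps measured end to end, so the application pipeline (pre/post-processing, data transfer) may also be contributing to the observed rate.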

Thanks.

Thank you for the reply!

It’s not just the number of layers that increases, but also the number of channels in each convolutional layer.
The yolov5s6 model therefore has 12.7 M parameters, while the yolov5m6 model has 35.9 M.
Shouldn’t this affect the inference speed more?

Thanks

Hi,

A possible reason is that the increase in channels maps well onto the GPU's SIMD architecture, so the relative overhead is small.
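To illustrate (the channel widths below are illustrative, not taken from the actual models): the arithmetic cost of a convolution grows with the product of input and output channels, but when a layer does not saturate the GPU, the measured latency can grow much more slowly than the arithmetic.

```python
def conv_macs(h, w, c_in, c_out, k=3):
    """Multiply-accumulate count for a k x k convolution over an h x w feature map."""
    return h * w * c_in * c_out * k * k

# Illustrative channel widths: YOLOv5 "s" variants use a 0.50 width
# multiple and "m" variants 0.75, so channels differ by about 1.5x.
small = conv_macs(80, 80, 64, 64)
large = conv_macs(80, 80, 96, 96)
print(large / small)  # 1.5x the channels -> 2.25x the arithmetic
```

If the two engines nonetheless time similarly, the smaller model's layers are probably under-utilizing the GPU, so the extra arithmetic in the larger model is absorbed by otherwise-idle compute lanes.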

You can compare the same layer with different channel counts to validate this.
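One way to make that comparison is to have trtexec print per-layer average times for each engine and then compare matching Conv layers across the two models (paths are placeholders):

```shell
# Dump per-layer timing for each model; compare the corresponding Conv layers.
trtexec --onnx=yolov5s6.onnx --batch=5 --fp16 --dumpProfile
trtexec --onnx=yolov5m6.onnx --batch=5 --fp16 --dumpProfile
```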
Thanks.

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.