Same TensorRT layer performs differently on Jetson

Description

I’m trying to deploy a model named VoVNet. I first tested it on a dGPU and it worked great, but when I deployed it on a Jetson Nano, inference became very slow.
I tried to find out why by running

/usr/src/tensorrt/bin/trtexec --loadEngine=VoVNet.engine --dumpProfile
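Comparing two dumps this long by eye is error-prone, so a small parser that ranks layers by per-iteration time may help. The following is a minimal Python sketch. It assumes each row has the shape pasted in this post: an optional `[timestamp] [I]` prefix, the layer name, and then the Time (ms), Avg. Time (ms) and Time % columns. `parse_profile`, `top_layers` and `LayerTime` are hypothetical helper names for illustration, not TensorRT or trtexec APIs.

```python
import re
from typing import NamedTuple

# Assumed row format (as pasted in this post): optional "[timestamp] [I]" prefix,
# layer name, then Time (ms), Avg. Time (ms) and Time % as decimal numbers.
ROW = re.compile(
    r"^(?:\[[^\]]*\]\s*\[I\])?\s*(?P<name>.+?)\s+"
    r"(?P<total>\d+\.\d+)\s+(?P<avg>\d+\.\d+)\s+(?P<pct>\d+\.\d+)\s*$"
)


class LayerTime(NamedTuple):
    name: str
    total_ms: float  # Time (ms) column
    avg_ms: float    # Avg. Time (ms) column, per iteration
    pct: float       # Time % column


def parse_profile(text: str) -> list:
    """Return one LayerTime per profiled layer, skipping the header and Total rows."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None or m.group("name") == "Total":
            continue
        rows.append(LayerTime(m.group("name"), float(m.group("total")),
                              float(m.group("avg")), float(m.group("pct"))))
    return rows


def top_layers(rows: list, n: int = 5) -> list:
    """Return the n layers with the largest per-iteration (average) time."""
    return sorted(rows, key=lambda r: r.avg_ms, reverse=True)[:n]


if __name__ == "__main__":
    # Two rows copied from the Jetson Nano dump in this post.
    sample = (
        "[04/08/2024-16:54:21] [I] /backbone/stem/stem_2/conv/Conv + "
        "/backbone/stem/stem_2/relu/Relu      391.99           9.7997     12.1\n"
        "[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/ese/fc/Conv + "
        "/backbone/stage3/OSA3_1/ese/hsigmoid/Relu + "
        "PWN(/backbone/stage3/OSA3_1/ese/hsigmoid/Clip)     1180.69          29.5172     36.3\n"
    )
    for r in top_layers(parse_profile(sample), n=2):
        print(f"{r.avg_ms:9.4f} ms/iter  {r.pct:5.1f}%  {r.name}")
```

Running the same functions on the full dGPU and Nano dumps produces two lists that can be matched by layer name. Layer fusion and reformatting nodes can differ between builds, as the two dumps below show, so some rows may exist in only one of them.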

and the results confused me even more.

On the dGPU:

Layer   Time (ms)   Avg. Time (ms)   Time %
[04/08/2024-09:27:46] [I] /backbone/stem/stem_1/conv/Conv + /backbone/stem/stem_1/relu/Relu       49.50           0.0256      1.7
[04/08/2024-09:27:46] [I] /backbone/stem/stem_2/conv/Conv + /backbone/stem/stem_2/relu/Relu      275.23           0.1425      9.5
[04/08/2024-09:27:46] [I] /backbone/stem/stem_3/conv/Conv + /backbone/stem/stem_3/relu/Relu      181.43           0.0939      6.3
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/layers.0/OSA2_1_0/conv/Conv + /backbone/stage2/OSA2_1/layers.0/OSA2_1_0/relu/Relu      125.57           0.0650      4.3
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/layers.1/OSA2_1_1/conv/Conv + /backbone/stage2/OSA2_1/layers.1/OSA2_1_1/relu/Relu       73.94           0.0383      2.6
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/layers.2/OSA2_1_2/conv/Conv + /backbone/stage2/OSA2_1/layers.2/OSA2_1_2/relu/Relu       72.70           0.0376      2.5
[04/08/2024-09:27:46] [I] /backbone/stem/stem_3/relu/Relu_output_0 copy       34.34           0.0178      1.2
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/layers.0/OSA2_1_0/relu/Relu_output_0 copy       21.16           0.0110      0.7
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/layers.1/OSA2_1_1/relu/Relu_output_0 copy       21.16           0.0110      0.7
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/concat/OSA2_1_concat/conv/Conv + /backbone/stage2/OSA2_1/concat/OSA2_1_concat/relu/Relu      111.30           0.0576      3.9
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/ese/avg_pool/GlobalAveragePool       32.55           0.0168      1.1
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/ese/fc/Conv + /backbone/stage2/OSA2_1/ese/hsigmoid/Relu + PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Clip)        8.98           0.0046      0.3
[04/08/2024-09:27:46] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0 + (Unnamed Layer* 23) [Shuffle], /backbone/stage2/OSA2_1/ese/hsigmoid/Div), /backbone/stage2/OSA2_1/ese/Mul)       21.65           0.0112      0.7
[04/08/2024-09:27:46] [I] /backbone/stage3/Pooling/MaxPool       17.16           0.0089      0.6
[04/08/2024-09:27:46] [I] /avg_pool_4x/AveragePool       11.37           0.0059      0.4
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/layers.0/OSA3_1_0/conv/Conv + /backbone/stage3/OSA3_1/layers.0/OSA3_1_0/relu/Relu       58.79           0.0304      2.0
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/layers.1/OSA3_1_1/conv/Conv + /backbone/stage3/OSA3_1/layers.1/OSA3_1_1/relu/Relu       45.10           0.0233      1.6
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/layers.2/OSA3_1_2/conv/Conv + /backbone/stage3/OSA3_1/layers.2/OSA3_1_2/relu/Relu       45.09           0.0233      1.6
[04/08/2024-09:27:46] [I] /backbone/stage3/Pooling/MaxPool_output_0 copy        8.52           0.0044      0.3
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/layers.0/OSA3_1_0/relu/Relu_output_0 copy        7.82           0.0040      0.3
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/layers.1/OSA3_1_1/relu/Relu_output_0 copy        7.69           0.0040      0.3
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/concat/OSA3_1_concat/conv/Conv + /backbone/stage3/OSA3_1/concat/OSA3_1_concat/relu/Relu       72.69           0.0376      2.5
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/ese/avg_pool/GlobalAveragePool       21.47           0.0111      0.7
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/ese/fc/Conv + /backbone/stage3/OSA3_1/ese/hsigmoid/Relu + PWN(/backbone/stage3/OSA3_1/ese/hsigmoid/Clip)       10.01           0.0052      0.3
[04/08/2024-09:27:46] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_1 + (Unnamed Layer* 44) [Shuffle], /backbone/stage3/OSA3_1/ese/hsigmoid/Div), /backbone/stage3/OSA3_1/ese/Mul)       19.10           0.0099      0.7
[04/08/2024-09:27:46] [I] /backbone/stage4/Pooling/MaxPool       12.37           0.0064      0.4
[04/08/2024-09:27:46] [I] /avg_pool_2x/AveragePool       13.38           0.0069      0.5
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/layers.0/OSA4_1_0/conv/Conv + /backbone/stage4/OSA4_1/layers.0/OSA4_1_0/relu/Relu       98.47           0.0510      3.4
[04/08/2024-09:27:46] [I] /latlayer1/Conv       25.73           0.0133      0.9
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/layers.1/OSA4_1_1/conv/Conv + /backbone/stage4/OSA4_1/layers.1/OSA4_1_1/relu/Relu       44.69           0.0231      1.5
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/layers.2/OSA4_1_2/conv/Conv + /backbone/stage4/OSA4_1/layers.2/OSA4_1_2/relu/Relu       44.59           0.0231      1.5
[04/08/2024-09:27:46] [I] Reformatting CopyNode for Output Tensor 0 to /backbone/stage4/OSA4_1/layers.2/OSA4_1_2/conv/Conv + /backbone/stage4/OSA4_1/layers.2/OSA4_1_2/relu/Relu        8.71           0.0045      0.3
[04/08/2024-09:27:46] [I] /backbone/stage4/Pooling/MaxPool_output_0 copy       11.78           0.0061      0.4
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/layers.0/OSA4_1_0/relu/Relu_output_0 copy        8.48           0.0044      0.3
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/layers.1/OSA4_1_1/relu/Relu_output_0 copy        8.45           0.0044      0.3
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/concat/OSA4_1_concat/conv/Conv + /backbone/stage4/OSA4_1/concat/OSA4_1_concat/relu/Relu       57.76           0.0299      2.0
[04/08/2024-09:27:46] [I] Reformatting CopyNode for Input Tensor 0 to /backbone/stage4/OSA4_1/ese/avg_pool/GlobalAveragePool       14.34           0.0074      0.5
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/ese/avg_pool/GlobalAveragePool       15.15           0.0078      0.5
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/ese/fc/Conv + /backbone/stage4/OSA4_1/ese/hsigmoid/Relu + PWN(/backbone/stage4/OSA4_1/ese/hsigmoid/Clip)       11.91           0.0062      0.4
[04/08/2024-09:27:46] [I] Reformatting CopyNode for Input Tensor 0 to PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_3 + (Unnamed Layer* 65) [Shuffle], /backbone/stage4/OSA4_1/ese/hsigmoid/Div), /backbone/stage4/OSA4_1/ese/Mul)        3.04           0.0016      0.1
[04/08/2024-09:27:46] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_3 + (Unnamed Layer* 65) [Shuffle], /backbone/stage4/OSA4_1/ese/hsigmoid/Div), /backbone/stage4/OSA4_1/ese/Mul)        9.73           0.0050      0.3
[04/08/2024-09:27:46] [I] Reformatting CopyNode for Output Tensor 0 to PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_3 + (Unnamed Layer* 65) [Shuffle], /backbone/stage4/OSA4_1/ese/hsigmoid/Div), /backbone/stage4/OSA4_1/ese/Mul)       14.30           0.0074      0.5
[04/08/2024-09:27:46] [I] /backbone/stage5/Pooling/MaxPool        9.19           0.0048      0.3
[04/08/2024-09:27:46] [I] /latlayer2/Conv       30.30           0.0157      1.0
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/layers.0/OSA5_1_0/conv/Conv + /backbone/stage5/OSA5_1/layers.0/OSA5_1_0/relu/Relu      140.55           0.0727      4.9
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/layers.1/OSA5_1_1/conv/Conv + /backbone/stage5/OSA5_1/layers.1/OSA5_1_1/relu/Relu       49.94           0.0258      1.7
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/layers.2/OSA5_1_2/conv/Conv + /backbone/stage5/OSA5_1/layers.2/OSA5_1_2/relu/Relu       49.79           0.0258      1.7
[04/08/2024-09:27:46] [I] /backbone/stage5/Pooling/MaxPool_output_0 copy        6.71           0.0035      0.2
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/layers.0/OSA5_1_0/relu/Relu_output_0 copy        6.30           0.0033      0.2
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/layers.1/OSA5_1_1/relu/Relu_output_0 copy        6.74           0.0035      0.2
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/concat/OSA5_1_concat/conv/Conv + /backbone/stage5/OSA5_1/concat/OSA5_1_concat/relu/Relu       51.40           0.0266      1.8
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/ese/avg_pool/GlobalAveragePool       11.98           0.0062      0.4
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/ese/fc/Conv + /backbone/stage5/OSA5_1/ese/hsigmoid/Relu + PWN(/backbone/stage5/OSA5_1/ese/hsigmoid/Clip)       13.45           0.0070      0.5
[04/08/2024-09:27:46] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_5 + (Unnamed Layer* 86) [Shuffle], /backbone/stage5/OSA5_1/ese/hsigmoid/Div), /backbone/stage5/OSA5_1/ese/Mul)        8.40           0.0043      0.3
[04/08/2024-09:27:46] [I] /upsample_2x/Resize       11.69           0.0061      0.4
[04/08/2024-09:27:46] [I] /latlayer3/Conv      425.98           0.2205     14.7
[04/08/2024-09:27:46] [I] /SPP/Conv1x1/conv1x1/conv1x1.0/Conv + /SPP/Conv1x1/conv1x1/conv1x1.2/Relu       37.11           0.0192      1.3
[04/08/2024-09:27:46] [I] /SPP/S1/S1.0/Conv + /SPP/S1/S1.2/Relu       13.33           0.0069      0.5
[04/08/2024-09:27:46] [I] /SPP/S2/S2.0/Conv + /SPP/S2/S2.2/Relu       13.44           0.0070      0.5
[04/08/2024-09:27:46] [I] /SPP/S3/S3.0/Conv + /SPP/S3/S3.2/Relu       12.28           0.0064      0.4
[04/08/2024-09:27:46] [I] /SPP/S2/S2.3/Conv + /SPP/S2/S2.5/Relu       12.24           0.0063      0.4
[04/08/2024-09:27:46] [I] /SPP/S3/S3.3/Conv + /SPP/S3/S3.5/Relu       12.25           0.0063      0.4
[04/08/2024-09:27:46] [I] /SPP/S3/S3.6/Conv + /SPP/S3/S3.8/Relu       12.23           0.0063      0.4
[04/08/2024-09:27:46] [I] /SPP/output/output.0/Conv + /SPP/Add + /SPP/relu/Relu       34.71           0.0180      1.2
[04/08/2024-09:27:46] [I] /detect_head/conv1x1/conv1x1/conv1x1.0/Conv + /detect_head/conv1x1/conv1x1/conv1x1.2/Relu       22.68           0.0117      0.8
[04/08/2024-09:27:46] [I] /detect_head/obj_layers/conv5x5/conv5x5.0/Conv + /detect_head/obj_layers/conv5x5/conv5x5.2/Relu       20.59           0.0107      0.7
[04/08/2024-09:27:46] [I] /detect_head/reg_layers/conv5x5/conv5x5.0/Conv + /detect_head/reg_layers/conv5x5/conv5x5.2/Relu       12.24           0.0063      0.4
[04/08/2024-09:27:46] [I] /detect_head/cls_layers/conv5x5/conv5x5.0/Conv + /detect_head/cls_layers/conv5x5/conv5x5.2/Relu       12.21           0.0063      0.4
[04/08/2024-09:27:46] [I] /detect_head/obj_layers/conv5x5/conv5x5.3/Conv       21.55           0.0112      0.7
[04/08/2024-09:27:46] [I] /detect_head/reg_layers/conv5x5/conv5x5.3/Conv       14.68           0.0076      0.5
[04/08/2024-09:27:46] [I] /detect_head/cls_layers/conv5x5/conv5x5.3/Conv       20.79           0.0108      0.7
[04/08/2024-09:27:46] [I] PWN(/detect_head/sigmoid/Sigmoid)        7.50           0.0039      0.3
[04/08/2024-09:27:46] [I] /detect_head/softmax/Transpose + (Unnamed Layer* 136) [Shuffle]        7.40           0.0038      0.3
[04/08/2024-09:27:46] [I] /detect_head/softmax/Softmax        7.71           0.0040      0.3
[04/08/2024-09:27:46] [I] (Unnamed Layer* 138) [Shuffle] + /detect_head/softmax/Transpose_1        7.43           0.0038      0.3
[04/08/2024-09:27:46] [I] /detect_head/sigmoid/Sigmoid_output_0 copy        6.59           0.0034      0.2
[04/08/2024-09:27:46] [I] Total     2888.56           1.4951    100.0
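When reading these dumps, it helps to check what the two time columns mean. In both profiles, Time (ms) divided by Avg. Time (ms) gives a nearly constant number per dump. That suggests Time (ms) is summed over all profiled iterations and Avg. Time (ms) is per iteration. By this arithmetic the dGPU run covered about 1932 iterations and the Jetson Nano run only about 40. The Time (ms) columns are therefore not directly comparable between the two devices, while the Avg. Time (ms) columns are.

The short Python sketch below does that arithmetic on values copied from both dumps in this post; the Jetson Nano dump follows. `iterations` is a hypothetical helper name, and the interpretation of the columns is inferred from the numbers, not from trtexec documentation.

```python
def iterations(total_ms: float, avg_ms: float) -> int:
    """Estimate how many profiled iterations a row covers (Time / Avg. Time)."""
    return round(total_ms / avg_ms)


# Values copied from the two profiles in this post.
dgpu_runs = iterations(2888.56, 1.4951)  # dGPU "Total" row
nano_runs = iterations(72.43, 1.8108)    # Jetson Nano stem_1 row

# Per-iteration slowdown (Nano avg / dGPU avg) for two layers present in both dumps.
stem2_ratio = 9.7997 / 0.1425    # stem_2 conv + relu
ese_fc_ratio = 29.5172 / 0.0052  # stage3 ese/fc conv + hsigmoid

print(f"dGPU iterations ~{dgpu_runs}, Nano iterations ~{nano_runs}")
print(f"stem_2 slowdown ~{stem2_ratio:.0f}x, stage3 ese/fc slowdown ~{ese_fc_ratio:.0f}x")
```

On a per-iteration basis, stem_2 is roughly 69x slower on the Nano. The stage3 ese/fc layer is on the order of 5,700x slower, so it stands out as the outlier. Because the dGPU value 0.0052 ms has only two significant digits, the second ratio is approximate.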

On the Jetson Nano:

Layer   Time (ms)   Avg. Time (ms)   Time %
[04/08/2024-16:54:21] [I] /backbone/stem/stem_1/conv/Conv + /backbone/stem/stem_1/relu/Relu       72.43           1.8108      2.2
[04/08/2024-16:54:21] [I] /backbone/stem/stem_2/conv/Conv + /backbone/stem/stem_2/relu/Relu      391.99           9.7997     12.1
[04/08/2024-16:54:21] [I] /backbone/stem/stem_3/conv/Conv + /backbone/stem/stem_3/relu/Relu      253.55           6.3389      7.8
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/layers.0/OSA2_1_0/conv/Conv + /backbone/stage2/OSA2_1/layers.0/OSA2_1_0/relu/Relu      161.57           4.0392      5.0
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/layers.1/OSA2_1_1/conv/Conv + /backbone/stage2/OSA2_1/layers.1/OSA2_1_1/relu/Relu       92.03           2.3007      2.8
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/layers.2/OSA2_1_2/conv/Conv + /backbone/stage2/OSA2_1/layers.2/OSA2_1_2/relu/Relu       92.29           2.3072      2.8
[04/08/2024-16:54:21] [I] /backbone/stem/stem_3/relu/Relu_output_0 copy       18.98           0.4744      0.6
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/layers.0/OSA2_1_0/relu/Relu_output_0 copy        9.46           0.2364      0.3
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/layers.1/OSA2_1_1/relu/Relu_output_0 copy        9.47           0.2368      0.3
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/concat/OSA2_1_concat/conv/Conv + /backbone/stage2/OSA2_1/concat/OSA2_1_concat/relu/Relu      143.07           3.5766      4.4
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/ese/avg_pool/GlobalAveragePool       10.70           0.2674      0.3
[04/08/2024-16:54:21] [I] Reformatting CopyNode for Input Tensor 0 to /backbone/stage2/OSA2_1/ese/fc/Conv + /backbone/stage2/OSA2_1/ese/hsigmoid/Relu + PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Clip)        0.06           0.0014      0.0
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/ese/fc/Conv + /backbone/stage2/OSA2_1/ese/hsigmoid/Relu + PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Clip)        1.68           0.0420      0.1
[04/08/2024-16:54:21] [I] Reformatting CopyNode for Input Tensor 0 to PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0 + (Unnamed Layer* 23) [Shuffle], /backbone/stage2/OSA2_1/ese/hsigmoid/Div), /backbone/stage2/OSA2_1/ese/Mul)        0.02           0.0006      0.0
[04/08/2024-16:54:21] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0 + (Unnamed Layer* 23) [Shuffle], /backbone/stage2/OSA2_1/ese/hsigmoid/Div), /backbone/stage2/OSA2_1/ese/Mul)       61.81           1.5453      1.9
[04/08/2024-16:54:21] [I] /backbone/stage3/Pooling/MaxPool       23.31           0.5826      0.7
[04/08/2024-16:54:21] [I] /avg_pool_4x/AveragePool       11.30           0.2825      0.3
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/layers.0/OSA3_1_0/conv/Conv + /backbone/stage3/OSA3_1/layers.0/OSA3_1_0/relu/Relu       58.74           1.4686      1.8
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/layers.1/OSA3_1_1/conv/Conv + /backbone/stage3/OSA3_1/layers.1/OSA3_1_1/relu/Relu       44.06           1.1014      1.4
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/layers.2/OSA3_1_2/conv/Conv + /backbone/stage3/OSA3_1/layers.2/OSA3_1_2/relu/Relu       44.16           1.1041      1.4
[04/08/2024-16:54:21] [I] /backbone/stage3/Pooling/MaxPool_output_0 copy        4.48           0.1119      0.1
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/layers.0/OSA3_1_0/relu/Relu_output_0 copy        3.27           0.0817      0.1
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/layers.1/OSA3_1_1/relu/Relu_output_0 copy        3.26           0.0814      0.1
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/concat/OSA3_1_concat/conv/Conv + /backbone/stage3/OSA3_1/concat/OSA3_1_concat/relu/Relu       83.08           2.0769      2.6
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/ese/avg_pool/GlobalAveragePool        6.45           0.1613      0.2
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/ese/fc/Conv + /backbone/stage3/OSA3_1/ese/hsigmoid/Relu + PWN(/backbone/stage3/OSA3_1/ese/hsigmoid/Clip)     1180.69          29.5172     36.3
[04/08/2024-16:54:21] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_1 + (Unnamed Layer* 44) [Shuffle], /backbone/stage3/OSA3_1/ese/hsigmoid/Div), /backbone/stage3/OSA3_1/ese/Mul)       32.68           0.8169      1.0
[04/08/2024-16:54:21] [I] /backbone/stage4/Pooling/MaxPool       13.83           0.3459      0.4
[04/08/2024-16:54:21] [I] /avg_pool_2x/AveragePool       21.61           0.5403      0.7
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/layers.0/OSA4_1_0/conv/Conv + /backbone/stage4/OSA4_1/layers.0/OSA4_1_0/relu/Relu       39.58           0.9896      1.2
[04/08/2024-16:54:21] [I] /latlayer1/Conv        8.56           0.2140      0.3
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/layers.1/OSA4_1_1/conv/Conv + /backbone/stage4/OSA4_1/layers.1/OSA4_1_1/relu/Relu       17.00           0.4250      0.5
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/layers.2/OSA4_1_2/conv/Conv + /backbone/stage4/OSA4_1/layers.2/OSA4_1_2/relu/Relu       16.77           0.4193      0.5
[04/08/2024-16:54:21] [I] /backbone/stage4/Pooling/MaxPool_output_0 copy        2.62           0.0654      0.1
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/layers.0/OSA4_1_0/relu/Relu_output_0 copy        1.07           0.0267      0.0
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/layers.1/OSA4_1_1/relu/Relu_output_0 copy        1.07           0.0267      0.0
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/concat/OSA4_1_concat/conv/Conv + /backbone/stage4/OSA4_1/concat/OSA4_1_concat/relu/Relu       46.72           1.1680      1.4
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/ese/avg_pool/GlobalAveragePool        3.10           0.0775      0.1
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/ese/fc/Conv + /backbone/stage4/OSA4_1/ese/hsigmoid/Relu + PWN(/backbone/stage4/OSA4_1/ese/hsigmoid/Clip)        5.78           0.1445      0.2
[04/08/2024-16:54:21] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_3 + (Unnamed Layer* 65) [Shuffle], /backbone/stage4/OSA4_1/ese/hsigmoid/Div), /backbone/stage4/OSA4_1/ese/Mul)       13.26           0.3314      0.4
[04/08/2024-16:54:21] [I] /backbone/stage5/Pooling/MaxPool        4.98           0.1245      0.2
[04/08/2024-16:54:21] [I] /latlayer2/Conv       12.15           0.3037      0.4
[04/08/2024-16:54:21] [I]                                                                                                                  /backbone/stage5/OSA5_1/layers.0/OSA5_1_0/conv/Conv + /backbone/stage5/OSA5_1/layers.0/OSA5_1_0/relu/Relu       25.61           0.6402      0.8
[04/08/2024-16:54:21] [I]                                                                                                                  /backbone/stage5/OSA5_1/layers.1/OSA5_1_1/conv/Conv + /backbone/stage5/OSA5_1/layers.1/OSA5_1_1/relu/Relu        9.04           0.2259      0.3
[04/08/2024-16:54:21] [I]                                                                                                                  /backbone/stage5/OSA5_1/layers.2/OSA5_1_2/conv/Conv + /backbone/stage5/OSA5_1/layers.2/OSA5_1_2/relu/Relu        8.81           0.2202      0.3
[04/08/2024-16:54:21] [I]                                                                                                                                                                             /backbone/stage5/Pooling/MaxPool_output_0 copy        1.06           0.0264      0.0
[04/08/2024-16:54:21] [I]                                                                                                                                                          /backbone/stage5/OSA5_1/layers.0/OSA5_1_0/relu/Relu_output_0 copy        0.41           0.0103      0.0
[04/08/2024-16:54:21] [I]                                                                                                                                                          /backbone/stage5/OSA5_1/layers.1/OSA5_1_1/relu/Relu_output_0 copy        0.42           0.0104      0.0
[04/08/2024-16:54:21] [I]                                                                                                            /backbone/stage5/OSA5_1/concat/OSA5_1_concat/conv/Conv + /backbone/stage5/OSA5_1/concat/OSA5_1_concat/relu/Relu       20.28           0.5069      0.6
[04/08/2024-16:54:21] [I]                                                                                                                                                                     /backbone/stage5/OSA5_1/ese/avg_pool/GlobalAveragePool        1.66           0.0416      0.1
[04/08/2024-16:54:21] [I]                                                                                           /backbone/stage5/OSA5_1/ese/fc/Conv + /backbone/stage5/OSA5_1/ese/hsigmoid/Relu + PWN(/backbone/stage5/OSA5_1/ese/hsigmoid/Clip)        9.38           0.2344      0.3
[04/08/2024-16:54:21] [I]                                            PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_5 + (Unnamed Layer* 86) [Shuffle], /backbone/stage5/OSA5_1/ese/hsigmoid/Div), /backbone/stage5/OSA5_1/ese/Mul)        4.61           0.1153      0.1
[04/08/2024-16:54:21] [I]                                                                                                                                                                                                        /upsample_2x/Resize        8.60           0.2149      0.3
[04/08/2024-16:54:21] [I]                                                                                                                                                                                                            /latlayer3/Conv       14.85           0.3712      0.5
[04/08/2024-16:54:21] [I]                                                                                                                                                                                     /avg_pool_4x/AveragePool_output_0 copy        1.24           0.0311      0.0
[04/08/2024-16:54:21] [I]                                                                                                                                                  /SPP/Conv1x1/conv1x1/conv1x1.0/Conv + /SPP/Conv1x1/conv1x1/conv1x1.2/Relu       13.27           0.3318      0.4
[04/08/2024-16:54:21] [I]                                                                                                                                                                                      /SPP/S1/S1.0/Conv + /SPP/S1/S1.2/Relu        9.90           0.2475      0.3
[04/08/2024-16:54:21] [I]                                                                                                                                                                                      /SPP/S2/S2.0/Conv + /SPP/S2/S2.2/Relu        9.83           0.2458      0.3
[04/08/2024-16:54:21] [I]                                                                                                                                                                                      /SPP/S3/S3.0/Conv + /SPP/S3/S3.2/Relu       10.58           0.2645      0.3
[04/08/2024-16:54:21] [I]                                                                                                                                                                                      /SPP/S2/S2.3/Conv + /SPP/S2/S2.5/Relu        9.82           0.2455      0.3
[04/08/2024-16:54:21] [I]                                                                                                                                                                                      /SPP/S3/S3.3/Conv + /SPP/S3/S3.5/Relu        9.80           0.2449      0.3
[04/08/2024-16:54:21] [I]                                                                                                                                                                                      /SPP/S3/S3.6/Conv + /SPP/S3/S3.8/Relu        9.75           0.2438      0.3
[04/08/2024-16:54:21] [I]                                                                                                                                                                      /SPP/output/output.0/Conv + /SPP/Add + /SPP/relu/Relu        8.73           0.2184      0.3
[04/08/2024-16:54:21] [I]                                                                                                                                  /detect_head/conv1x1/conv1x1/conv1x1.0/Conv + /detect_head/conv1x1/conv1x1/conv1x1.2/Relu        3.50           0.0876      0.1
[04/08/2024-16:54:21] [I]                                                                                                                            /detect_head/obj_layers/conv5x5/conv5x5.0/Conv + /detect_head/obj_layers/conv5x5/conv5x5.2/Relu        9.75           0.2437      0.3
[04/08/2024-16:54:21] [I]                                                                                                                            /detect_head/reg_layers/conv5x5/conv5x5.0/Conv + /detect_head/reg_layers/conv5x5/conv5x5.2/Relu        9.79           0.2447      0.3
[04/08/2024-16:54:21] [I]                                                                                                                            /detect_head/cls_layers/conv5x5/conv5x5.0/Conv + /detect_head/cls_layers/conv5x5/conv5x5.2/Relu        9.75           0.2438      0.3
[04/08/2024-16:54:21] [I]                                                                                                                                                                             /detect_head/obj_layers/conv5x5/conv5x5.3/Conv        1.73           0.0433      0.1
[04/08/2024-16:54:21] [I]                                                                                                                                                                             /detect_head/reg_layers/conv5x5/conv5x5.3/Conv        1.84           0.0460      0.1
[04/08/2024-16:54:21] [I]                                                                                                                                                                             /detect_head/cls_layers/conv5x5/conv5x5.3/Conv        1.47           0.0367      0.0
[04/08/2024-16:54:21] [I]                                                                                                                                                                                          PWN(/detect_head/sigmoid/Sigmoid)        0.21           0.0053      0.0
[04/08/2024-16:54:21] [I]                                                                                                                                                            /detect_head/softmax/Transpose + (Unnamed Layer* 136) [Shuffle]        0.33           0.0082      0.0
[04/08/2024-16:54:21] [I]                                                                                                                                                                                               /detect_head/softmax/Softmax        1.18           0.0295      0.0
[04/08/2024-16:54:21] [I]                                                                                                                                                          (Unnamed Layer* 138) [Shuffle] + /detect_head/softmax/Transpose_1        0.31           0.0078      0.0
[04/08/2024-16:54:21] [I]                                                                                                                                                                                 /detect_head/sigmoid/Sigmoid_output_0 copy        0.15           0.0038      0.0
[04/08/2024-16:54:21] [I]                                                                                                                                                                                                                      Total     3250.31          81.2577    100.0
[04/08/2024-16:54:21] [I] 
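Comparing two dumps this long by eye is error-prone. Here is a minimal sketch for pulling the numeric columns out of each row. It assumes every data row ends with the three columns Time (ms), Avg. Time (ms) and Time %, as in the output above. `parse_profile_line` and `parse_profile` are hypothetical helpers written for this post, not something trtexec provides.

```python
import re

# A data row is: optional "[timestamp] [I]" prefix, the layer name, then three numbers:
# Time (ms) summed over all runs, Avg. Time (ms) per run, and Time %.
_ROW = re.compile(
    r"^(?:\[[^\]]*\]\s*\[I\]\s*)?"
    r"(?P<name>.*?)\s+"
    r"(?P<total>\d+(?:\.\d+)?)\s+"
    r"(?P<avg>\d+(?:\.\d+)?)\s+"
    r"(?P<pct>\d+(?:\.\d+)?)$"
)


def parse_profile_line(line):
    """Return (layer_name, total_ms, avg_ms, percent), or None for non-data rows."""
    match = _ROW.match(line.strip())
    if match is None:
        return None
    return (
        match.group("name"),
        float(match.group("total")),
        float(match.group("avg")),
        float(match.group("pct")),
    )


def parse_profile(text):
    """Map layer name -> (total_ms, avg_ms, percent) for every data row in a dump."""
    rows = {}
    for line in text.splitlines():
        row = parse_profile_line(line)
        if row is not None:
            rows[row[0]] = row[1:]
    return rows
```

Rows that do not end in three numbers, such as the header and the empty `[I]` lines, are skipped. Joining the two resulting dicts on the layer name lets you compare the Avg. Time column per layer. Layers that TensorRT fused differently on the two devices will not line up by name and have to be matched by hand.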

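Comparing the two profiles, the main differences are in these two rows. On the Jetson Nano, the fused stage 3 eSE fc layer alone takes 36.3% of the total time:

/backbone/stage3/OSA3_1/ese/fc/Conv + /backbone/stage3/OSA3_1/ese/hsigmoid/Relu + PWN(/backbone/stage3/OSA3_1/ese/hsigmoid/Clip)     1180.69          29.5172     36.3

while on the dGPU, this layer takes 14.7%:

/latlayer3/Conv      425.98           0.2205     14.7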
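One caveat applies when reading these numbers. The Time (ms) column is summed over all profiled runs, and the two devices were not profiled for the same number of iterations. Dividing total by average gives about 1,930 runs for the dGPU rows and 40 for the Jetson rows. This also shows that the /latlayer3/Conv row with 425.98 ms comes from the dGPU profile; the Jetson row for that layer is 14.85 ms total, 0.3712 ms average. The Avg. Time and Time % columns are therefore the comparable ones. The arithmetic below uses only figures quoted in this post:

```python
# (total_ms, avg_ms) copied from the trtexec --dumpProfile rows quoted in this post.
ROWS = {
    ("dgpu", "stem_1 conv + relu"): (49.50, 0.0256),
    ("dgpu", "/latlayer3/Conv"): (425.98, 0.2205),
    ("jetson", "/latlayer3/Conv"): (14.85, 0.3712),
    ("jetson", "stage3 ese fc Conv + hsigmoid"): (1180.69, 29.5172),
    ("jetson", "Total"): (3250.31, 81.2577),
}


def runs(total_ms, avg_ms):
    """Number of profiled runs implied by one row: total time / per-run average."""
    return total_ms / avg_ms


for (device, layer), (total, avg) in ROWS.items():
    print(f"{device:6s} {layer:32s} ~{runs(total, avg):6.0f} runs, {avg:8.4f} ms per run")

# Per-run comparison of the same layer on both devices: use averages, not totals.
latlayer3_ratio = ROWS[("jetson", "/latlayer3/Conv")][1] / ROWS[("dgpu", "/latlayer3/Conv")][1]
# Share of one Jetson inference spent in the stage3 eSE fc row.
ese_share = ROWS[("jetson", "stage3 ese fc Conv + hsigmoid")][1] / ROWS[("jetson", "Total")][1]
print(f"/latlayer3/Conv Jetson vs dGPU: {latlayer3_ratio:.2f}x per run")
print(f"stage3 eSE fc share of a Jetson run: {ese_share:.1%}")
```

Per run, /latlayer3/Conv is only about 1.7x slower on the Nano. By contrast, the stage3 eSE fc row alone accounts for roughly 29.5 ms of the ~81.3 ms Jetson inference.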

I’m sure both engines were built from the same ONNX file, using the same code to convert it to a TensorRT engine file.

Any help would be appreciated.

For more details, here is the eSE module code:

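import torch.nn as nn
import torch.nn.functional as F


class Hsigmoid(nn.Module):
    """Hard sigmoid: relu6(x + 3) / 6."""

    def __init__(self, inplace=True):
        super(Hsigmoid, self).__init__()
        self.inplace = inplace

    def forward(self, x):
        return F.relu6(x + 3.0, inplace=self.inplace) / 6.0


class eSEModule(nn.Module):
    """Effective Squeeze-Excitation: a per-channel gate computed from the pooled input."""

    def __init__(self, channel, reduction=4):
        super(eSEModule, self).__init__()
        # note: `reduction` is accepted but unused; fc keeps all `channel` channels
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # [N, C, H, W] -> [N, C, 1, 1]
        self.fc = nn.Conv2d(channel, channel, kernel_size=1, padding=0)
        self.hsigmoid = Hsigmoid()

    def forward(self, x):
        identity = x
        x = self.avg_pool(x)   # exported as GlobalAveragePool
        x = self.fc(x)         # 1x1 conv on a 1x1 spatial map
        x = self.hsigmoid(x)   # gate values in [0, 1]
        x = identity * x       # broadcast multiply back onto the input
        return x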
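To map this module onto the profile rows, here is the same computation written out in plain Python for a single image. It is a readability sketch, not the author's code. The nested-list layout and the function names are illustrative. `hsigmoid` relies on the identity relu6(v + 3) / 6 = min(max(v + 3, 0), 6) / 6. The steps line up with the profile as follows:

- The per-channel mean is the GlobalAveragePool.
- The matrix-vector product is the fc Conv.
- The gate and the final scaling correspond to the hsigmoid Relu/Clip and the fused Div/Mul rows.

```python
def hsigmoid(v):
    """relu6(v + 3) / 6 for a single float, i.e. min(max(v + 3, 0), 6) / 6."""
    return min(max(v + 3.0, 0.0), 6.0) / 6.0


def ese_forward(x, weight, bias):
    """Reference eSE for one image (illustrative sketch).

    x: C channels, each a list of H rows of W floats.
    weight: C x C matrix of the 1x1 fc conv; bias: length-C list.
    Returns a nested list with the same shape as x.
    """
    c = len(x)
    # GlobalAveragePool: one mean per channel -> length-C vector (the [C, 1, 1] map)
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # 1x1 Conv on a 1x1 map is just a C x C matrix-vector product plus bias
    fc = [sum(weight[o][i] * pooled[i] for i in range(c)) + bias[o] for o in range(c)]
    # Hard-sigmoid gate, one value in [0, 1] per channel
    gate = [hsigmoid(v) for v in fc]
    # Mul: scale every pixel of channel k by gate[k]
    return [[[v * gate[k] for v in row] for row in ch] for k, ch in enumerate(x)]


x = [[[1.0, 2.0], [3.0, 4.0]], [[-1.0, -1.0], [-1.0, -1.0]]]  # C=2, H=W=2
w = [[1.0, 0.0], [0.0, 1.0]]                                   # identity fc weights
print(ese_forward(x, w, [0.0, 0.0]))
```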

The input feature map to the stage 3 eSE module (stage3_ese) has shape [1x256x44x44].
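For scale, consider the row that dominates the Jetson profile: the fc Conv fused with the hsigmoid Relu/Clip. It runs on the pooled [1, 256, 1, 1] tensor, not on the full [1, 256, 44, 44] map. Below is a back-of-the-envelope count. It assumes the fc is the 256-to-256 1x1 conv from the module above and ignores the bias and activation. `ese_op_counts` is a hypothetical helper.

```python
def ese_op_counts(c, h, w):
    """Rough per-image operation counts for one eSE block on a [1, c, h, w] input."""
    return {
        "avg_pool_adds": c * h * w,  # every input element is summed once
        "fc_macs": c * c,            # 1x1 conv, c -> c, applied to a 1x1 map
        "gate_muls": c * h * w,      # broadcast multiply back onto the input
    }


counts = ese_op_counts(256, 44, 44)
# Average time of the Jetson row "stage3 ese fc/Conv + hsigmoid Relu/Clip" quoted above.
jetson_fc_ms = 29.5172
mac_per_s = counts["fc_macs"] / (jetson_fc_ms / 1000.0)
print(counts)
print(f"{counts['fc_macs']} MACs in {jetson_fc_ms} ms -> about {mac_per_s / 1e6:.1f} million MAC/s")
```

About 65k multiply-accumulates taking 29.5 ms per run works out to roughly 2.2 million MAC/s. That is orders of magnitude below the Nano GPU's arithmetic throughput, so the time spent on that row is unlikely to come from its arithmetic alone.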