Description
I’m trying to deploy a model named VoVNet. I first tested it on a dGPU, where it works well. But when I deploy it on a Jetson Nano, inference takes a long time.
I tried to find the cause by running
/usr/src/tensorrt/bin/trtexec --loadEngine=VoVNet.engine --dumpProfile
and the results confuse me even more.
On the dGPU:
Layer Time (ms) Avg. Time (ms) Time %
[04/08/2024-09:27:46] [I] /backbone/stem/stem_1/conv/Conv + /backbone/stem/stem_1/relu/Relu 49.50 0.0256 1.7
[04/08/2024-09:27:46] [I] /backbone/stem/stem_2/conv/Conv + /backbone/stem/stem_2/relu/Relu 275.23 0.1425 9.5
[04/08/2024-09:27:46] [I] /backbone/stem/stem_3/conv/Conv + /backbone/stem/stem_3/relu/Relu 181.43 0.0939 6.3
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/layers.0/OSA2_1_0/conv/Conv + /backbone/stage2/OSA2_1/layers.0/OSA2_1_0/relu/Relu 125.57 0.0650 4.3
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/layers.1/OSA2_1_1/conv/Conv + /backbone/stage2/OSA2_1/layers.1/OSA2_1_1/relu/Relu 73.94 0.0383 2.6
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/layers.2/OSA2_1_2/conv/Conv + /backbone/stage2/OSA2_1/layers.2/OSA2_1_2/relu/Relu 72.70 0.0376 2.5
[04/08/2024-09:27:46] [I] /backbone/stem/stem_3/relu/Relu_output_0 copy 34.34 0.0178 1.2
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/layers.0/OSA2_1_0/relu/Relu_output_0 copy 21.16 0.0110 0.7
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/layers.1/OSA2_1_1/relu/Relu_output_0 copy 21.16 0.0110 0.7
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/concat/OSA2_1_concat/conv/Conv + /backbone/stage2/OSA2_1/concat/OSA2_1_concat/relu/Relu 111.30 0.0576 3.9
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/ese/avg_pool/GlobalAveragePool 32.55 0.0168 1.1
[04/08/2024-09:27:46] [I] /backbone/stage2/OSA2_1/ese/fc/Conv + /backbone/stage2/OSA2_1/ese/hsigmoid/Relu + PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Clip) 8.98 0.0046 0.3
[04/08/2024-09:27:46] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0 + (Unnamed Layer* 23) [Shuffle], /backbone/stage2/OSA2_1/ese/hsigmoid/Div), /backbone/stage2/OSA2_1/ese/Mul) 21.65 0.0112 0.7
[04/08/2024-09:27:46] [I] /backbone/stage3/Pooling/MaxPool 17.16 0.0089 0.6
[04/08/2024-09:27:46] [I] /avg_pool_4x/AveragePool 11.37 0.0059 0.4
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/layers.0/OSA3_1_0/conv/Conv + /backbone/stage3/OSA3_1/layers.0/OSA3_1_0/relu/Relu 58.79 0.0304 2.0
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/layers.1/OSA3_1_1/conv/Conv + /backbone/stage3/OSA3_1/layers.1/OSA3_1_1/relu/Relu 45.10 0.0233 1.6
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/layers.2/OSA3_1_2/conv/Conv + /backbone/stage3/OSA3_1/layers.2/OSA3_1_2/relu/Relu 45.09 0.0233 1.6
[04/08/2024-09:27:46] [I] /backbone/stage3/Pooling/MaxPool_output_0 copy 8.52 0.0044 0.3
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/layers.0/OSA3_1_0/relu/Relu_output_0 copy 7.82 0.0040 0.3
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/layers.1/OSA3_1_1/relu/Relu_output_0 copy 7.69 0.0040 0.3
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/concat/OSA3_1_concat/conv/Conv + /backbone/stage3/OSA3_1/concat/OSA3_1_concat/relu/Relu 72.69 0.0376 2.5
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/ese/avg_pool/GlobalAveragePool 21.47 0.0111 0.7
[04/08/2024-09:27:46] [I] /backbone/stage3/OSA3_1/ese/fc/Conv + /backbone/stage3/OSA3_1/ese/hsigmoid/Relu + PWN(/backbone/stage3/OSA3_1/ese/hsigmoid/Clip) 10.01 0.0052 0.3
[04/08/2024-09:27:46] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_1 + (Unnamed Layer* 44) [Shuffle], /backbone/stage3/OSA3_1/ese/hsigmoid/Div), /backbone/stage3/OSA3_1/ese/Mul) 19.10 0.0099 0.7
[04/08/2024-09:27:46] [I] /backbone/stage4/Pooling/MaxPool 12.37 0.0064 0.4
[04/08/2024-09:27:46] [I] /avg_pool_2x/AveragePool 13.38 0.0069 0.5
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/layers.0/OSA4_1_0/conv/Conv + /backbone/stage4/OSA4_1/layers.0/OSA4_1_0/relu/Relu 98.47 0.0510 3.4
[04/08/2024-09:27:46] [I] /latlayer1/Conv 25.73 0.0133 0.9
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/layers.1/OSA4_1_1/conv/Conv + /backbone/stage4/OSA4_1/layers.1/OSA4_1_1/relu/Relu 44.69 0.0231 1.5
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/layers.2/OSA4_1_2/conv/Conv + /backbone/stage4/OSA4_1/layers.2/OSA4_1_2/relu/Relu 44.59 0.0231 1.5
[04/08/2024-09:27:46] [I] Reformatting CopyNode for Output Tensor 0 to /backbone/stage4/OSA4_1/layers.2/OSA4_1_2/conv/Conv + /backbone/stage4/OSA4_1/layers.2/OSA4_1_2/relu/Relu 8.71 0.0045 0.3
[04/08/2024-09:27:46] [I] /backbone/stage4/Pooling/MaxPool_output_0 copy 11.78 0.0061 0.4
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/layers.0/OSA4_1_0/relu/Relu_output_0 copy 8.48 0.0044 0.3
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/layers.1/OSA4_1_1/relu/Relu_output_0 copy 8.45 0.0044 0.3
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/concat/OSA4_1_concat/conv/Conv + /backbone/stage4/OSA4_1/concat/OSA4_1_concat/relu/Relu 57.76 0.0299 2.0
[04/08/2024-09:27:46] [I] Reformatting CopyNode for Input Tensor 0 to /backbone/stage4/OSA4_1/ese/avg_pool/GlobalAveragePool 14.34 0.0074 0.5
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/ese/avg_pool/GlobalAveragePool 15.15 0.0078 0.5
[04/08/2024-09:27:46] [I] /backbone/stage4/OSA4_1/ese/fc/Conv + /backbone/stage4/OSA4_1/ese/hsigmoid/Relu + PWN(/backbone/stage4/OSA4_1/ese/hsigmoid/Clip) 11.91 0.0062 0.4
[04/08/2024-09:27:46] [I] Reformatting CopyNode for Input Tensor 0 to PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_3 + (Unnamed Layer* 65) [Shuffle], /backbone/stage4/OSA4_1/ese/hsigmoid/Div), /backbone/stage4/OSA4_1/ese/Mul) 3.04 0.0016 0.1
[04/08/2024-09:27:46] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_3 + (Unnamed Layer* 65) [Shuffle], /backbone/stage4/OSA4_1/ese/hsigmoid/Div), /backbone/stage4/OSA4_1/ese/Mul) 9.73 0.0050 0.3
[04/08/2024-09:27:46] [I] Reformatting CopyNode for Output Tensor 0 to PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_3 + (Unnamed Layer* 65) [Shuffle], /backbone/stage4/OSA4_1/ese/hsigmoid/Div), /backbone/stage4/OSA4_1/ese/Mul) 14.30 0.0074 0.5
[04/08/2024-09:27:46] [I] /backbone/stage5/Pooling/MaxPool 9.19 0.0048 0.3
[04/08/2024-09:27:46] [I] /latlayer2/Conv 30.30 0.0157 1.0
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/layers.0/OSA5_1_0/conv/Conv + /backbone/stage5/OSA5_1/layers.0/OSA5_1_0/relu/Relu 140.55 0.0727 4.9
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/layers.1/OSA5_1_1/conv/Conv + /backbone/stage5/OSA5_1/layers.1/OSA5_1_1/relu/Relu 49.94 0.0258 1.7
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/layers.2/OSA5_1_2/conv/Conv + /backbone/stage5/OSA5_1/layers.2/OSA5_1_2/relu/Relu 49.79 0.0258 1.7
[04/08/2024-09:27:46] [I] /backbone/stage5/Pooling/MaxPool_output_0 copy 6.71 0.0035 0.2
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/layers.0/OSA5_1_0/relu/Relu_output_0 copy 6.30 0.0033 0.2
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/layers.1/OSA5_1_1/relu/Relu_output_0 copy 6.74 0.0035 0.2
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/concat/OSA5_1_concat/conv/Conv + /backbone/stage5/OSA5_1/concat/OSA5_1_concat/relu/Relu 51.40 0.0266 1.8
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/ese/avg_pool/GlobalAveragePool 11.98 0.0062 0.4
[04/08/2024-09:27:46] [I] /backbone/stage5/OSA5_1/ese/fc/Conv + /backbone/stage5/OSA5_1/ese/hsigmoid/Relu + PWN(/backbone/stage5/OSA5_1/ese/hsigmoid/Clip) 13.45 0.0070 0.5
[04/08/2024-09:27:46] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_5 + (Unnamed Layer* 86) [Shuffle], /backbone/stage5/OSA5_1/ese/hsigmoid/Div), /backbone/stage5/OSA5_1/ese/Mul) 8.40 0.0043 0.3
[04/08/2024-09:27:46] [I] /upsample_2x/Resize 11.69 0.0061 0.4
[04/08/2024-09:27:46] [I] /latlayer3/Conv 425.98 0.2205 14.7
[04/08/2024-09:27:46] [I] /SPP/Conv1x1/conv1x1/conv1x1.0/Conv + /SPP/Conv1x1/conv1x1/conv1x1.2/Relu 37.11 0.0192 1.3
[04/08/2024-09:27:46] [I] /SPP/S1/S1.0/Conv + /SPP/S1/S1.2/Relu 13.33 0.0069 0.5
[04/08/2024-09:27:46] [I] /SPP/S2/S2.0/Conv + /SPP/S2/S2.2/Relu 13.44 0.0070 0.5
[04/08/2024-09:27:46] [I] /SPP/S3/S3.0/Conv + /SPP/S3/S3.2/Relu 12.28 0.0064 0.4
[04/08/2024-09:27:46] [I] /SPP/S2/S2.3/Conv + /SPP/S2/S2.5/Relu 12.24 0.0063 0.4
[04/08/2024-09:27:46] [I] /SPP/S3/S3.3/Conv + /SPP/S3/S3.5/Relu 12.25 0.0063 0.4
[04/08/2024-09:27:46] [I] /SPP/S3/S3.6/Conv + /SPP/S3/S3.8/Relu 12.23 0.0063 0.4
[04/08/2024-09:27:46] [I] /SPP/output/output.0/Conv + /SPP/Add + /SPP/relu/Relu 34.71 0.0180 1.2
[04/08/2024-09:27:46] [I] /detect_head/conv1x1/conv1x1/conv1x1.0/Conv + /detect_head/conv1x1/conv1x1/conv1x1.2/Relu 22.68 0.0117 0.8
[04/08/2024-09:27:46] [I] /detect_head/obj_layers/conv5x5/conv5x5.0/Conv + /detect_head/obj_layers/conv5x5/conv5x5.2/Relu 20.59 0.0107 0.7
[04/08/2024-09:27:46] [I] /detect_head/reg_layers/conv5x5/conv5x5.0/Conv + /detect_head/reg_layers/conv5x5/conv5x5.2/Relu 12.24 0.0063 0.4
[04/08/2024-09:27:46] [I] /detect_head/cls_layers/conv5x5/conv5x5.0/Conv + /detect_head/cls_layers/conv5x5/conv5x5.2/Relu 12.21 0.0063 0.4
[04/08/2024-09:27:46] [I] /detect_head/obj_layers/conv5x5/conv5x5.3/Conv 21.55 0.0112 0.7
[04/08/2024-09:27:46] [I] /detect_head/reg_layers/conv5x5/conv5x5.3/Conv 14.68 0.0076 0.5
[04/08/2024-09:27:46] [I] /detect_head/cls_layers/conv5x5/conv5x5.3/Conv 20.79 0.0108 0.7
[04/08/2024-09:27:46] [I] PWN(/detect_head/sigmoid/Sigmoid) 7.50 0.0039 0.3
[04/08/2024-09:27:46] [I] /detect_head/softmax/Transpose + (Unnamed Layer* 136) [Shuffle] 7.40 0.0038 0.3
[04/08/2024-09:27:46] [I] /detect_head/softmax/Softmax 7.71 0.0040 0.3
[04/08/2024-09:27:46] [I] (Unnamed Layer* 138) [Shuffle] + /detect_head/softmax/Transpose_1 7.43 0.0038 0.3
[04/08/2024-09:27:46] [I] /detect_head/sigmoid/Sigmoid_output_0 copy 6.59 0.0034 0.2
[04/08/2024-09:27:46] [I] Total 2888.56 1.4951 100.0
On the Jetson Nano:
Layer Time (ms) Avg. Time (ms) Time %
[04/08/2024-16:54:21] [I] /backbone/stem/stem_1/conv/Conv + /backbone/stem/stem_1/relu/Relu 72.43 1.8108 2.2
[04/08/2024-16:54:21] [I] /backbone/stem/stem_2/conv/Conv + /backbone/stem/stem_2/relu/Relu 391.99 9.7997 12.1
[04/08/2024-16:54:21] [I] /backbone/stem/stem_3/conv/Conv + /backbone/stem/stem_3/relu/Relu 253.55 6.3389 7.8
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/layers.0/OSA2_1_0/conv/Conv + /backbone/stage2/OSA2_1/layers.0/OSA2_1_0/relu/Relu 161.57 4.0392 5.0
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/layers.1/OSA2_1_1/conv/Conv + /backbone/stage2/OSA2_1/layers.1/OSA2_1_1/relu/Relu 92.03 2.3007 2.8
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/layers.2/OSA2_1_2/conv/Conv + /backbone/stage2/OSA2_1/layers.2/OSA2_1_2/relu/Relu 92.29 2.3072 2.8
[04/08/2024-16:54:21] [I] /backbone/stem/stem_3/relu/Relu_output_0 copy 18.98 0.4744 0.6
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/layers.0/OSA2_1_0/relu/Relu_output_0 copy 9.46 0.2364 0.3
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/layers.1/OSA2_1_1/relu/Relu_output_0 copy 9.47 0.2368 0.3
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/concat/OSA2_1_concat/conv/Conv + /backbone/stage2/OSA2_1/concat/OSA2_1_concat/relu/Relu 143.07 3.5766 4.4
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/ese/avg_pool/GlobalAveragePool 10.70 0.2674 0.3
[04/08/2024-16:54:21] [I] Reformatting CopyNode for Input Tensor 0 to /backbone/stage2/OSA2_1/ese/fc/Conv + /backbone/stage2/OSA2_1/ese/hsigmoid/Relu + PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Clip) 0.06 0.0014 0.0
[04/08/2024-16:54:21] [I] /backbone/stage2/OSA2_1/ese/fc/Conv + /backbone/stage2/OSA2_1/ese/hsigmoid/Relu + PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Clip) 1.68 0.0420 0.1
[04/08/2024-16:54:21] [I] Reformatting CopyNode for Input Tensor 0 to PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0 + (Unnamed Layer* 23) [Shuffle], /backbone/stage2/OSA2_1/ese/hsigmoid/Div), /backbone/stage2/OSA2_1/ese/Mul) 0.02 0.0006 0.0
[04/08/2024-16:54:21] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0 + (Unnamed Layer* 23) [Shuffle], /backbone/stage2/OSA2_1/ese/hsigmoid/Div), /backbone/stage2/OSA2_1/ese/Mul) 61.81 1.5453 1.9
[04/08/2024-16:54:21] [I] /backbone/stage3/Pooling/MaxPool 23.31 0.5826 0.7
[04/08/2024-16:54:21] [I] /avg_pool_4x/AveragePool 11.30 0.2825 0.3
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/layers.0/OSA3_1_0/conv/Conv + /backbone/stage3/OSA3_1/layers.0/OSA3_1_0/relu/Relu 58.74 1.4686 1.8
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/layers.1/OSA3_1_1/conv/Conv + /backbone/stage3/OSA3_1/layers.1/OSA3_1_1/relu/Relu 44.06 1.1014 1.4
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/layers.2/OSA3_1_2/conv/Conv + /backbone/stage3/OSA3_1/layers.2/OSA3_1_2/relu/Relu 44.16 1.1041 1.4
[04/08/2024-16:54:21] [I] /backbone/stage3/Pooling/MaxPool_output_0 copy 4.48 0.1119 0.1
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/layers.0/OSA3_1_0/relu/Relu_output_0 copy 3.27 0.0817 0.1
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/layers.1/OSA3_1_1/relu/Relu_output_0 copy 3.26 0.0814 0.1
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/concat/OSA3_1_concat/conv/Conv + /backbone/stage3/OSA3_1/concat/OSA3_1_concat/relu/Relu 83.08 2.0769 2.6
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/ese/avg_pool/GlobalAveragePool 6.45 0.1613 0.2
[04/08/2024-16:54:21] [I] /backbone/stage3/OSA3_1/ese/fc/Conv + /backbone/stage3/OSA3_1/ese/hsigmoid/Relu + PWN(/backbone/stage3/OSA3_1/ese/hsigmoid/Clip) 1180.69 29.5172 36.3
[04/08/2024-16:54:21] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_1 + (Unnamed Layer* 44) [Shuffle], /backbone/stage3/OSA3_1/ese/hsigmoid/Div), /backbone/stage3/OSA3_1/ese/Mul) 32.68 0.8169 1.0
[04/08/2024-16:54:21] [I] /backbone/stage4/Pooling/MaxPool 13.83 0.3459 0.4
[04/08/2024-16:54:21] [I] /avg_pool_2x/AveragePool 21.61 0.5403 0.7
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/layers.0/OSA4_1_0/conv/Conv + /backbone/stage4/OSA4_1/layers.0/OSA4_1_0/relu/Relu 39.58 0.9896 1.2
[04/08/2024-16:54:21] [I] /latlayer1/Conv 8.56 0.2140 0.3
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/layers.1/OSA4_1_1/conv/Conv + /backbone/stage4/OSA4_1/layers.1/OSA4_1_1/relu/Relu 17.00 0.4250 0.5
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/layers.2/OSA4_1_2/conv/Conv + /backbone/stage4/OSA4_1/layers.2/OSA4_1_2/relu/Relu 16.77 0.4193 0.5
[04/08/2024-16:54:21] [I] /backbone/stage4/Pooling/MaxPool_output_0 copy 2.62 0.0654 0.1
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/layers.0/OSA4_1_0/relu/Relu_output_0 copy 1.07 0.0267 0.0
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/layers.1/OSA4_1_1/relu/Relu_output_0 copy 1.07 0.0267 0.0
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/concat/OSA4_1_concat/conv/Conv + /backbone/stage4/OSA4_1/concat/OSA4_1_concat/relu/Relu 46.72 1.1680 1.4
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/ese/avg_pool/GlobalAveragePool 3.10 0.0775 0.1
[04/08/2024-16:54:21] [I] /backbone/stage4/OSA4_1/ese/fc/Conv + /backbone/stage4/OSA4_1/ese/hsigmoid/Relu + PWN(/backbone/stage4/OSA4_1/ese/hsigmoid/Clip) 5.78 0.1445 0.2
[04/08/2024-16:54:21] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_3 + (Unnamed Layer* 65) [Shuffle], /backbone/stage4/OSA4_1/ese/hsigmoid/Div), /backbone/stage4/OSA4_1/ese/Mul) 13.26 0.3314 0.4
[04/08/2024-16:54:21] [I] /backbone/stage5/Pooling/MaxPool 4.98 0.1245 0.2
[04/08/2024-16:54:21] [I] /latlayer2/Conv 12.15 0.3037 0.4
[04/08/2024-16:54:21] [I] /backbone/stage5/OSA5_1/layers.0/OSA5_1_0/conv/Conv + /backbone/stage5/OSA5_1/layers.0/OSA5_1_0/relu/Relu 25.61 0.6402 0.8
[04/08/2024-16:54:21] [I] /backbone/stage5/OSA5_1/layers.1/OSA5_1_1/conv/Conv + /backbone/stage5/OSA5_1/layers.1/OSA5_1_1/relu/Relu 9.04 0.2259 0.3
[04/08/2024-16:54:21] [I] /backbone/stage5/OSA5_1/layers.2/OSA5_1_2/conv/Conv + /backbone/stage5/OSA5_1/layers.2/OSA5_1_2/relu/Relu 8.81 0.2202 0.3
[04/08/2024-16:54:21] [I] /backbone/stage5/Pooling/MaxPool_output_0 copy 1.06 0.0264 0.0
[04/08/2024-16:54:21] [I] /backbone/stage5/OSA5_1/layers.0/OSA5_1_0/relu/Relu_output_0 copy 0.41 0.0103 0.0
[04/08/2024-16:54:21] [I] /backbone/stage5/OSA5_1/layers.1/OSA5_1_1/relu/Relu_output_0 copy 0.42 0.0104 0.0
[04/08/2024-16:54:21] [I] /backbone/stage5/OSA5_1/concat/OSA5_1_concat/conv/Conv + /backbone/stage5/OSA5_1/concat/OSA5_1_concat/relu/Relu 20.28 0.5069 0.6
[04/08/2024-16:54:21] [I] /backbone/stage5/OSA5_1/ese/avg_pool/GlobalAveragePool 1.66 0.0416 0.1
[04/08/2024-16:54:21] [I] /backbone/stage5/OSA5_1/ese/fc/Conv + /backbone/stage5/OSA5_1/ese/hsigmoid/Relu + PWN(/backbone/stage5/OSA5_1/ese/hsigmoid/Clip) 9.38 0.2344 0.3
[04/08/2024-16:54:21] [I] PWN(PWN(/backbone/stage2/OSA2_1/ese/hsigmoid/Constant_2_output_0_5 + (Unnamed Layer* 86) [Shuffle], /backbone/stage5/OSA5_1/ese/hsigmoid/Div), /backbone/stage5/OSA5_1/ese/Mul) 4.61 0.1153 0.1
[04/08/2024-16:54:21] [I] /upsample_2x/Resize 8.60 0.2149 0.3
[04/08/2024-16:54:21] [I] /latlayer3/Conv 14.85 0.3712 0.5
[04/08/2024-16:54:21] [I] /avg_pool_4x/AveragePool_output_0 copy 1.24 0.0311 0.0
[04/08/2024-16:54:21] [I] /SPP/Conv1x1/conv1x1/conv1x1.0/Conv + /SPP/Conv1x1/conv1x1/conv1x1.2/Relu 13.27 0.3318 0.4
[04/08/2024-16:54:21] [I] /SPP/S1/S1.0/Conv + /SPP/S1/S1.2/Relu 9.90 0.2475 0.3
[04/08/2024-16:54:21] [I] /SPP/S2/S2.0/Conv + /SPP/S2/S2.2/Relu 9.83 0.2458 0.3
[04/08/2024-16:54:21] [I] /SPP/S3/S3.0/Conv + /SPP/S3/S3.2/Relu 10.58 0.2645 0.3
[04/08/2024-16:54:21] [I] /SPP/S2/S2.3/Conv + /SPP/S2/S2.5/Relu 9.82 0.2455 0.3
[04/08/2024-16:54:21] [I] /SPP/S3/S3.3/Conv + /SPP/S3/S3.5/Relu 9.80 0.2449 0.3
[04/08/2024-16:54:21] [I] /SPP/S3/S3.6/Conv + /SPP/S3/S3.8/Relu 9.75 0.2438 0.3
[04/08/2024-16:54:21] [I] /SPP/output/output.0/Conv + /SPP/Add + /SPP/relu/Relu 8.73 0.2184 0.3
[04/08/2024-16:54:21] [I] /detect_head/conv1x1/conv1x1/conv1x1.0/Conv + /detect_head/conv1x1/conv1x1/conv1x1.2/Relu 3.50 0.0876 0.1
[04/08/2024-16:54:21] [I] /detect_head/obj_layers/conv5x5/conv5x5.0/Conv + /detect_head/obj_layers/conv5x5/conv5x5.2/Relu 9.75 0.2437 0.3
[04/08/2024-16:54:21] [I] /detect_head/reg_layers/conv5x5/conv5x5.0/Conv + /detect_head/reg_layers/conv5x5/conv5x5.2/Relu 9.79 0.2447 0.3
[04/08/2024-16:54:21] [I] /detect_head/cls_layers/conv5x5/conv5x5.0/Conv + /detect_head/cls_layers/conv5x5/conv5x5.2/Relu 9.75 0.2438 0.3
[04/08/2024-16:54:21] [I] /detect_head/obj_layers/conv5x5/conv5x5.3/Conv 1.73 0.0433 0.1
[04/08/2024-16:54:21] [I] /detect_head/reg_layers/conv5x5/conv5x5.3/Conv 1.84 0.0460 0.1
[04/08/2024-16:54:21] [I] /detect_head/cls_layers/conv5x5/conv5x5.3/Conv 1.47 0.0367 0.0
[04/08/2024-16:54:21] [I] PWN(/detect_head/sigmoid/Sigmoid) 0.21 0.0053 0.0
[04/08/2024-16:54:21] [I] /detect_head/softmax/Transpose + (Unnamed Layer* 136) [Shuffle] 0.33 0.0082 0.0
[04/08/2024-16:54:21] [I] /detect_head/softmax/Softmax 1.18 0.0295 0.0
[04/08/2024-16:54:21] [I] (Unnamed Layer* 138) [Shuffle] + /detect_head/softmax/Transpose_1 0.31 0.0078 0.0
[04/08/2024-16:54:21] [I] /detect_head/sigmoid/Sigmoid_output_0 copy 0.15 0.0038 0.0
[04/08/2024-16:54:21] [I] Total 3250.31 81.2577 100.0
[04/08/2024-16:54:21] [I]
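One thing to keep in mind when reading the two tables: as far as I understand trtexec's profile output, the Time (ms) column is summed over all profiled iterations, while Avg. Time (ms) is per iteration, so only the Avg. column is comparable across devices. A minimal sketch that recovers the iteration counts from the two Total rows above (the function name `iterations` is my own, not part of trtexec):

```python
def iterations(total_ms: float, avg_ms: float) -> int:
    """Estimate the number of profiled iterations from a 'Total' row."""
    return round(total_ms / avg_ms)


# Values copied from the 'Total' rows of the two profiles.
dgpu_total_ms, dgpu_avg_ms = 2888.56, 1.4951
nano_total_ms, nano_avg_ms = 3250.31, 81.2577

dgpu_iters = iterations(dgpu_total_ms, dgpu_avg_ms)
nano_iters = iterations(nano_total_ms, nano_avg_ms)

# Per-inference slowdown of the Jetson Nano relative to the dGPU.
ratio = nano_avg_ms / dgpu_avg_ms

print(f"dGPU iterations: {dgpu_iters}")
print(f"Nano iterations: {nano_iters}")
print(f"Nano / dGPU per-inference time: {ratio:.1f}x")
```

With the numbers above this gives roughly 1932 iterations on the dGPU against 40 on the Jetson Nano, so a raw Time (ms) value from one table cannot be set against a raw value from the other.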
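Two rows stand out to me. On the Jetson Nano, this layer takes 29.5 ms on average (36.3% of the total), while the same layer on the dGPU takes 0.0052 ms (0.3%):
/backbone/stage3/OSA3_1/ese/fc/Conv + /backbone/stage3/OSA3_1/ese/hsigmoid/Relu + PWN(/backbone/stage3/OSA3_1/ese/hsigmoid/Clip) 1180.69 29.5172 36.3
and on the dGPU, this layer is the largest single entry (14.7% of the total), while on the Jetson Nano it accounts for only 0.5%:
/latlayer3/Conv 425.98 0.2205 14.7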
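To check every layer rather than pick rows by eye, the two dumps can be parsed and joined on layer name. This is only a sketch: the regular expression assumes each row ends with three decimal numbers (Time, Avg. Time, Time %) as in the logs above, and `parse_profile` and `slowdown` are hypothetical helper names. The sample strings hold three rows copied from each profile; in practice the full trtexec output would be read from saved log files.

```python
import re

# Timestamp, "[I]", layer name, then the three numeric columns.
ROW = re.compile(
    r"^\[[^\]]+\]\s+\[I\]\s+(?P<name>.+?)\s+"
    r"(?P<total>\d+\.\d+)\s+(?P<avg>\d+\.\d+)\s+(?P<pct>\d+\.\d+)\s*$"
)


def parse_profile(text):
    """Map layer name -> average time per iteration in ms."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m and m.group("name") != "Total":
            rows[m.group("name")] = float(m.group("avg"))
    return rows


def slowdown(dgpu, nano):
    """Return (ratio, layer) pairs for shared layers, largest ratio first."""
    common = dgpu.keys() & nano.keys()
    return sorted(
        ((nano[n] / dgpu[n], n) for n in common if dgpu[n] > 0),
        reverse=True,
    )


ESE = ("/backbone/stage3/OSA3_1/ese/fc/Conv + "
       "/backbone/stage3/OSA3_1/ese/hsigmoid/Relu + "
       "PWN(/backbone/stage3/OSA3_1/ese/hsigmoid/Clip)")
STEM = "/backbone/stem/stem_1/conv/Conv + /backbone/stem/stem_1/relu/Relu"

dgpu_log = "\n".join([
    f"[04/08/2024-09:27:46] [I] {STEM} 49.50 0.0256 1.7",
    f"[04/08/2024-09:27:46] [I] {ESE} 10.01 0.0052 0.3",
    "[04/08/2024-09:27:46] [I] /latlayer3/Conv 425.98 0.2205 14.7",
    "[04/08/2024-09:27:46] [I] Total 2888.56 1.4951 100.0",
])
nano_log = "\n".join([
    f"[04/08/2024-16:54:21] [I] {STEM} 72.43 1.8108 2.2",
    f"[04/08/2024-16:54:21] [I] {ESE} 1180.69 29.5172 36.3",
    "[04/08/2024-16:54:21] [I] /latlayer3/Conv 14.85 0.3712 0.5",
    "[04/08/2024-16:54:21] [I] Total 3250.31 81.2577 100.0",
])

ranked = slowdown(parse_profile(dgpu_log), parse_profile(nano_log))
for ratio, name in ranked:
    print(f"{ratio:10.1f}x  {name}")
```

Sorting by the ratio of average times puts the stage3 ese/fc layer far ahead of the others, while /latlayer3/Conv shows the smallest slowdown of the three sampled rows.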
I’m sure both engines were built from the same ONNX file, using the same code to convert it to a TensorRT engine file.
Any help would be appreciated.