Hi,
I’m trying to understand what takes most of the time in a network running on the DLA.
Below is trtexec’s profile dump for this network, running on the DLA (with some layers falling back to the GPU).
More specifically, what are those “output reformatter XX” lines? Do they measure reformatting of the input/output tensors, or the actual layers running on the DLA (which therefore cannot be optimized)?
Two of them, at 77.3% and 17.1%, account for most of the time:
[02/01/2021-18:55:10] [I] ResNet18v2/conv0/Conv2D__27 output reformatter 0 9.14 0.12 0.3
[02/01/2021-18:55:10] [I] {ResNet18v2/conv0/Conv2D,ResNet18v2/Relu,ResNet18v2/stage1_maxpool,ResNet18v2/group0/block0/conv1/Conv2D,ResNet18v2/Relu_1,ResNet18v2/group0/block0/conv2/Conv2D,ResNet18v2/stage2_add1,ResNet18v2/stage2_relu1,ResNet18v2/group0/block1/conv1/Conv2D,ResNet18v2/Relu_2,ResNet18v2/group0/block1/conv2/Conv2D,ResNet18v2/stage2_add2,ResNet18v2/stage2_relu2,ResNet18v2/group1/block0/convshortcut/Conv2D,ResNet18v2/group1/block0/conv1/Conv2D,ResNet18v2/Relu_3,ResNet18v2/group1/block0/conv2/Conv2D,ResNet18v2/stage3_add1,ResNet18v2/stage3_relu1,ResNet18v2/group1/block1/conv1/Conv2D,ResNet18v2/Relu_4,ResNet18v2/group1/block1/conv2/Conv2D,ResNet18v2/stage3_add2,ResNet18v2/stage3_relu2,ResNet18v2/group2/block0/convshortcut/Conv2D,ResNet18v2/group2/block0/conv1/Conv2D,ResNet18v2/Relu_5,ResNet18v2/group2/block0/conv2/Conv2D,ResNet18v2/stage4_add1,ResNet18v2/stage4_relu1,ResNet18v2/group2/block1/conv1/Conv2D,ResNet18v2/Relu_6,ResNet18v2/group2/block1/conv2/Conv2D,ResNet18v2/stage4_add2,ResNet18v2/stage4_relu2,ResNet18v2/group3/block0/convshortcut/Conv2D,ResNet18v2/group3/block0/conv1/Conv2D,ResNet18v2/Relu_7,ResNet18v2/group3/block0/conv2/Conv2D,ResNet18v2/stage5_add1,ResNet18v2/stage5_relu1,ResNet18v2/group3/block1/conv1/Conv2D,ResNet18v2/Relu_8,ResNet18v2/group3/block1/conv2/Conv2D,ResNet18v2/stage5_add2,ResNet18v2/stage5_relu2,ResNet18v2/conv_extras1_1x1_s1/Conv2D,ResNet18v2/Relu_9,ResNet18v2/conv_extras2_3x3_s1/Conv2D,ResNet18v2/Relu_10,ResNet18v2/conv_extras3_1x1_s1/Conv2D,ResNet18v2/Relu_11,ResNet18v2/conv_extras4_3x3_s2/Conv2D,ResNet18v2/Relu_12,ResNet18v2/conv_ff3_1x1_s1/Conv2D,ResNet18v2/Relu_15,ResNet18v2/conv_ff2_1x1_s1/Conv2D,ResNet18v2/conv_ff2_1x1_s1_bn/FusedBatchNormV3,ResNet18v2/Relu_14,ResNet18v2/conv_ff1_1x1_s1/Conv2D,ResNet18v2/conv_ff1_1x1_s1_bn/FusedBatchNormV3,ResNet18v2/Relu_13} 50.41 0.64 1.6
[02/01/2021-18:55:10] [I] ResNet18v2/conv0/Conv2D__27:0 finish 4.90 0.06 0.2
[02/01/2021-18:55:10] [I] {ResNet18v2/conv0/Conv2D,ResNet18v2/Relu,ResNet18v2/stage1_maxpool,ResNet18v2/group0/block0/conv1/Conv2D,ResNet18v2/Relu_1,ResNet18v2/group0/block0/conv2/Conv2D,ResNet18v2/stage2_add1,ResNet18v2/stage2_relu1,ResNet18v2/group0/block1/conv1/Conv2D,ResNet18v2/Relu_2,ResNet18v2/group0/block1/conv2/Conv2D,ResNet18v2/stage2_add2,ResNet18v2/stage2_relu2,ResNet18v2/group1/block0/convshortcut/Conv2D,ResNet18v2/group1/block0/conv1/Conv2D,ResNet18v2/Relu_3,ResNet18v2/group1/block0/conv2/Conv2D,ResNet18v2/stage3_add1,ResNet18v2/stage3_relu1,ResNet18v2/group1/block1/conv1/Conv2D,ResNet18v2/Relu_4,ResNet18v2/group1/block1/conv2/Conv2D,ResNet18v2/stage3_add2,ResNet18v2/stage3_relu2,ResNet18v2/group2/block0/convshortcut/Conv2D,ResNet18v2/group2/block0/conv1/Conv2D,ResNet18v2/Relu_5,ResNet18v2/group2/block0/conv2/Conv2D,ResNet18v2/stage4_add1,ResNet18v2/stage4_relu1,ResNet18v2/group2/block1/conv1/Conv2D,ResNet18v2/Relu_6,ResNet18v2/group2/block1/conv2/Conv2D,ResNet18v2/stage4_add2,ResNet18v2/stage4_relu2,ResNet18v2/group3/block0/convshortcut/Conv2D,ResNet18v2/group3/block0/conv1/Conv2D,ResNet18v2/Relu_7,ResNet18v2/group3/block0/conv2/Conv2D,ResNet18v2/stage5_add1,ResNet18v2/stage5_relu1,ResNet18v2/group3/block1/conv1/Conv2D,ResNet18v2/Relu_8,ResNet18v2/group3/block1/conv2/Conv2D,ResNet18v2/stage5_add2,ResNet18v2/stage5_relu2,ResNet18v2/conv_extras1_1x1_s1/Conv2D,ResNet18v2/Relu_9,ResNet18v2/conv_extras2_3x3_s1/Conv2D,ResNet18v2/Relu_10,ResNet18v2/conv_extras3_1x1_s1/Conv2D,ResNet18v2/Relu_11,ResNet18v2/conv_extras4_3x3_s2/Conv2D,ResNet18v2/Relu_12,ResNet18v2/conv_ff3_1x1_s1/Conv2D,ResNet18v2/Relu_15,ResNet18v2/conv_ff2_1x1_s1/Conv2D,ResNet18v2/conv_ff2_1x1_s1_bn/FusedBatchNormV3,ResNet18v2/Relu_14,ResNet18v2/conv_ff1_1x1_s1/Conv2D,ResNet18v2/conv_ff1_1x1_s1_bn/FusedBatchNormV3,ResNet18v2/Relu_13} output reformatter 1 2504.86 31.71 77.3
[02/01/2021-18:55:10] [I] {ResNet18v2/conv0/Conv2D,ResNet18v2/Relu,ResNet18v2/stage1_maxpool,ResNet18v2/group0/block0/conv1/Conv2D,ResNet18v2/Relu_1,ResNet18v2/group0/block0/conv2/Conv2D,ResNet18v2/stage2_add1,ResNet18v2/stage2_relu1,ResNet18v2/group0/block1/conv1/Conv2D,ResNet18v2/Relu_2,ResNet18v2/group0/block1/conv2/Conv2D,ResNet18v2/stage2_add2,ResNet18v2/stage2_relu2,ResNet18v2/group1/block0/convshortcut/Conv2D,ResNet18v2/group1/block0/conv1/Conv2D,ResNet18v2/Relu_3,ResNet18v2/group1/block0/conv2/Conv2D,ResNet18v2/stage3_add1,ResNet18v2/stage3_relu1,ResNet18v2/group1/block1/conv1/Conv2D,ResNet18v2/Relu_4,ResNet18v2/group1/block1/conv2/Conv2D,ResNet18v2/stage3_add2,ResNet18v2/stage3_relu2,ResNet18v2/group2/block0/convshortcut/Conv2D,ResNet18v2/group2/block0/conv1/Conv2D,ResNet18v2/Relu_5,ResNet18v2/group2/block0/conv2/Conv2D,ResNet18v2/stage4_add1,ResNet18v2/stage4_relu1,ResNet18v2/group2/block1/conv1/Conv2D,ResNet18v2/Relu_6,ResNet18v2/group2/block1/conv2/Conv2D,ResNet18v2/stage4_add2,ResNet18v2/stage4_relu2,ResNet18v2/group3/block0/convshortcut/Conv2D,ResNet18v2/group3/block0/conv1/Conv2D,ResNet18v2/Relu_7,ResNet18v2/group3/block0/conv2/Conv2D,ResNet18v2/stage5_add1,ResNet18v2/stage5_relu1,ResNet18v2/group3/block1/conv1/Conv2D,ResNet18v2/Relu_8,ResNet18v2/group3/block1/conv2/Conv2D,ResNet18v2/stage5_add2,ResNet18v2/stage5_relu2,ResNet18v2/conv_extras1_1x1_s1/Conv2D,ResNet18v2/Relu_9,ResNet18v2/conv_extras2_3x3_s1/Conv2D,ResNet18v2/Relu_10,ResNet18v2/conv_extras3_1x1_s1/Conv2D,ResNet18v2/Relu_11,ResNet18v2/conv_extras4_3x3_s2/Conv2D,ResNet18v2/Relu_12,ResNet18v2/conv_ff3_1x1_s1/Conv2D,ResNet18v2/Relu_15,ResNet18v2/conv_ff2_1x1_s1/Conv2D,ResNet18v2/conv_ff2_1x1_s1_bn/FusedBatchNormV3,ResNet18v2/Relu_14,ResNet18v2/conv_ff1_1x1_s1/Conv2D,ResNet18v2/conv_ff1_1x1_s1_bn/FusedBatchNormV3,ResNet18v2/Relu_13} output to be reformatted 1 finish 0.46 0.01 0.0
[02/01/2021-18:55:10] [I] {ResNet18v2/conv_pfe1_3x3_s1/Conv2D,ResNet18v2/Relu_16,ResNet18v2/conv_pfe2_3x3_s2/Conv2D,ResNet18v2/Relu_17,ResNet18v2/conv_pfe3_3x3_s2/Conv2D,ResNet18v2/Relu_18,ResNet18v2/conv_pfe4_3x3_s2/Conv2D,ResNet18v2/Relu_19,ResNet18v2/conv_pfe5_3x3_s2/Conv2D,ResNet18v2/Relu_20,ResNet18v2/conv_pfe6_3x3_s2/Conv2D,ResNet18v2/Relu_21,ResNet18v2/conv_loc5_3x3_s1/Conv2D,ResNet18v2/conv_conf5_3x3_s1/Conv2D,ResNet18v2/conv_loc4_3x3_s1/Conv2D,ResNet18v2/conv_conf4_3x3_s1/Conv2D,ResNet18v2/conv_loc3_3x3_s1/Conv2D,ResNet18v2/conv_conf3_3x3_s1/Conv2D,ResNet18v2/conv_loc2_3x3_s1/Conv2D,ResNet18v2/conv_conf2_3x3_s1/Conv2D,ResNet18v2/conv_loc1_3x3_s1/Conv2D,ResNet18v2/conv_conf1_3x3_s1/Conv2D,ResNet18v2/conv_loc0_3x3_s1/Conv2D,ResNet18v2/conv_conf0_3x3_s1/Conv2D} 0.17 0.00 0.0
[02/01/2021-18:55:10] [I] ResNet18v2/feature_transform_module:0 finish 0.05 0.00 0.0
[02/01/2021-18:55:10] [I] {ResNet18v2/conv_pfe1_3x3_s1/Conv2D,ResNet18v2/Relu_16,ResNet18v2/conv_pfe2_3x3_s2/Conv2D,ResNet18v2/Relu_17,ResNet18v2/conv_pfe3_3x3_s2/Conv2D,ResNet18v2/Relu_18,ResNet18v2/conv_pfe4_3x3_s2/Conv2D,ResNet18v2/Relu_19,ResNet18v2/conv_pfe5_3x3_s2/Conv2D,ResNet18v2/Relu_20,ResNet18v2/conv_pfe6_3x3_s2/Conv2D,ResNet18v2/Relu_21,ResNet18v2/conv_loc5_3x3_s1/Conv2D,ResNet18v2/conv_conf5_3x3_s1/Conv2D,ResNet18v2/conv_loc4_3x3_s1/Conv2D,ResNet18v2/conv_conf4_3x3_s1/Conv2D,ResNet18v2/conv_loc3_3x3_s1/Conv2D,ResNet18v2/conv_conf3_3x3_s1/Conv2D,ResNet18v2/conv_loc2_3x3_s1/Conv2D,ResNet18v2/conv_conf2_3x3_s1/Conv2D,ResNet18v2/conv_loc1_3x3_s1/Conv2D,ResNet18v2/conv_conf1_3x3_s1/Conv2D,ResNet18v2/conv_loc0_3x3_s1/Conv2D,ResNet18v2/conv_conf0_3x3_s1/Conv2D} output reformatter 8 553.06 7.00 17.1
[02/01/2021-18:55:10] [I] {ResNet18v2/conv_pfe1_3x3_s1/Conv2D,ResNet18v2/Relu_16,ResNet18v2/conv_pfe2_3x3_s2/Conv2D,ResNet18v2/Relu_17,ResNet18v2/conv_pfe3_3x3_s2/Conv2D,ResNet18v2/Relu_18,ResNet18v2/conv_pfe4_3x3_s2/Conv2D,ResNet18v2/Relu_19,ResNet18v2/conv_pfe5_3x3_s2/Conv2D,ResNet18v2/Relu_20,ResNet18v2/conv_pfe6_3x3_s2/Conv2D,ResNet18v2/Relu_21,ResNet18v2/conv_loc5_3x3_s1/Conv2D,ResNet18v2/conv_conf5_3x3_s1/Conv2D,ResNet18v2/conv_loc4_3x3_s1/Conv2D,ResNet18v2/conv_conf4_3x3_s1/Conv2D,ResNet18v2/conv_loc3_3x3_s1/Conv2D,ResNet18v2/conv_conf3_3x3_s1/Conv2D,ResNet18v2/conv_loc2_3x3_s1/Conv2D,ResNet18v2/conv_conf2_3x3_s1/Conv2D,ResNet18v2/conv_loc1_3x3_s1/Conv2D,ResNet18v2/conv_conf1_3x3_s1/Conv2D,ResNet18v2/conv_loc0_3x3_s1/Conv2D,ResNet18v2/conv_conf0_3x3_s1/Conv2D} output to be reformatted 8 finish 0.42 0.01 0.0
[02/01/2021-18:55:10] [I] {ResNet18v2/conv_pfe1_3x3_s1/Conv2D,ResNet18v2/Relu_16,ResNet18v2/conv_pfe2_3x3_s2/Conv2D,ResNet18v2/Relu_17,ResNet18v2/conv_pfe3_3x3_s2/Conv2D,ResNet18v2/Relu_18,ResNet18v2/conv_pfe4_3x3_s2/Conv2D,ResNet18v2/Relu_19,ResNet18v2/conv_pfe5_3x3_s2/Conv2D,ResNet18v2/Relu_20,ResNet18v2/conv_pfe6_3x3_s2/Conv2D,ResNet18v2/Relu_21,ResNet18v2/conv_loc5_3x3_s1/Conv2D,ResNet18v2/conv_conf5_3x3_s1/Conv2D,ResNet18v2/conv_loc4_3x3_s1/Conv2D,ResNet18v2/conv_conf4_3x3_s1/Conv2D,ResNet18v2/conv_loc3_3x3_s1/Conv2D,ResNet18v2/conv_conf3_3x3_s1/Conv2D,ResNet18v2/conv_loc2_3x3_s1/Conv2D,ResNet18v2/conv_conf2_3x3_s1/Conv2D,ResNet18v2/conv_loc1_3x3_s1/Conv2D,ResNet18v2/conv_conf1_3x3_s1/Conv2D,ResNet18v2/conv_loc0_3x3_s1/Conv2D,ResNet18v2/conv_conf0_3x3_s1/Conv2D} output reformatter 5 0.50 0.01 0.0
[02/01/2021-18:55:10] [I] {ResNet18v2/conv_pfe1_3x3_s1/Conv2D,ResNet18v2/Relu_16,ResNet18v2/conv_pfe2_3x3_s2/Conv2D,ResNet18v2/Relu_17,ResNet18v2/conv_pfe3_3x3_s2/Conv2D,ResNet18v2/Relu_18,ResNet18v2/conv_pfe4_3x3_s2/Conv2D,ResNet18v2/Relu_19,ResNet18v2/conv_pfe5_3x3_s2/Conv2D,ResNet18v2/Relu_20,ResNet18v2/conv_pfe6_3x3_s2/Conv2D,ResNet18v2/Relu_21,ResNet18v2/conv_loc5_3x3_s1/Conv2D,ResNet18v2/conv_conf5_3x3_s1/Conv2D,ResNet18v2/conv_loc4_3x3_s1/Conv2D,ResNet18v2/conv_conf4_3x3_s1/Conv2D,ResNet18v2/conv_loc3_3x3_s1/Conv2D,ResNet18v2/conv_conf3_3x3_s1/Conv2D,ResNet18v2/conv_loc2_3x3_s1/Conv2D,ResNet18v2/conv_conf2_3x3_s1/Conv2D,ResNet18v2/conv_loc1_3x3_s1/Conv2D,ResNet18v2/conv_conf1_3x3_s1/Conv2D,ResNet18v2/conv_loc0_3x3_s1/Conv2D,ResNet18v2/conv_conf0_3x3_s1/Conv2D} output to be reformatted 5 finish 0.17 0.00 0.0
[02/01/2021-18:55:10] [I] {ResNet18v2/conv_pfe1_3x3_s1/Conv2D,ResNet18v2/Relu_16,ResNet18v2/conv_pfe2_3x3_s2/Conv2D,ResNet18v2/Relu_17,ResNet18v2/conv_pfe3_3x3_s2/Conv2D,ResNet18v2/Relu_18,ResNet18v2/conv_pfe4_3x3_s2/Conv2D,ResNet18v2/Relu_19,ResNet18v2/conv_pfe5_3x3_s2/Conv2D,ResNet18v2/Relu_20,ResNet18v2/conv_pfe6_3x3_s2/Conv2D,ResNet18v2/Relu_21,ResNet18v2/conv_loc5_3x3_s1/Conv2D,ResNet18v2/conv_conf5_3x3_s1/Conv2D,ResNet18v2/conv_loc4_3x3_s1/Conv2D,ResNet18v2/conv_conf4_3x3_s1/Conv2D,ResNet18v2/conv_loc3_3x3_s1/Conv2D,ResNet18v2/conv_conf3_3x3_s1/Conv2D,ResNet18v2/conv_loc2_3x3_s1/Conv2D,ResNet18v2/conv_conf2_3x3_s1/Conv2D,ResNet18v2/conv_loc1_3x3_s1/Conv2D,ResNet18v2/conv_conf1_3x3_s1/Conv2D,ResNet18v2/conv_loc0_3x3_s1/Conv2D,ResNet18v2/conv_conf0_3x3_s1/Conv2D} output reformatter 1 0.39 0.00 0.0
[02/01/2021-18:55:10] [I] {ResNet18v2/conv_pfe1_3x3_s1/Conv2D,ResNet18v2/Relu_16,ResNet18v2/conv_pfe2_3x3_s2/Conv2D,ResNet18v2/Relu_17,ResNet18v2/conv_pfe3_3x3_s2/Conv2D,ResNet18v2/Relu_18,ResNet18v2/conv_pfe4_3x3_s2/Conv2D,ResNet18v2/Relu_19,ResNet18v2/conv_pfe5_3x3_s2/Conv2D,ResNet18v2/Relu_20,ResNet18v2/conv_pfe6_3x3_s2/Conv2D,ResNet18v2/Relu_21,ResNet18v2/conv_loc5_3x3_s1/Conv2D,ResNet18v2/conv_conf5_3x3_s1/Conv2D,ResNet18v2/conv_loc4_3x3_s1/Conv2D,ResNet18v2/conv_conf4_3x3_s1/Conv2D,ResNet18v2/conv_loc3_3x3_s1/Conv2D,ResNet18v2/conv_conf3_3x3_s1/Conv2D,ResNet18v2/conv_loc2_3x3_s1/Conv2D,ResNet18v2/conv_conf2_3x3_s1/Conv2D,ResNet18v2/conv_loc1_3x3_s1/Conv2D,ResNet18v2/conv_conf1_3x3_s1/Conv2D,ResNet18v2/conv_loc0_3x3_s1/Conv2D,ResNet18v2/conv_conf0_3x3_s1/Conv2D} output to be reformatted 1 finish 0.17 0.00 0.0
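To make the breakdown easier to read, here is a rough Python sketch that parses per-layer lines like the ones above and sums the share spent in "output reformatter" rows. It assumes the three trailing numbers are total time (ms), average time (ms) and percentage of the total, which is my reading of the dump and not something I have confirmed. The names `parse_profile` and `reformatter_share` are made up for this example, and the layer names in `SAMPLE` are shortened by hand.

```python
import re

# Assumed line layout: "[timestamp] [I] <layer name> <total ms> <avg ms> <percent>".
# The meaning of the three trailing columns is an assumption.
LINE_RE = re.compile(
    r"^\[[^\]]*\]\s+\[I\]\s+(?P<name>.+?)\s+"
    r"(?P<total>\d+(?:\.\d+)?)\s+(?P<avg>\d+(?:\.\d+)?)\s+(?P<pct>\d+(?:\.\d+)?)\s*$"
)

def parse_profile(lines):
    """Return (name, total_ms, avg_ms, percent) for every line that matches."""
    rows = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            rows.append((m["name"], float(m["total"]),
                         float(m["avg"]), float(m["pct"])))
    return rows

def reformatter_share(rows):
    """Sum the percent column over rows whose name ends in 'output reformatter N'."""
    return sum(pct for name, _, _, pct in rows
               if re.search(r"output reformatter \d+$", name))

# Sample rows with hand-shortened layer names; numbers copied from the dump above.
SAMPLE = [
    "[02/01/2021-18:55:10] [I] conv0/Conv2D__27 output reformatter 0 9.14 0.12 0.3",
    "[02/01/2021-18:55:10] [I] {backbone} 50.41 0.64 1.6",
    "[02/01/2021-18:55:10] [I] conv0/Conv2D__27:0 finish 4.90 0.06 0.2",
    "[02/01/2021-18:55:10] [I] {backbone} output reformatter 1 2504.86 31.71 77.3",
    "[02/01/2021-18:55:10] [I] {head} output reformatter 8 553.06 7.00 17.1",
]

if __name__ == "__main__":
    rows = parse_profile(SAMPLE)
    # Print rows sorted by total time, largest first.
    for name, total, avg, pct in sorted(rows, key=lambda r: r[1], reverse=True):
        print(f"{pct:5.1f}%  {total:9.2f} ms  {name}")
    print(f"reformatter share: {reformatter_share(rows):.1f}%")
```

Run against the full dump (one log line per list entry), this should show at a glance how much of the time lands in reformatter rows versus the fused DLA blocks themselves.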
thanks
Eyal