I have used TensorRT to implement the forward inference of YOLOv2 on a TX1 in FP32 mode, writing custom plugins for the reorg layer and the region layer. But when I switch to FP16 mode and set the batch size to 2, the second image's results are all 0.
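For reference, this is the shape I believe the reorg plugin's enqueue needs (a minimal sketch with illustrative names like `mC`/`mH`/`mW`, not my exact code; the index math is ported from darknet's `reorg_cpu` with `forward == 0`, as YOLOv2 uses it). TensorRT hands the whole batch to `enqueue()` in a single call, and the reformat lines in the log below show the plugin receives NCHW_F32 data even in FP16 mode, so the kernel launch has to cover batchSize * C * H * W float elements; a launch sized for a single image would compute image 0 and leave image 1 untouched (often all zeros).

```cpp
#include <cuda_runtime.h>
#include <NvInfer.h>

// Batch-aware reorg kernel: one thread per element of the WHOLE batch.
// Index math follows darknet's reorg_cpu (forward == 0); c/h/w are the
// reorg INPUT dims (64 x 26 x 26 for YOLOv2).
__global__ void reorgKernel(const float* x, float* out,
                            int batch, int c, int h, int w, int stride)
{
    int in_index = blockIdx.x * blockDim.x + threadIdx.x;
    if (in_index >= batch * c * h * w) return;

    // Decompose the flat index into (b, k, j, i) over the input layout.
    int i = in_index % w;
    int j = (in_index / w) % h;
    int k = (in_index / (w * h)) % c;
    int b = in_index / (w * h * c);

    int out_c  = c / (stride * stride);
    int c2     = k % out_c;
    int offset = k / out_c;
    int w2     = i * stride + offset % stride;
    int h2     = j * stride + offset / stride;
    int out_index = w2 + w * stride * (h2 + h * stride * (c2 + out_c * b));

    out[in_index] = x[out_index];
}

// Sketch of the plugin's enqueue; the remaining IPlugin methods are
// omitted, so the class stays abstract here.
class ReorgPlugin : public nvinfer1::IPlugin
{
public:
    int enqueue(int batchSize, const void* const* inputs, void** outputs,
                void* /*workspace*/, cudaStream_t stream) override
    {
        const float* in  = static_cast<const float*>(inputs[0]);
        float*       out = static_cast<float*>(outputs[0]);

        int total   = batchSize * mC * mH * mW;  // whole batch, not one image
        int threads = 256;
        int blocks  = (total + threads - 1) / threads;
        reorgKernel<<<blocks, threads, 0, stream>>>(in, out, batchSize,
                                                    mC, mH, mW, mStride);
        return cudaGetLastError() == cudaSuccess ? 0 : 1;
    }

private:
    int mC = 64, mH = 26, mW = 26, mStride = 2;  // YOLOv2 reorg input dims
};
```

An equivalent alternative is to launch the kernel once per image inside a `for (int b = 0; b < batchSize; ++b)` loop, offsetting the input and output pointers by the per-image volume `C * H * W` on each iteration.

Here is a partial output log from the FP16 build: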
useFp16: 1
Original: 157 layers
After dead-layer removal: 157 layers
Fusing convolution weights from conv1 with scale bn1
Fusing convolution weights from conv1 with scale scale1
Fusing convolution weights from conv2 with scale bn2
Fusing convolution weights from conv2 with scale scale2
Fusing convolution weights from conv3 with scale bn3
Fusing convolution weights from conv3 with scale scale3
Fusing convolution weights from conv4 with scale bn4
Fusing convolution weights from conv4 with scale scale4
Fusing convolution weights from conv5 with scale bn5
Fusing convolution weights from conv5 with scale scale5
Fusing convolution weights from conv6 with scale bn6
Fusing convolution weights from conv6 with scale scale6
Fusing convolution weights from conv7 with scale bn7
Fusing convolution weights from conv7 with scale scale7
Fusing convolution weights from conv8 with scale bn8
Fusing convolution weights from conv8 with scale scale8
Fusing convolution weights from conv9 with scale bn9
Fusing convolution weights from conv9 with scale scale9
Fusing convolution weights from conv10 with scale bn10
Fusing convolution weights from conv10 with scale scale10
Fusing convolution weights from conv11 with scale bn11
Fusing convolution weights from conv11 with scale scale11
Fusing convolution weights from conv12 with scale bn12
Fusing convolution weights from conv12 with scale scale12
Fusing convolution weights from conv13 with scale bn13
Fusing convolution weights from conv13 with scale scale13
Fusing convolution weights from conv14 with scale bn14
Fusing convolution weights from conv14 with scale scale14
Fusing convolution weights from conv15 with scale bn15
Fusing convolution weights from conv15 with scale scale15
Fusing convolution weights from conv16 with scale bn16
Fusing convolution weights from conv16 with scale scale16
Fusing convolution weights from conv17 with scale bn17
Fusing convolution weights from conv17 with scale scale17
Fusing convolution weights from conv18 with scale bn18
Fusing convolution weights from conv18 with scale scale18
Fusing convolution weights from conv19 with scale bn19
Fusing convolution weights from conv19 with scale scale19
Fusing convolution weights from conv20 with scale bn20
Fusing convolution weights from conv20 with scale scale20
Fusing convolution weights from conv21 with scale bn21
Fusing convolution weights from conv21 with scale scale21
After scale fusion: 115 layers
After conv-act fusion: 115 layers
After tensor merging: 115 layers
Eliminating contatenation concat1
Generating copy for layer13 to concat1
Eliminating contatenation concat2
Generating copy for reorg to concat2
Generating copy for layer20 to concat2
After concat removal: 116 layers
Region scale1: NC2HW_F16
Region scale1_power1: NC2HW_F16
Region relu1: NC2HW_F16
Region scale1_power2: NC2HW_F16
Region layer1: NC2HW_F16
Region pool1: NC2HW_F16
Region scale2: NC2HW_F16
Region scale2_power1: NC2HW_F16
Region relu2: NC2HW_F16
Region scale2_power2: NC2HW_F16
Region layer2: NC2HW_F16
Region pool2: NC2HW_F16
Region scale3: NC2HW_F16
Region scale3_power1: NC2HW_F16
Region relu3: NC2HW_F16
Region scale3_power2: NC2HW_F16
Region layer3: NC2HW_F16
Region scale4: NC2HW_F16
Region scale4_power1: NC2HW_F16
Region relu4: NC2HW_F16
Region scale4_power2: NC2HW_F16
Region layer4: NC2HW_F16
Region scale5: NC2HW_F16
Region scale5_power1: NC2HW_F16
Region relu5: NC2HW_F16
Region scale5_power2: NC2HW_F16
Region layer5: NC2HW_F16
Region pool5: NC2HW_F16
Region scale6: NC2HW_F16
Region scale6_power1: NC2HW_F16
Region relu6: NC2HW_F16
Region scale6_power2: NC2HW_F16
Region layer6: NC2HW_F16
Region scale7: NC2HW_F16
Region scale7_power1: NC2HW_F16
Region relu7: NC2HW_F16
Region scale7_power2: NC2HW_F16
Region layer7: NC2HW_F16
Region scale8: NC2HW_F16
Region scale8_power1: NC2HW_F16
Region relu8: NC2HW_F16
Region scale8_power2: NC2HW_F16
Region layer8: NC2HW_F16
Region pool8: NC2HW_F16
Region scale9: NC2HW_F16
Region scale9_power1: NC2HW_F16
Region relu9: NC2HW_F16
Region scale9_power2: NC2HW_F16
Region layer9: NC2HW_F16
Region scale10: NC2HW_F16
Region scale10_power1: NC2HW_F16
Region relu10: NC2HW_F16
Region scale10_power2: NC2HW_F16
Region layer10: NC2HW_F16
Region scale11: NC2HW_F16
Region scale11_power1: NC2HW_F16
Region relu11: NC2HW_F16
Region scale11_power2: NC2HW_F16
Region layer11: NC2HW_F16
Region scale12: NC2HW_F16
Region scale12_power1: NC2HW_F16
Region relu12: NC2HW_F16
Region scale12_power2: NC2HW_F16
Region layer12: NC2HW_F16
Region scale13: NC2HW_F16
Region scale13_power1: NC2HW_F16
Region relu13: NC2HW_F16
Region scale13_power2: NC2HW_F16
Region layer13: NC2HW_F16
Region pool13: NC2HW_F16
Region scale14: NC2HW_F16
Region scale14_power1: NC2HW_F16
Region relu14: NC2HW_F16
Region scale14_power2: NC2HW_F16
Region layer14: NC2HW_F16
Region scale15: NC2HW_F16
Region scale15_power1: NC2HW_F16
Region relu15: NC2HW_F16
Region scale15_power2: NC2HW_F16
Region layer15: NC2HW_F16
Region scale16: NC2HW_F16
Region scale16_power1: NC2HW_F16
Region relu16: NC2HW_F16
Region scale16_power2: NC2HW_F16
Region layer16: NC2HW_F16
Region scale17: NC2HW_F16
Region scale17_power1: NC2HW_F16
Region relu17: NC2HW_F16
Region scale17_power2: NC2HW_F16
Region layer17: NC2HW_F16
Region scale18: NC2HW_F16
Region scale18_power1: NC2HW_F16
Region relu18: NC2HW_F16
Region scale18_power2: NC2HW_F16
Region layer18: NC2HW_F16
Region scale19: NC2HW_F16
Region scale19_power1: NC2HW_F16
Region relu19: NC2HW_F16
Region scale19_power2: NC2HW_F16
Region layer19: NC2HW_F16
Region scale20: NC2HW_F16
Region scale20_power1: NC2HW_F16
Region relu20: NC2HW_F16
Region scale20_power2: NC2HW_F16
Region layer20: NC2HW_F16
Region concat1: NC2HW_F16
Region reorg: NC2HW_F16
Region concat2: NC2HW_F16
Region scale21: NC2HW_F16
Region scale21_power1: NC2HW_F16
Region relu21: NC2HW_F16
Region scale21_power2: NC2HW_F16
Region layer21: NC2HW_F16
Region conv22: NC2HW_F16
Region data: NC2HW_F16
Region scale1: NC2HW_F16
Region scale1_power1: NC2HW_F16
Region relu1: NC2HW_F16
Region scale1_power2: NC2HW_F16
Region layer1: NC2HW_F16
Region pool1: NC2HW_F16
Region scale2: NC2HW_F16
Region scale2_power1: NC2HW_F16
Region relu2: NC2HW_F16
Region scale2_power2: NC2HW_F16
Region layer2: NC2HW_F16
Region pool2: NC2HW_F16
Region scale3: NC2HW_F16
Region scale3_power1: NC2HW_F16
Region relu3: NC2HW_F16
Region scale3_power2: NC2HW_F16
Region layer3: NC2HW_F16
Region scale4: NC2HW_F16
Region scale4_power1: NC2HW_F16
Region relu4: NC2HW_F16
Region scale4_power2: NC2HW_F16
Region layer4: NC2HW_F16
Region scale5: NC2HW_F16
Region scale5_power1: NC2HW_F16
Region relu5: NC2HW_F16
Region scale5_power2: NC2HW_F16
Region layer5: NC2HW_F16
Region pool5: NC2HW_F16
Region scale6: NC2HW_F16
Region scale6_power1: NC2HW_F16
Region relu6: NC2HW_F16
Region scale6_power2: NC2HW_F16
Region layer6: NC2HW_F16
Region scale7: NC2HW_F16
Region scale7_power1: NC2HW_F16
Region relu7: NC2HW_F16
Region scale7_power2: NC2HW_F16
Region layer7: NC2HW_F16
Region scale8: NC2HW_F16
Region scale8_power1: NC2HW_F16
Region relu8: NC2HW_F16
Region scale8_power2: NC2HW_F16
Region layer8: NC2HW_F16
Region pool8: NC2HW_F16
Region scale9: NC2HW_F16
Region scale9_power1: NC2HW_F16
Region relu9: NC2HW_F16
Region scale9_power2: NC2HW_F16
Region layer9: NC2HW_F16
Region scale10: NC2HW_F16
Region scale10_power1: NC2HW_F16
Region relu10: NC2HW_F16
Region scale10_power2: NC2HW_F16
Region layer10: NC2HW_F16
Region scale11: NC2HW_F16
Region scale11_power1: NC2HW_F16
Region relu11: NC2HW_F16
Region scale11_power2: NC2HW_F16
Region layer11: NC2HW_F16
Region scale12: NC2HW_F16
Region scale12_power1: NC2HW_F16
Region relu12: NC2HW_F16
Region scale12_power2: NC2HW_F16
Region layer12: NC2HW_F16
Region scale13: NC2HW_F16
Region scale13_power1: NC2HW_F16
Region relu13: NC2HW_F16
Region scale13_power2: NC2HW_F16
Region layer13: NC2HW_F16
Region pool13: NC2HW_F16
Region scale14: NC2HW_F16
Region scale14_power1: NC2HW_F16
Region relu14: NC2HW_F16
Region scale14_power2: NC2HW_F16
Region layer14: NC2HW_F16
Region scale15: NC2HW_F16
Region scale15_power1: NC2HW_F16
Region relu15: NC2HW_F16
Region scale15_power2: NC2HW_F16
Region layer15: NC2HW_F16
Region scale16: NC2HW_F16
Region scale16_power1: NC2HW_F16
Region relu16: NC2HW_F16
Region scale16_power2: NC2HW_F16
Region layer16: NC2HW_F16
Region scale17: NC2HW_F16
Region scale17_power1: NC2HW_F16
Region relu17: NC2HW_F16
Region scale17_power2: NC2HW_F16
Region layer17: NC2HW_F16
Region scale18: NC2HW_F16
Region scale18_power1: NC2HW_F16
Region relu18: NC2HW_F16
Region scale18_power2: NC2HW_F16
Region layer18: NC2HW_F16
Region scale19: NC2HW_F16
Region scale19_power1: NC2HW_F16
Region relu19: NC2HW_F16
Region scale19_power2: NC2HW_F16
Region layer19: NC2HW_F16
Region scale20: NC2HW_F16
Region scale20_power1: NC2HW_F16
Region relu20: NC2HW_F16
Region scale20_power2: NC2HW_F16
Region layer20: NC2HW_F16
Region concat1: NC2HW_F16
Region reorg: NC2HW_F16
Region concat2: NC2HW_F16
Region scale21: NC2HW_F16
Region scale21_power1: NC2HW_F16
Region relu21: NC2HW_F16
Region scale21_power2: NC2HW_F16
Region layer21: NC2HW_F16
Region conv22: NC2HW_F16
Region region: NCHW_F32
Node conv1: NC2HW_F16
Node scale1_power1: NC2HW_F16
Node relu1: NC2HW_F16
Node scale1_power2: NC2HW_F16
Node eltwise1: NC2HW_F16
Node pool1: NC2HW_F16
Node conv2: NC2HW_F16
Node scale2_power1: NC2HW_F16
Node relu2: NC2HW_F16
Node scale2_power2: NC2HW_F16
Node eltwise2: NC2HW_F16
Node pool2: NC2HW_F16
Node conv3: NC2HW_F16
Node scale3_power1: NC2HW_F16
Node relu3: NC2HW_F16
Node scale3_power2: NC2HW_F16
Node eltwise3: NC2HW_F16
Node conv4: NC2HW_F16
Node scale4_power1: NC2HW_F16
Node relu4: NC2HW_F16
Node scale4_power2: NC2HW_F16
Node eltwise4: NC2HW_F16
Node conv5: NC2HW_F16
Node scale5_power1: NC2HW_F16
Node relu5: NC2HW_F16
Node scale5_power2: NC2HW_F16
Node eltwise5: NC2HW_F16
Node pool5: NC2HW_F16
Node conv6: NC2HW_F16
Node scale6_power1: NC2HW_F16
Node relu6: NC2HW_F16
Node scale6_power2: NC2HW_F16
Node eltwise6: NC2HW_F16
Node conv7: NC2HW_F16
Node scale7_power1: NC2HW_F16
Node relu7: NC2HW_F16
Node scale7_power2: NC2HW_F16
Node eltwise7: NC2HW_F16
Node conv8: NC2HW_F16
Node scale8_power1: NC2HW_F16
Node relu8: NC2HW_F16
Node scale8_power2: NC2HW_F16
Node eltwise8: NC2HW_F16
Node pool8: NC2HW_F16
Node conv9: NC2HW_F16
Node scale9_power1: NC2HW_F16
Node relu9: NC2HW_F16
Node scale9_power2: NC2HW_F16
Node eltwise9: NC2HW_F16
Node conv10: NC2HW_F16
Node scale10_power1: NC2HW_F16
Node relu10: NC2HW_F16
Node scale10_power2: NC2HW_F16
Node eltwise10: NC2HW_F16
Node conv11: NC2HW_F16
Node scale11_power1: NC2HW_F16
Node relu11: NC2HW_F16
Node scale11_power2: NC2HW_F16
Node eltwise11: NC2HW_F16
Node conv12: NC2HW_F16
Node scale12_power1: NC2HW_F16
Node relu12: NC2HW_F16
Node scale12_power2: NC2HW_F16
Node eltwise12: NC2HW_F16
Node conv13: NC2HW_F16
Node scale13_power1: NC2HW_F16
Node relu13: NC2HW_F16
Node scale13_power2: NC2HW_F16
Node eltwise13: NC2HW_F16
Node pool13: NC2HW_F16
Node conv14: NC2HW_F16
Node scale14_power1: NC2HW_F16
Node relu14: NC2HW_F16
Node scale14_power2: NC2HW_F16
Node eltwise14: NC2HW_F16
Node conv15: NC2HW_F16
Node scale15_power1: NC2HW_F16
Node relu15: NC2HW_F16
Node scale15_power2: NC2HW_F16
Node eltwise15: NC2HW_F16
Node conv16: NC2HW_F16
Node scale16_power1: NC2HW_F16
Node relu16: NC2HW_F16
Node scale16_power2: NC2HW_F16
Node eltwise16: NC2HW_F16
Node conv17: NC2HW_F16
Node scale17_power1: NC2HW_F16
Node relu17: NC2HW_F16
Node scale17_power2: NC2HW_F16
Node eltwise17: NC2HW_F16
Node conv18: NC2HW_F16
Node scale18_power1: NC2HW_F16
Node relu18: NC2HW_F16
Node scale18_power2: NC2HW_F16
Node eltwise18: NC2HW_F16
Node conv19: NC2HW_F16
Node scale19_power1: NC2HW_F16
Node relu19: NC2HW_F16
Node scale19_power2: NC2HW_F16
Node eltwise19: NC2HW_F16
Node conv20: NC2HW_F16
Node scale20_power1: NC2HW_F16
Node relu20: NC2HW_F16
Node scale20_power2: NC2HW_F16
Node eltwise20: NC2HW_F16
Node layer13 copy: NC2HW_F16
Node reorg: NCHW_F32
Node reorg copy: NC2HW_F16
Node layer20 copy: NC2HW_F16
Node conv21: NC2HW_F16
Node scale21_power1: NC2HW_F16
Node relu21: NC2HW_F16
Node scale21_power2: NC2HW_F16
Node eltwise21: NC2HW_F16
Node conv22: NC2HW_F16
Node region: NCHW_F32
Adding reformat layer: conv1 reformatted input 0 (data) from NCHW_F32 to NC2HW_F16
Adding reformat layer: reorg reformatted input 0 (concat1) from NC2HW_F16 to NCHW_F32
Adding reformat layer: reorg reformatted output 0 (reorg) from NCHW_F32 to NC2HW_F16
Adding reformat layer: region reformatted input 0 (conv22) from NC2HW_F16 to NCHW_F32
After reformat layers: 120 layers
Block size 134217728
Block size 22151168
Block size 22151168
Block size 22151168
Block size 11075584
Block size 692224
Total Activation Memory: 212439040
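For completeness, this is roughly how the engine is built (a minimal sketch of the TensorRT 2.x-era builder API; `gLogger`, `pluginFactory`, and the file names are placeholders, not my exact code). The `useFp16: 1` line presumably comes from enabling half2 mode, which is what produces the paired-image NC2HW_F16 tensor format throughout the log, while the inserted reformat layers convert the reorg and region plugin I/O back to NCHW_F32, so the plugins themselves still see FP32 data.

```cpp
#include <NvInfer.h>
#include <NvCaffeParser.h>

using namespace nvinfer1;

// Minimal sketch of the FP16 engine build (TensorRT 2.x-era API).
// gLogger, pluginFactory, and the file names are placeholders.
ICudaEngine* buildEngine(ILogger& gLogger,
                         nvcaffeparser1::IPluginFactory* pluginFactory)
{
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();

    nvcaffeparser1::ICaffeParser* parser = nvcaffeparser1::createCaffeParser();
    parser->setPluginFactory(pluginFactory);        // reorg + region plugins

    // Parse weights as FP16 when the platform has fast FP16 (true on TX1).
    bool useFp16 = builder->platformHasFastFp16();
    DataType weightType = useFp16 ? DataType::kHALF : DataType::kFLOAT;
    const nvcaffeparser1::IBlobNameToTensor* blobs =
        parser->parse("yolov2.prototxt", "yolov2.caffemodel",
                      *network, weightType);
    network->markOutput(*blobs->find("region"));

    builder->setMaxBatchSize(2);                    // two images per batch
    builder->setMaxWorkspaceSize(256 << 20);
    builder->setHalf2Mode(useFp16);                 // enables NC2HW_F16 execution

    ICudaEngine* engine = builder->buildCudaEngine(*network);
    network->destroy();
    parser->destroy();
    builder->destroy();
    return engine;
}
```

My question: with half2 mode and maxBatchSize = 2, is there anything extra a custom plugin (or the surrounding reformat layers) must do so that the second image in the batch is processed correctly? In FP32 mode, or in FP16 mode with batch size 1, the results are fine; only the second image of an FP16 batch comes out as all zeros.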