BEVFusion for TensorRT 8.4

The Nvidia-IOT GitHub page has posted several lidar models built using TensorRT 8.5.

Do you have any version of BEVFusion using TensorRT 8.4?

Hi,

The GitHub repository contains the original ONNX model.
You can convert it to a TensorRT 8.4 engine if needed.
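For reference, a minimal Python sketch of such a conversion could look roughly like the following (file paths are placeholders, and an INT8 build would additionally need the calibration/precision settings from the repo's build scripts):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Hypothetical path to the ONNX file shipped in the repository.
with open("model/resnet50int8/fuser.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # an INT8 build would need Q/DQ or calibration setup as well

plan = builder.build_serialized_network(network, config)
with open("fuser.plan", "wb") as f:
    f.write(plan)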

Do you encounter any errors when doing so?

Thanks.

Hi, I tried to build fuser.plan and head.bbox.plan using tool/build_trt_engine.sh, and the build succeeds on my Jetson Orin with TensorRT 8.4. However, the inference result is not correct: no objects are detected. I ran the same process in a container on the Orin with TensorRT 8.5, and there the inference result is correct, with 22 objects detected. While debugging, I noticed that the inference results start to differ at the output of fuser.plan.

First, I debug the output of the fuser:

  1. Here is my debug result from the Orin with TensorRT 8.4, using the input points.tensor. I set all camera feature values to zero since I am not using the camera input:
Loading point cloud from example-data/points.tensor
Loaded point cloud with 242180 points

Debug Point Cloud Data:
Total points: 242180
Features per point: 5

First 5 points (x, y, z, intensity, ...):
Point 0: -3.094 0.001 -1.835 4.000 0.000
Point 1: -3.260 0.002 -1.831 4.000 0.000
Point 2: -3.436 0.002 -1.827 5.000 0.000
Point 3: -3.633 0.004 -1.823 5.000 0.000
Point 4: -3.852 0.005 -1.823 7.000 0.000

Creating SCN processor...

Preparing point cloud data...
Running SCN forward pass...

Debug SCN Output:
Shape: [1, 128, 180, 180, 2]
Statistics:
- Total elements: 8294400
- Non-zero elements: 514679 (6.21%)
- Value range: [0.000, 9.938]
- Mean value: 0.025
First few non-zero values: 0.092 0.244 0.051 0.204 0.397 0.129 0.153 0.526 0.899 0.218

Sampling different locations:
Channel 0, Start (0,0): 0.000 0.000 0.000 0.000 0.000
Channel 64, Middle (90,90): 0.000 0.000 0.000 0.000 0.000

Creating zero-filled camera features...

Creating transfusion processor...
Running transfusion...

Debug Fusion Output:
Shape: [1, 256, 180, 180]

Global Statistics:
- Total elements: 8294400
- Non-zero elements: 1534493 (18.50%)
- Value range: [0.000, 1.031]
- Mean value: 0.014
- Standard deviation: 0.038

First few non-zero values: 0.093 0.225 0.260 0.230 0.240 0.245 0.250 0.250 0.250 0.250

Sampling different locations:
First channel (0,0):
[0,0]: 0.093 [0,1]: 0.225
[1,0]: 0.155 [1,1]: 0.114

Middle channel 128 at center:
0.000 0.000 0.000
0.000 0.000 0.000
0.000 0.000 0.000

Channel statistics (sampling every 32nd channel):
Channel   0: range=[0.000, 0.354], mean=0.082, nonzero=31569
Channel  32: range=[0.000, 0.224], mean=0.000, nonzero=118
Channel  64: range=[0.000, 0.128], mean=0.000, nonzero=6
Channel  96: range=[0.000, 0.351], mean=0.003, nonzero=3020
Channel 128: range=[0.000, 0.078], mean=0.000, nonzero=12
Channel 160: range=[0.000, 0.206], mean=0.107, nonzero=32352
Channel 192: range=[0.000, 0.151], mean=0.000, nonzero=13
Channel 224: range=[0.000, 0.145], mean=0.000, nonzero=1222
  2. Here is the debug result on the Orin running TensorRT 8.5 with the same process:
Loading point cloud from example-data/points.tensor
Loaded point cloud with 242180 points

Debug Point Cloud Data:
Total points: 242180
Features per point: 5

First 5 points (x, y, z, intensity, ...):
Point 0: -3.094 0.001 -1.835 4.000 0.000
Point 1: -3.260 0.002 -1.831 4.000 0.000
Point 2: -3.436 0.002 -1.827 5.000 0.000
Point 3: -3.633 0.004 -1.823 5.000 0.000
Point 4: -3.852 0.005 -1.823 7.000 0.000

Creating SCN processor...

Preparing point cloud data...
Running SCN forward pass...

Debug SCN Output:
Shape: [1, 128, 180, 180, 2]
Statistics:
- Total elements: 8294400
- Non-zero elements: 514832 (6.21%)
- Value range: [0.000, 9.703]
- Mean value: 0.025
First few non-zero values: 0.092 0.244 0.051 0.204 0.397 0.129 0.153 0.526 0.899 0.218

Sampling different locations:
Channel 0, Start (0,0): 0.000 0.000 0.000 0.000 0.000
Channel 64, Middle (90,90): 0.000 0.000 0.000 0.000 0.000

Creating zero-filled camera features...

Creating transfusion processor...
Running transfusion...

Debug Fusion Output:
Shape: [1, 256, 180, 180]

Global Statistics:
- Total elements: 8294400
- Non-zero elements: 1514283 (18.26%)
- Value range: [0.000, 2.336]
- Mean value: 0.014
- Standard deviation: 0.042

First few non-zero values: 0.093 0.225 0.260 0.230 0.240 0.245 0.250 0.250 0.250 0.250

Sampling different locations:
First channel (0,0):
[0,0]: 0.093 [0,1]: 0.225
[1,0]: 0.155 [1,1]: 0.114

Middle channel 128 at center:
0.000 0.000 0.000
0.000 0.000 0.000
0.000 0.000 0.000

Channel statistics (sampling every 32nd channel):
Channel   0: range=[0.000, 0.692], mean=0.067, nonzero=25883
Channel  32: range=[0.000, 0.549], mean=0.005, nonzero=2438
Channel  64: range=[0.000, 0.916], mean=0.003, nonzero=1110
Channel  96: range=[0.000, 0.934], mean=0.001, nonzero=983
Channel 128: range=[0.000, 1.004], mean=0.005, nonzero=882
Channel 160: range=[0.000, 0.372], mean=0.081, nonzero=29439
Channel 192: range=[0.000, 0.412], mean=0.003, nonzero=1639
Channel 224: range=[0.000, 0.426], mean=0.004, nonzero=2211
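
For reference, the global and per-channel statistics above can be reproduced from a dumped fusion output with a short numpy snippet along these lines (the dump file name and raw-float16 layout are assumptions on my side):

import numpy as np

# Assumed: the [1,256,180,180] fusion output was dumped as raw float16.
fusion = np.fromfile("fusion_output.bin", dtype=np.float16)
fusion = fusion.reshape(1, 256, 180, 180).astype(np.float32)

nonzero = np.count_nonzero(fusion)
print(f"Non-zero elements: {nonzero} ({100.0 * nonzero / fusion.size:.2f}%)")
print(f"Value range: [{fusion.min():.3f}, {fusion.max():.3f}]")
print(f"Mean value: {fusion.mean():.3f}, std: {fusion.std():.3f}")

# Sample every 32nd channel, as in the dumps above.
for c in range(0, 256, 32):
    ch = fusion[0, c]
    print(f"Channel {c:3d}: range=[{ch.min():.3f}, {ch.max():.3f}], "
          f"mean={ch.mean():.3f}, nonzero={np.count_nonzero(ch)}")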

I analyse a bit further by inspecting the fuser.json generated after building fuser.plan with the two versions.
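For reference, the same per-layer JSON can also be regenerated from a built plan with TensorRT's engine inspector; a minimal sketch with a placeholder path (full layer details require the engine to have been built with detailed profiling verbosity):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)
with open("model/resnet50int8/build/fuser.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

# Detailed per-layer information is only available if the engine was built
# with ProfilingVerbosity.DETAILED.
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))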

  1. Here is model/resnet50int8/build/fuser.json from TensorRT 8.4:
{"Layers": [{
  "Name": "QuantizeLinear_3_clone_1",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "lidar",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "Outputs": [
  {
    "Name": "Concat_0_lidar_clone_1",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "QDQ",
  "TacticValue": "0x0000000000000000"
},{
  "Name": "QuantizeLinear_3_clone_0",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "camera",
    "Location": "Device",
    "Dimensions": [1,80,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "Outputs": [
  {
    "Name": "Concat_0_camera_clone_0",
    "Location": "Device",
    "Dimensions": [1,80,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "QDQ",
  "TacticValue": "0x0000000000000000"
},{
  "Name": "camera copy",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "Concat_0_camera_clone_0",
    "Location": "Device",
    "Dimensions": [1,80,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "Outputs": [
  {
    "Name": "513",
    "Location": "Device",
    "Dimensions": [1,80,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "CONCAT",
  "TacticValue": "0x00000000000003ea"
},{
  "Name": "lidar copy",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "Concat_0_lidar_clone_1",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "Outputs": [
  {
    "Name": "513",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "CONCAT",
  "TacticValue": "0x00000000000003ea"
},{
  "Name": "parent.fuser.0.weight + QuantizeLinear_8 + Conv_10 + Relu_11",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "513",
    "Location": "Device",
    "Dimensions": [1,336,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "526",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 774144},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.0.0.weight + QuantizeLinear_19 + Conv_21 + Relu_22",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "526",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "539",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 128,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 294912},
  "Bias": {"Type": "Float", "Count": 128},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.0.3.weight + QuantizeLinear_30 + Conv_32 + Relu_33",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "539",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "552",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 128,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 147456},
  "Bias": {"Type": "Float", "Count": 128},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.0.6.weight + QuantizeLinear_41 + Conv_43 + Relu_44",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "552",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "565",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 128,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 147456},
  "Bias": {"Type": "Float", "Count": 128},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.0.9.weight + QuantizeLinear_52 + Conv_54 + Relu_55",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "565",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "578",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 128,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 147456},
  "Bias": {"Type": "Float", "Count": 128},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.0.12.weight + QuantizeLinear_63 + Conv_65 + Relu_66",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "578",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "591",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 128,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 147456},
  "Bias": {"Type": "Float", "Count": 128},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.0.15.weight + QuantizeLinear_74 + Conv_76 + Relu_77",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "591",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "601",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 128,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 147456},
  "Bias": {"Type": "Float", "Count": 128},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8f16_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xfa6c413ca4875a2e"
},{
  "Name": "QuantizeLinear_80",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "601",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
  }],
  "Outputs": [
  {
    "Name": "604",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "QDQ",
  "TacticValue": "0x00000000000003ea"
},{
  "Name": "Reformatting CopyNode for Input Tensor 0 to Conv_144 + Relu_145",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "601",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
  }],
  "Outputs": [
  {
    "Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "ParameterType": "Reformat",
  "Origin": "REFORMAT",
  "TacticValue": "0x00000000000003ea"
},{
  "Name": "Conv_144 + Relu_145",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "Outputs": [
  {
    "Name": "Reformatted Output Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "ParameterType": "Convolution",
  "Kernel": [1,1],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [0,0],
  "PostPadding": [0,0],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Float", "Count": 32768},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r1s1",
  "TacticValue": "0x130df49cb195156b"
},{
  "Name": "Reformatting CopyNode for Output Tensor 0 to Conv_144 + Relu_145",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "Reformatted Output Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "Outputs": [
  {
    "Name": "middle",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "REFORMAT",
  "TacticValue": "0x00000000000003ea"
},{
  "Name": "parent.decoder.backbone.blocks.1.0.weight + QuantizeLinear_85 + Conv_87 + Relu_88",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "604",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "617",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [2,2],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 294912},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.1.3.weight + QuantizeLinear_96 + Conv_98 + Relu_99",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "617",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "630",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 589824},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32",
  "TacticValue": "0x8d50646eff0cde6d"
},{
  "Name": "parent.decoder.backbone.blocks.1.6.weight + QuantizeLinear_107 + Conv_109 + Relu_110",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "630",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "643",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 589824},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32",
  "TacticValue": "0x8d50646eff0cde6d"
},{
  "Name": "parent.decoder.backbone.blocks.1.9.weight + QuantizeLinear_118 + Conv_120 + Relu_121",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "643",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "656",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 589824},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32",
  "TacticValue": "0x8d50646eff0cde6d"
},{
  "Name": "parent.decoder.backbone.blocks.1.12.weight + QuantizeLinear_129 + Conv_131 + Relu_132",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "656",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "669",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 589824},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32",
  "TacticValue": "0x8d50646eff0cde6d"
},{
  "Name": "parent.decoder.backbone.blocks.1.15.weight + QuantizeLinear_140 + Conv_142 + Relu_143",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "669",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "684",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 589824},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32",
  "TacticValue": "0x8d50646eff0cde6d"
},{
  "Name": "parent.decoder.neck.deblocks.1.0.weight + QuantizeLinear_153 + ConvTranspose_155",
  "LayerType": "CaskDeconvolution",
  "Inputs": [
  {
    "Name": "684",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "693",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP32"
  }],
  "ParameterType": "Convolution",
  "Kernel": [2,2],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [0,0],
  "PostPadding": [0,0],
  "Stride": [2,2],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 262144},
  "Bias": {"Type": "Float", "Count": 0},
  "HasSparseWeights": 0,
  "Activation": "NONE",
  "HasBias": 0,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8f32_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_simple_t1r1s1",
  "TacticValue": "0xbeb5d91e1874a437"
},{
  "Name": "BatchNormalization_156 + Relu_157",
  "LayerType": "Scale",
  "Inputs": [
  {
    "Name": "693",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP32"
  }],
  "Outputs": [
  {
    "Name": "Reformatted Output Tensor 0 to BatchNormalization_156 + Relu_157",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP32"
  }],
  "ParameterType": "Scale",
  "Mode": "CHANNEL",
  "Shift": {"Type": "Float", "Count": 256},
  "Scale": {"Type": "Float", "Count": 256},
  "Power": {"Type": "Float", "Count": 0},
  "Activation": "RELU",
  "ChannelAxis": 1,
  "TacticValue": "0x0000000000000000"
},{
  "Name": "Reformatting CopyNode for Output Tensor 0 to BatchNormalization_156 + Relu_157",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "Reformatted Output Tensor 0 to BatchNormalization_156 + Relu_157",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP32"
  }],
  "Outputs": [
  {
    "Name": "middle",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "REFORMAT",
  "TacticValue": "0x00000000000003e8"
}],
"Bindings": ["camera"
,"lidar"
,"middle"
]}

  2. Here is the model/resnet50int8/build/fuser.json from TensorRT 8.5:
{"Layers": [{
  "Name": "QuantizeLinear_3_clone_1",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "lidar",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "Outputs": [
  {
    "Name": "513",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "QDQ",
  "TacticValue": "0x00000000000003ea"
},{
  "Name": "QuantizeLinear_3_clone_0",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "camera",
    "Location": "Device",
    "Dimensions": [1,80,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "Outputs": [
  {
    "Name": "Concat_0_camera_clone_0",
    "Location": "Device",
    "Dimensions": [1,80,180,180],
    "Format/Datatype": "Row major Int8 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "QDQ",
  "TacticValue": "0x0000000000000000"
},{
  "Name": "Concat_0_camera_clone_0 copy",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "Concat_0_camera_clone_0",
    "Location": "Device",
    "Dimensions": [1,80,180,180],
    "Format/Datatype": "Row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "513",
    "Location": "Device",
    "Dimensions": [1,80,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "CONCAT",
  "TacticValue": "0x00000000000003ea"
},{
  "Name": "parent.fuser.0.weight + QuantizeLinear_8 + Conv_10 + Relu_11",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "513",
    "Location": "Device",
    "Dimensions": [1,336,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "526",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 774144},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.0.0.weight + QuantizeLinear_19 + Conv_21 + Relu_22",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "526",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "539",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 128,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 294912},
  "Bias": {"Type": "Float", "Count": 128},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.0.3.weight + QuantizeLinear_30 + Conv_32 + Relu_33",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "539",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "552",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 128,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 147456},
  "Bias": {"Type": "Float", "Count": 128},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.0.6.weight + QuantizeLinear_41 + Conv_43 + Relu_44",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "552",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "565",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 128,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 147456},
  "Bias": {"Type": "Float", "Count": 128},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.0.9.weight + QuantizeLinear_52 + Conv_54 + Relu_55",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "565",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "578",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 128,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 147456},
  "Bias": {"Type": "Float", "Count": 128},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.0.12.weight + QuantizeLinear_63 + Conv_65 + Relu_66",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "578",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "591",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 128,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 147456},
  "Bias": {"Type": "Float", "Count": 128},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.0.15.weight + QuantizeLinear_74 + Conv_76 + Relu_77",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "591",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "601",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 128,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 147456},
  "Bias": {"Type": "Float", "Count": 128},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8f16_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xfa6c413ca4875a2e"
},{
  "Name": "QuantizeLinear_80",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "601",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
  }],
  "Outputs": [
  {
    "Name": "604",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "QDQ",
  "TacticValue": "0x00000000000003ea"
},{
  "Name": "Reformatting CopyNode for Input Tensor 0 to Conv_144 + Relu_145",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "601",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
  }],
  "Outputs": [
  {
    "Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "ParameterType": "Reformat",
  "Origin": "REFORMAT",
  "TacticValue": "0x00000000000003ea"
},{
  "Name": "Conv_144 + Relu_145",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "Outputs": [
  {
    "Name": "Reformatted Output Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "ParameterType": "Convolution",
  "Kernel": [1,1],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [0,0],
  "PostPadding": [0,0],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Float", "Count": 32768},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_simple_t1r1s1",
  "TacticValue": "0x9dece0dc37e90462"
},{
  "Name": "Reformatting CopyNode for Output Tensor 0 to Conv_144 + Relu_145",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "Reformatted Output Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "Outputs": [
  {
    "Name": "middle",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "REFORMAT",
  "TacticValue": "0x00000000000003ea"
},{
  "Name": "parent.decoder.backbone.blocks.1.0.weight + QuantizeLinear_85 + Conv_87 + Relu_88",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "604",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "617",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [2,2],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 294912},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.1.3.weight + QuantizeLinear_96 + Conv_98 + Relu_99",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "617",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "630",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 589824},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.1.6.weight + QuantizeLinear_107 + Conv_109 + Relu_110",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "630",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "643",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 589824},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.1.9.weight + QuantizeLinear_118 + Conv_120 + Relu_121",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "643",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "656",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 589824},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.1.12.weight + QuantizeLinear_129 + Conv_131 + Relu_132",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "656",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "669",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 589824},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.backbone.blocks.1.15.weight + QuantizeLinear_140 + Conv_142 + Relu_143",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "669",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "684",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Convolution",
  "Kernel": [3,3],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [1,1],
  "PostPadding": [1,1],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 589824},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
  "TacticValue": "0xea88b51105501f96"
},{
  "Name": "parent.decoder.neck.deblocks.1.0.weight + QuantizeLinear_153 + ConvTranspose_155",
  "LayerType": "CaskDeconvolution",
  "Inputs": [
  {
    "Name": "684",
    "Location": "Device",
    "Dimensions": [1,256,90,90],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "Outputs": [
  {
    "Name": "693",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP32"
  }],
  "ParameterType": "Convolution",
  "Kernel": [2,2],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [0,0],
  "PostPadding": [0,0],
  "Stride": [2,2],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Int8", "Count": 262144},
  "Bias": {"Type": "Float", "Count": 0},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "NONE",
  "HasBias": 0,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8f32_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize256x64x64_stage4_warpsize4x1x1_g1_tensor16x8x32_simple_t1r1s1",
  "TacticValue": "0x1e55f8b415964e81"
},{
  "Name": "BatchNormalization_156 + Relu_157",
  "LayerType": "Scale",
  "Inputs": [
  {
    "Name": "693",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP32"
  }],
  "Outputs": [
  {
    "Name": "Reformatted Output Tensor 0 to BatchNormalization_156 + Relu_157",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP32"
  }],
  "ParameterType": "Scale",
  "Mode": "CHANNEL",
  "Shift": {"Type": "Float", "Count": 256},
  "Scale": {"Type": "Float", "Count": 256},
  "Power": {"Type": "Float", "Count": 0},
  "Activation": "RELU",
  "ChannelAxis": 1,
  "TacticValue": "0x0000000000000000"
},{
  "Name": "Reformatting CopyNode for Output Tensor 0 to BatchNormalization_156 + Relu_157",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "Reformatted Output Tensor 0 to BatchNormalization_156 + Relu_157",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP32"
  }],
  "Outputs": [
  {
    "Name": "middle",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "REFORMAT",
  "TacticValue": "0x00000000000003e8"
}],
"Bindings": ["camera"
,"lidar"
,"middle"
]}

I notice a difference at this layer between the two JSON files:

  1. For TensorRT 8.4 version:
{
  "Name": "Reformatting CopyNode for Input Tensor 0 to Conv_144 + Relu_145",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "601",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
  }],
  "Outputs": [
  {
    "Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "ParameterType": "Reformat",
  "Origin": "REFORMAT",
  "TacticValue": "0x00000000000003ea"
},{
  "Name": "Conv_144 + Relu_145",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "Outputs": [
  {
    "Name": "Reformatted Output Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "ParameterType": "Convolution",
  "Kernel": [1,1],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [0,0],
  "PostPadding": [0,0],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Float", "Count": 32768},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r1s1",
  "TacticValue": "0x130df49cb195156b"
}
  2. For TensorRT 8.5 version:
{
  "Name": "Reformatting CopyNode for Input Tensor 0 to Conv_144 + Relu_145",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "601",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
  }],
  "Outputs": [
  {
    "Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "ParameterType": "Reformat",
  "Origin": "REFORMAT",
  "TacticValue": "0x00000000000003ea"
},{
  "Name": "Conv_144 + Relu_145",
  "LayerType": "CaskConvolution",
  "Inputs": [
  {
    "Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,128,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "Outputs": [
  {
    "Name": "Reformatted Output Tensor 0 to Conv_144 + Relu_145",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
  }],
  "ParameterType": "Convolution",
  "Kernel": [1,1],
  "PaddingMode": "kEXPLICIT_ROUND_DOWN",
  "PrePadding": [0,0],
  "PostPadding": [0,0],
  "Stride": [1,1],
  "Dilation": [1,1],
  "OutMaps": 256,
  "Groups": 1,
  "Weights": {"Type": "Float", "Count": 32768},
  "Bias": {"Type": "Float", "Count": 256},
  "HasSparseWeights": 0,
  "HasDynamicFilter": 0,
  "HasDynamicBias": 0,
  "HasResidual": 0,
  "ConvXAsActInputIdx": -1,
  "BiasAsActInputIdx": -1,
  "ResAsActInputIdx": -1,
  "Activation": "RELU",
  "HasBias": 1,
  "HasReLU": 1,
  "TacticName": "sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_simple_t1r1s1",
  "TacticValue": "0x9dece0dc37e90462"
}

The difference is in the tactic:

TensorRT 8.4: "sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r1s1"
TensorRT 8.5: "sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_simple_t1r1s1"
TensorRT 8.5 also reports additional parameters such as "HasDynamicFilter", "HasDynamicBias", and "HasResidual".
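As a side note, a small script along these lines can diff the tactics layer by layer between the two dumps (the file names are hypothetical local copies of the JSON shown above):

import json

def load_layers(path):
    with open(path) as f:
        return {layer["Name"]: layer for layer in json.load(f)["Layers"]}

def tactic(layers, name):
    layer = layers.get(name)
    if layer is None:
        return "<layer missing>"
    # Reformat layers only carry a TacticValue, not a TacticName.
    return layer.get("TacticName", layer.get("TacticValue", "<no tactic>"))

trt84 = load_layers("fuser_trt84.json")  # hypothetical copy of the 8.4 dump
trt85 = load_layers("fuser_trt85.json")  # hypothetical copy of the 8.5 dump

for name in sorted(set(trt84) | set(trt85)):
    t84, t85 = tactic(trt84, name), tactic(trt85, name)
    if t84 != t85:
        print(f"{name}\n  8.4: {t84}\n  8.5: {t85}")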

Please help me understand whether the "simple" suffix in the tactic implementation is what makes the inference result correct for version 8.5, and what could be done to make version 8.4 produce the same inference output as version 8.5. Thank you very much.

I figured out that the first-layer conversion does not work on TensorRT 8.4; after the conversion, the tensor is still FP16:

{"Layers": [{
  "Name": "QuantizeLinear_3_clone_1",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "lidar",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "Outputs": [
  {
    "Name": "Concat_0_lidar_clone_1",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "QDQ",
  "TacticValue": "0x0000000000000000"
}

While it works in TensorRT 8.5:

{"Layers": [{
  "Name": "QuantizeLinear_3_clone_1",
  "LayerType": "Reformat",
  "Inputs": [
  {
    "Name": "lidar",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Row major linear FP16 format"
  }],
  "Outputs": [
  {
    "Name": "513",
    "Location": "Device",
    "Dimensions": [1,256,180,180],
    "Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
  }],
  "ParameterType": "Reformat",
  "Origin": "QDQ",
  "TacticValue": "0x00000000000003ea"
}

It seems to be a problem with the QDQ conversion from FP16 to INT8 in TensorRT 8.4 for this model,
so I decided to convert the FP16 model instead, and now the model output is correct on both 8.4 and 8.5.

Hi,

Thanks for the update.
It looks like you can get it to work after switching to fp16 mode.
Is that correct?

Thanks.

Yes, the FP16 model works with TensorRT 8.4. Thanks!
