Nvidia-IOT GitHub page posted several lidar models built using TensorRT 8.5.
Do you have any version of BEVFusion using TensorRT 8.4?
Hi,
The GitHub repository contains the original ONNX model.
You can convert it to TensorRT 8.4 if needed.
Did you encounter any errors when doing so?
Thanks.
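(For reference, a minimal sketch of rebuilding a plan from the original ONNX with the TensorRT 8.4 Python API is below. The file names and workspace size are assumptions, and tool/build_trt_engine.sh in the repository remains the authoritative build path; the INT8 flag is set because the ONNX carries Q/DQ nodes.)

# Hedged sketch: rebuild a TensorRT engine from the original ONNX with the TensorRT Python API.
# File names and the workspace size are assumptions; tool/build_trt_engine.sh is the reference path.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("fuser.onnx", "rb") as f:          # hypothetical path to the original ONNX
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)  # 2 GiB, arbitrary
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.INT8)        # needed when the ONNX contains Q/DQ nodes

plan = builder.build_serialized_network(network, config)
with open("fuser.plan", "wb") as f:
    f.write(plan)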
Hi, I tried to build fuser.plan and head.bbox.plan using tool/build_trt_engine.sh, and the build succeeds on my Jetson Orin with TensorRT 8.4. However, the inference result is not correct: no objects are detected. I ran the same process in a container on the Orin with TensorRT 8.5, and there the inference result is correct, with 22 objects detected. While debugging, I noticed that the inference results start to differ at the output of fuser.plan.
First, I debugged the output of the fuser; the debug dumps from the two runs follow:
Loading point cloud from example-data/points.tensor
Loaded point cloud with 242180 points
Debug Point Cloud Data:
Total points: 242180
Features per point: 5
First 5 points (x, y, z, intensity, ...):
Point 0: -3.094 0.001 -1.835 4.000 0.000
Point 1: -3.260 0.002 -1.831 4.000 0.000
Point 2: -3.436 0.002 -1.827 5.000 0.000
Point 3: -3.633 0.004 -1.823 5.000 0.000
Point 4: -3.852 0.005 -1.823 7.000 0.000
Creating SCN processor...
Preparing point cloud data...
Running SCN forward pass...
Debug SCN Output:
Shape: [1, 128, 180, 180, 2]
Statistics:
- Total elements: 8294400
- Non-zero elements: 514679 (6.21%)
- Value range: [0.000, 9.938]
- Mean value: 0.025
First few non-zero values: 0.092 0.244 0.051 0.204 0.397 0.129 0.153 0.526 0.899 0.218
Sampling different locations:
Channel 0, Start (0,0): 0.000 0.000 0.000 0.000 0.000
Channel 64, Middle (90,90): 0.000 0.000 0.000 0.000 0.000
Creating zero-filled camera features...
Creating transfusion processor...
Running transfusion...
Debug Fusion Output:
Shape: [1, 256, 180, 180]
Global Statistics:
- Total elements: 8294400
- Non-zero elements: 1534493 (18.50%)
- Value range: [0.000, 1.031]
- Mean value: 0.014
- Standard deviation: 0.038
First few non-zero values: 0.093 0.225 0.260 0.230 0.240 0.245 0.250 0.250 0.250 0.250
Sampling different locations:
First channel (0,0):
[0,0]: 0.093 [0,1]: 0.225
[1,0]: 0.155 [1,1]: 0.114
Middle channel 128 at center:
0.000 0.000 0.000
0.000 0.000 0.000
0.000 0.000 0.000
Channel statistics (sampling every 32nd channel):
Channel 0: range=[0.000, 0.354], mean=0.082, nonzero=31569
Channel 32: range=[0.000, 0.224], mean=0.000, nonzero=118
Channel 64: range=[0.000, 0.128], mean=0.000, nonzero=6
Channel 96: range=[0.000, 0.351], mean=0.003, nonzero=3020
Channel 128: range=[0.000, 0.078], mean=0.000, nonzero=12
Channel 160: range=[0.000, 0.206], mean=0.107, nonzero=32352
Channel 192: range=[0.000, 0.151], mean=0.000, nonzero=13
Channel 224: range=[0.000, 0.145], mean=0.000, nonzero=1222
Loading point cloud from example-data/points.tensor
Loaded point cloud with 242180 points
Debug Point Cloud Data:
Total points: 242180
Features per point: 5
First 5 points (x, y, z, intensity, ...):
Point 0: -3.094 0.001 -1.835 4.000 0.000
Point 1: -3.260 0.002 -1.831 4.000 0.000
Point 2: -3.436 0.002 -1.827 5.000 0.000
Point 3: -3.633 0.004 -1.823 5.000 0.000
Point 4: -3.852 0.005 -1.823 7.000 0.000
Creating SCN processor...
Preparing point cloud data...
Running SCN forward pass...
Debug SCN Output:
Shape: [1, 128, 180, 180, 2]
Statistics:
- Total elements: 8294400
- Non-zero elements: 514832 (6.21%)
- Value range: [0.000, 9.703]
- Mean value: 0.025
First few non-zero values: 0.092 0.244 0.051 0.204 0.397 0.129 0.153 0.526 0.899 0.218
Sampling different locations:
Channel 0, Start (0,0): 0.000 0.000 0.000 0.000 0.000
Channel 64, Middle (90,90): 0.000 0.000 0.000 0.000 0.000
Creating zero-filled camera features...
Creating transfusion processor...
Running transfusion...
Debug Fusion Output:
Shape: [1, 256, 180, 180]
Global Statistics:
- Total elements: 8294400
- Non-zero elements: 1514283 (18.26%)
- Value range: [0.000, 2.336]
- Mean value: 0.014
- Standard deviation: 0.042
First few non-zero values: 0.093 0.225 0.260 0.230 0.240 0.245 0.250 0.250 0.250 0.250
Sampling different locations:
First channel (0,0):
[0,0]: 0.093 [0,1]: 0.225
[1,0]: 0.155 [1,1]: 0.114
Middle channel 128 at center:
0.000 0.000 0.000
0.000 0.000 0.000
0.000 0.000 0.000
Channel statistics (sampling every 32nd channel):
Channel 0: range=[0.000, 0.692], mean=0.067, nonzero=25883
Channel 32: range=[0.000, 0.549], mean=0.005, nonzero=2438
Channel 64: range=[0.000, 0.916], mean=0.003, nonzero=1110
Channel 96: range=[0.000, 0.934], mean=0.001, nonzero=983
Channel 128: range=[0.000, 1.004], mean=0.005, nonzero=882
Channel 160: range=[0.000, 0.372], mean=0.081, nonzero=29439
Channel 192: range=[0.000, 0.412], mean=0.003, nonzero=1639
Channel 224: range=[0.000, 0.426], mean=0.004, nonzero=2211
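(As a side note, the statistics printed above can be recomputed offline from a dumped fuser output tensor. A rough numpy sketch follows, assuming the [1, 256, 180, 180] output was saved to disk as raw FP16; the file name and dump format are assumptions, and the demo's own debug prints remain the source of truth.)

# Hedged sketch: recompute the debug statistics above from a dumped fuser output tensor.
# The file name and dump format are assumptions.
import numpy as np

x = np.fromfile("fusion_output.bin", dtype=np.float16).astype(np.float32)
x = x.reshape(1, 256, 180, 180)

total = x.size
nonzero = int(np.count_nonzero(x))
print(f"Total elements: {total}")
print(f"Non-zero elements: {nonzero} ({100.0 * nonzero / total:.2f}%)")
print(f"Value range: [{x.min():.3f}, {x.max():.3f}]")
print(f"Mean value: {x.mean():.3f}")
print(f"Standard deviation: {x.std():.3f}")

# Per-channel statistics, sampling every 32nd channel as in the log above
for c in range(0, 256, 32):
    ch = x[0, c]
    print(f"Channel {c}: range=[{ch.min():.3f}, {ch.max():.3f}], "
          f"mean={ch.mean():.3f}, nonzero={int(np.count_nonzero(ch))}")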
I analysed a bit further by inspecting the fuser.json generated when building fuser.plan on the two versions; the TensorRT 8.4 dump comes first, followed by the TensorRT 8.5 dump.
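(For anyone reproducing these dumps: per-layer JSON like the files below can typically be exported from a built plan with the engine inspector, available since TensorRT 8.2. A minimal sketch, with the plan path as an assumption; tactic names only appear if the engine was built with detailed profiling verbosity.)

# Hedged sketch: dump per-layer information from a built plan using IEngineInspector.
# Tactic names are only present if the engine was built with ProfilingVerbosity.DETAILED.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)
with open("fuser.plan", "rb") as f:          # hypothetical path to the built plan
    engine = runtime.deserialize_cuda_engine(f.read())

inspector = engine.create_engine_inspector()
with open("fuser.json", "w") as f:
    f.write(inspector.get_engine_information(trt.LayerInformationFormat.JSON))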
{"Layers": [{
"Name": "QuantizeLinear_3_clone_1",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "lidar",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"Outputs": [
{
"Name": "Concat_0_lidar_clone_1",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"ParameterType": "Reformat",
"Origin": "QDQ",
"TacticValue": "0x0000000000000000"
},{
"Name": "QuantizeLinear_3_clone_0",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "camera",
"Location": "Device",
"Dimensions": [1,80,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"Outputs": [
{
"Name": "Concat_0_camera_clone_0",
"Location": "Device",
"Dimensions": [1,80,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"ParameterType": "Reformat",
"Origin": "QDQ",
"TacticValue": "0x0000000000000000"
},{
"Name": "camera copy",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "Concat_0_camera_clone_0",
"Location": "Device",
"Dimensions": [1,80,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"Outputs": [
{
"Name": "513",
"Location": "Device",
"Dimensions": [1,80,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Reformat",
"Origin": "CONCAT",
"TacticValue": "0x00000000000003ea"
},{
"Name": "lidar copy",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "Concat_0_lidar_clone_1",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"Outputs": [
{
"Name": "513",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Reformat",
"Origin": "CONCAT",
"TacticValue": "0x00000000000003ea"
},{
"Name": "parent.fuser.0.weight + QuantizeLinear_8 + Conv_10 + Relu_11",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "513",
"Location": "Device",
"Dimensions": [1,336,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "526",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 774144},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.0.0.weight + QuantizeLinear_19 + Conv_21 + Relu_22",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "526",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "539",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 128,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 294912},
"Bias": {"Type": "Float", "Count": 128},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.0.3.weight + QuantizeLinear_30 + Conv_32 + Relu_33",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "539",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "552",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 128,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 147456},
"Bias": {"Type": "Float", "Count": 128},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.0.6.weight + QuantizeLinear_41 + Conv_43 + Relu_44",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "552",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "565",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 128,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 147456},
"Bias": {"Type": "Float", "Count": 128},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.0.9.weight + QuantizeLinear_52 + Conv_54 + Relu_55",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "565",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "578",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 128,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 147456},
"Bias": {"Type": "Float", "Count": 128},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.0.12.weight + QuantizeLinear_63 + Conv_65 + Relu_66",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "578",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "591",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 128,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 147456},
"Bias": {"Type": "Float", "Count": 128},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.0.15.weight + QuantizeLinear_74 + Conv_76 + Relu_77",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "591",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "601",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 128,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 147456},
"Bias": {"Type": "Float", "Count": 128},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8f16_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xfa6c413ca4875a2e"
},{
"Name": "QuantizeLinear_80",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "601",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
}],
"Outputs": [
{
"Name": "604",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Reformat",
"Origin": "QDQ",
"TacticValue": "0x00000000000003ea"
},{
"Name": "Reformatting CopyNode for Input Tensor 0 to Conv_144 + Relu_145",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "601",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
}],
"Outputs": [
{
"Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"ParameterType": "Reformat",
"Origin": "REFORMAT",
"TacticValue": "0x00000000000003ea"
},{
"Name": "Conv_144 + Relu_145",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"Outputs": [
{
"Name": "Reformatted Output Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"ParameterType": "Convolution",
"Kernel": [1,1],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [0,0],
"PostPadding": [0,0],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Float", "Count": 32768},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r1s1",
"TacticValue": "0x130df49cb195156b"
},{
"Name": "Reformatting CopyNode for Output Tensor 0 to Conv_144 + Relu_145",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "Reformatted Output Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"Outputs": [
{
"Name": "middle",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"ParameterType": "Reformat",
"Origin": "REFORMAT",
"TacticValue": "0x00000000000003ea"
},{
"Name": "parent.decoder.backbone.blocks.1.0.weight + QuantizeLinear_85 + Conv_87 + Relu_88",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "604",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "617",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [2,2],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 294912},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.1.3.weight + QuantizeLinear_96 + Conv_98 + Relu_99",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "617",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "630",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 589824},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32",
"TacticValue": "0x8d50646eff0cde6d"
},{
"Name": "parent.decoder.backbone.blocks.1.6.weight + QuantizeLinear_107 + Conv_109 + Relu_110",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "630",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "643",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 589824},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32",
"TacticValue": "0x8d50646eff0cde6d"
},{
"Name": "parent.decoder.backbone.blocks.1.9.weight + QuantizeLinear_118 + Conv_120 + Relu_121",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "643",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "656",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 589824},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32",
"TacticValue": "0x8d50646eff0cde6d"
},{
"Name": "parent.decoder.backbone.blocks.1.12.weight + QuantizeLinear_129 + Conv_131 + Relu_132",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "656",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "669",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 589824},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32",
"TacticValue": "0x8d50646eff0cde6d"
},{
"Name": "parent.decoder.backbone.blocks.1.15.weight + QuantizeLinear_140 + Conv_142 + Relu_143",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "669",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "684",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 589824},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32",
"TacticValue": "0x8d50646eff0cde6d"
},{
"Name": "parent.decoder.neck.deblocks.1.0.weight + QuantizeLinear_153 + ConvTranspose_155",
"LayerType": "CaskDeconvolution",
"Inputs": [
{
"Name": "684",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "693",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP32"
}],
"ParameterType": "Convolution",
"Kernel": [2,2],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [0,0],
"PostPadding": [0,0],
"Stride": [2,2],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 262144},
"Bias": {"Type": "Float", "Count": 0},
"HasSparseWeights": 0,
"Activation": "NONE",
"HasBias": 0,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8f32_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_simple_t1r1s1",
"TacticValue": "0xbeb5d91e1874a437"
},{
"Name": "BatchNormalization_156 + Relu_157",
"LayerType": "Scale",
"Inputs": [
{
"Name": "693",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP32"
}],
"Outputs": [
{
"Name": "Reformatted Output Tensor 0 to BatchNormalization_156 + Relu_157",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP32"
}],
"ParameterType": "Scale",
"Mode": "CHANNEL",
"Shift": {"Type": "Float", "Count": 256},
"Scale": {"Type": "Float", "Count": 256},
"Power": {"Type": "Float", "Count": 0},
"Activation": "RELU",
"ChannelAxis": 1,
"TacticValue": "0x0000000000000000"
},{
"Name": "Reformatting CopyNode for Output Tensor 0 to BatchNormalization_156 + Relu_157",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "Reformatted Output Tensor 0 to BatchNormalization_156 + Relu_157",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP32"
}],
"Outputs": [
{
"Name": "middle",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"ParameterType": "Reformat",
"Origin": "REFORMAT",
"TacticValue": "0x00000000000003e8"
}],
"Bindings": ["camera"
,"lidar"
,"middle"
]}
{"Layers": [{
"Name": "QuantizeLinear_3_clone_1",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "lidar",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"Outputs": [
{
"Name": "513",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Reformat",
"Origin": "QDQ",
"TacticValue": "0x00000000000003ea"
},{
"Name": "QuantizeLinear_3_clone_0",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "camera",
"Location": "Device",
"Dimensions": [1,80,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"Outputs": [
{
"Name": "Concat_0_camera_clone_0",
"Location": "Device",
"Dimensions": [1,80,180,180],
"Format/Datatype": "Row major Int8 format"
}],
"ParameterType": "Reformat",
"Origin": "QDQ",
"TacticValue": "0x0000000000000000"
},{
"Name": "Concat_0_camera_clone_0 copy",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "Concat_0_camera_clone_0",
"Location": "Device",
"Dimensions": [1,80,180,180],
"Format/Datatype": "Row major Int8 format"
}],
"Outputs": [
{
"Name": "513",
"Location": "Device",
"Dimensions": [1,80,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Reformat",
"Origin": "CONCAT",
"TacticValue": "0x00000000000003ea"
},{
"Name": "parent.fuser.0.weight + QuantizeLinear_8 + Conv_10 + Relu_11",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "513",
"Location": "Device",
"Dimensions": [1,336,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "526",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 774144},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.0.0.weight + QuantizeLinear_19 + Conv_21 + Relu_22",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "526",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "539",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 128,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 294912},
"Bias": {"Type": "Float", "Count": 128},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.0.3.weight + QuantizeLinear_30 + Conv_32 + Relu_33",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "539",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "552",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 128,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 147456},
"Bias": {"Type": "Float", "Count": 128},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.0.6.weight + QuantizeLinear_41 + Conv_43 + Relu_44",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "552",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "565",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 128,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 147456},
"Bias": {"Type": "Float", "Count": 128},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.0.9.weight + QuantizeLinear_52 + Conv_54 + Relu_55",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "565",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "578",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 128,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 147456},
"Bias": {"Type": "Float", "Count": 128},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.0.12.weight + QuantizeLinear_63 + Conv_65 + Relu_66",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "578",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "591",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 128,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 147456},
"Bias": {"Type": "Float", "Count": 128},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.0.15.weight + QuantizeLinear_74 + Conv_76 + Relu_77",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "591",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "601",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 128,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 147456},
"Bias": {"Type": "Float", "Count": 128},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8f16_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xfa6c413ca4875a2e"
},{
"Name": "QuantizeLinear_80",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "601",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
}],
"Outputs": [
{
"Name": "604",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Reformat",
"Origin": "QDQ",
"TacticValue": "0x00000000000003ea"
},{
"Name": "Reformatting CopyNode for Input Tensor 0 to Conv_144 + Relu_145",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "601",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
}],
"Outputs": [
{
"Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"ParameterType": "Reformat",
"Origin": "REFORMAT",
"TacticValue": "0x00000000000003ea"
},{
"Name": "Conv_144 + Relu_145",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"Outputs": [
{
"Name": "Reformatted Output Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"ParameterType": "Convolution",
"Kernel": [1,1],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [0,0],
"PostPadding": [0,0],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Float", "Count": 32768},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_simple_t1r1s1",
"TacticValue": "0x9dece0dc37e90462"
},{
"Name": "Reformatting CopyNode for Output Tensor 0 to Conv_144 + Relu_145",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "Reformatted Output Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"Outputs": [
{
"Name": "middle",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"ParameterType": "Reformat",
"Origin": "REFORMAT",
"TacticValue": "0x00000000000003ea"
},{
"Name": "parent.decoder.backbone.blocks.1.0.weight + QuantizeLinear_85 + Conv_87 + Relu_88",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "604",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "617",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [2,2],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 294912},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.1.3.weight + QuantizeLinear_96 + Conv_98 + Relu_99",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "617",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "630",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 589824},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.1.6.weight + QuantizeLinear_107 + Conv_109 + Relu_110",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "630",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "643",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 589824},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.1.9.weight + QuantizeLinear_118 + Conv_120 + Relu_121",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "643",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "656",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 589824},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.1.12.weight + QuantizeLinear_129 + Conv_131 + Relu_132",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "656",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "669",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 589824},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.backbone.blocks.1.15.weight + QuantizeLinear_140 + Conv_142 + Relu_143",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "669",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "684",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 589824},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage4_warpsize2x2x1_g1_tensor16x8x32_t1r3s3",
"TacticValue": "0xea88b51105501f96"
},{
"Name": "parent.decoder.neck.deblocks.1.0.weight + QuantizeLinear_153 + ConvTranspose_155",
"LayerType": "CaskDeconvolution",
"Inputs": [
{
"Name": "684",
"Location": "Device",
"Dimensions": [1,256,90,90],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"Outputs": [
{
"Name": "693",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP32"
}],
"ParameterType": "Convolution",
"Kernel": [2,2],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [0,0],
"PostPadding": [0,0],
"Stride": [2,2],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Int8", "Count": 262144},
"Bias": {"Type": "Float", "Count": 0},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "NONE",
"HasBias": 0,
"TacticName": "sm80_xmma_fprop_implicit_gemm_interleaved_i8f32_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize256x64x64_stage4_warpsize4x1x1_g1_tensor16x8x32_simple_t1r1s1",
"TacticValue": "0x1e55f8b415964e81"
},{
"Name": "BatchNormalization_156 + Relu_157",
"LayerType": "Scale",
"Inputs": [
{
"Name": "693",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP32"
}],
"Outputs": [
{
"Name": "Reformatted Output Tensor 0 to BatchNormalization_156 + Relu_157",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP32"
}],
"ParameterType": "Scale",
"Mode": "CHANNEL",
"Shift": {"Type": "Float", "Count": 256},
"Scale": {"Type": "Float", "Count": 256},
"Power": {"Type": "Float", "Count": 0},
"Activation": "RELU",
"ChannelAxis": 1,
"TacticValue": "0x0000000000000000"
},{
"Name": "Reformatting CopyNode for Output Tensor 0 to BatchNormalization_156 + Relu_157",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "Reformatted Output Tensor 0 to BatchNormalization_156 + Relu_157",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP32"
}],
"Outputs": [
{
"Name": "middle",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"ParameterType": "Reformat",
"Origin": "REFORMAT",
"TacticValue": "0x00000000000003e8"
}],
"Bindings": ["camera"
,"lidar"
,"middle"
]}
I notice a difference at this layer between the two JSON files (TensorRT 8.4 first, then TensorRT 8.5):
{
"Name": "Reformatting CopyNode for Input Tensor 0 to Conv_144 + Relu_145",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "601",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
}],
"Outputs": [
{
"Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"ParameterType": "Reformat",
"Origin": "REFORMAT",
"TacticValue": "0x00000000000003ea"
},{
"Name": "Conv_144 + Relu_145",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"Outputs": [
{
"Name": "Reformatted Output Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"ParameterType": "Convolution",
"Kernel": [1,1],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [0,0],
"PostPadding": [0,0],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Float", "Count": 32768},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r1s1",
"TacticValue": "0x130df49cb195156b"
}
{
"Name": "Reformatting CopyNode for Input Tensor 0 to Conv_144 + Relu_145",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "601",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major FP16 format"
}],
"Outputs": [
{
"Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"ParameterType": "Reformat",
"Origin": "REFORMAT",
"TacticValue": "0x00000000000003ea"
},{
"Name": "Conv_144 + Relu_145",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "Reformatted Input Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,128,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"Outputs": [
{
"Name": "Reformatted Output Tensor 0 to Conv_144 + Relu_145",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Channel major FP32 format where channel % 4 == 0"
}],
"ParameterType": "Convolution",
"Kernel": [1,1],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [0,0],
"PostPadding": [0,0],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 256,
"Groups": 1,
"Weights": {"Type": "Float", "Count": 32768},
"Bias": {"Type": "Float", "Count": 256},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_simple_t1r1s1",
"TacticValue": "0x9dece0dc37e90462"
}
The difference is in the tactic:
TensorRT 8.4: "sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r1s1"
TensorRT 8.5: "sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_simple_t1r1s1"
TensorRT 8.5 also reports additional parameters such as "HasDynamicFilter", "HasDynamicBias", and "HasResidual".
Please help me understand whether this "simple" tactic variant is what makes the inference result correct on version 8.5, and what could be done to make version 8.4 produce the same inference output as version 8.5. Thank you very much.
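(For comparing dumps like these more systematically, a short script can flag per-layer differences in tactics and output formats automatically; the file names below are hypothetical.)

# Hedged sketch: compare two engine-inspector JSON dumps layer by layer and report
# differences in tactics and output formats. File names are hypothetical.
import json

def load_layers(path):
    with open(path) as f:
        return {layer["Name"]: layer for layer in json.load(f)["Layers"]}

a = load_layers("fuser_trt84.json")
b = load_layers("fuser_trt85.json")

for name in sorted(set(a) | set(b)):
    la, lb = a.get(name), b.get(name)
    if la is None or lb is None:
        print(f"only in one engine: {name}")
        continue
    for key in ("TacticName", "TacticValue"):
        if la.get(key) != lb.get(key):
            print(f"{name}: {key} differs\n  8.4: {la.get(key)}\n  8.5: {lb.get(key)}")
    fa = [o.get("Format/Datatype") for o in la.get("Outputs", [])]
    fb = [o.get("Format/Datatype") for o in lb.get("Outputs", [])]
    if fa != fb:
        print(f"{name}: output format differs\n  8.4: {fa}\n  8.5: {fb}")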
I figured out that the first layer's conversion does not work on TensorRT 8.4: after the QuantizeLinear reformat, the output is still FP16:
{"Layers": [{
"Name": "QuantizeLinear_3_clone_1",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "lidar",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"Outputs": [
{
"Name": "Concat_0_lidar_clone_1",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"ParameterType": "Reformat",
"Origin": "QDQ",
"TacticValue": "0x0000000000000000"
}
While it works in TensorRT 8.5:
{"Layers": [{
"Name": "QuantizeLinear_3_clone_1",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "lidar",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Row major linear FP16 format"
}],
"Outputs": [
{
"Name": "513",
"Location": "Device",
"Dimensions": [1,256,180,180],
"Format/Datatype": "Thirty-two wide channel vectorized row major Int8 format"
}],
"ParameterType": "Reformat",
"Origin": "QDQ",
"TacticValue": "0x00000000000003ea"
}
It seems to be a problem with the QDQ conversion from FP16 to INT8 in TensorRT 8.4 for this model,
so I decided to convert the FP16 model instead, and now the model output is correct on both 8.4 and 8.5.
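(For completeness, a rough sketch of the FP16-only build that worked around the issue; the ONNX file name is an assumption, and the repository's FP16 model plus tool/build_trt_engine.sh remain the reference.)

# Hedged sketch: build the FP16 variant, which avoids the problematic FP16->INT8 QDQ
# reformat on TensorRT 8.4. The ONNX file name is an assumption.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("fuser.fp16.onnx", "rb") as f:     # hypothetical name of the FP16 ONNX variant
    assert parser.parse(f.read()), "ONNX parse failed"

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)        # FP16 only; no INT8 flag, no Q/DQ handling
with open("fuser.plan", "wb") as f:
    f.write(builder.build_serialized_network(network, config))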
Hi,
Thanks for the update.
It looks like you can get it to work after switching to fp16 mode.
Is that correct?
Thanks.
Yes, the FP16 model works with TensorRT 8.4. Thanks.