## Description
I used the pytorch_quantization toolkit to convert the Conv2d layers in a fully convolutional network (source model [here](https://github.com/clovaai/CRAFT-pytorch)) to int8 and was able to export the model to ONNX successfully. When I attempt to convert the ONNX model to a TRT engine with all quantized Conv2d layers enabled, the build fails at layer basenet.slice1.10 with the following error:
```
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.10.weight + QuantizeLinear_44 + Conv_46 (CaskConvolution)
[12/08/2022-21:54:08] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [E] 10: [optimizer.cpp::computeCosts::3626] Error Code 10: Internal Error (Could not find any implementation for node basenet.slice1.10.weight + QuantizeLinear_44 + Conv_46.)
[12/08/2022-21:54:08] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
```
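For context, the engine build was invoked roughly as follows (an illustrative sketch, not the exact command from my setup; file names and the workspace size are placeholders):

```shell
# Illustrative trtexec invocation; --verbose produces the tactic-timing logs
# quoted in this report. Paths and sizes are examples only.
trtexec --onnx=craft_int8.onnx \
        --int8 --fp16 \
        --workspace=8192 \
        --verbose \
        --saveEngine=craft_int8.engine
```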
A longer snippet of the logs shows that earlier quantized Conv2d layers (basenet.slice1.7 shown here) were converted successfully before the failure:
```
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (CaskConvolution)
[12/08/2022-21:54:08] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [V] *************** Autotuning format combination: Int8(2359296,147456:4,288,1) -> Int8(4718592,147456:4,288,1) ***************
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (CudaDepthwiseConvolution)
[12/08/2022-21:54:08] [TRT] [V] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (FusedConvActConvolution)
[12/08/2022-21:54:08] [TRT] [V] FusedConvActConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (CaskConvolution)
[12/08/2022-21:54:08] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [V] *************** Autotuning format combination: Int8(2359296,147456:4,288,1) -> Int8(589824,147456:32,288,1) ***************
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (CaskConvolution)
[12/08/2022-21:54:08] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [V] *************** Autotuning format combination: Int8(294912,147456:32,288,1) -> Int8(589824,147456:32,288,1) ***************
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (CudaGroupConvolution)
[12/08/2022-21:54:08] [TRT] [V] CudaGroupConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (CudaDepthwiseConvolution)
[12/08/2022-21:54:08] [TRT] [V] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (FusedConvActConvolution)
[12/08/2022-21:54:08] [TRT] [V] FusedConvActConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (CaskConvolution)
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize256x128x64_stage1_warpsize4x2x1_g1_tensor8x8x16_t1r3s3 Tactic: 0x0405e3a763219823
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x0405e3a763219823 Time: 0.311957
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x128x64_stage1_warpsize2x2x1_g1_tensor8x8x16_t1r3s3 Tactic: 0x09727a53770225e8
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x09727a53770225e8 Time: 0.335883
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x32x64_stage1_warpsize2x1x1_g1_tensor8x8x16_t1r3s3 Tactic: 0x13463e9bf9ae0d73
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x13463e9bf9ae0d73 Time: 0.557707
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage1_warpsize2x2x1_g1_tensor8x8x16 Tactic: 0x1d9b1bf0b28cc357
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x1d9b1bf0b28cc357 Time: 0.279893
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize32x64x64_stage1_warpsize2x2x1_g1_tensor8x8x16 Tactic: 0x23cd610b930e6789
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x23cd610b930e6789 Time: 0.654432
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x128x64_stage1_warpsize2x2x1_g1_tensor8x8x16_t1r3s3 Tactic: 0x3a7df5a005634aca
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x3a7df5a005634aca Time: 0.276848
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize32x64x64_stage1_warpsize2x2x1_g1_tensor8x8x16_t1r3s3 Tactic: 0x3cda2ee55a7d0cc2
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x3cda2ee55a7d0cc2 Time: 0.655232
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x256x64_stage1_warpsize2x4x1_g1_tensor8x8x16_t1r3s3 Tactic: 0x446f06d5a2e0bae3
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x446f06d5a2e0bae3 Time: 0.561131
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize32x32x64_stage1_warpsize2x1x1_g1_tensor8x8x16_t1r3s3 Tactic: 0x4e4c4bf050b40a1b
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x4e4c4bf050b40a1b Time: 0.836875
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize32x32x64_stage1_warpsize2x1x1_g1_tensor8x8x16 Tactic: 0x58be15b6f024df52
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x58be15b6f024df52 Time: 0.847189
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x64x64_stage1_warpsize2x2x1_g1_tensor8x8x16_t1r3s3 Tactic: 0x61d05b8ef3670baa
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x61d05b8ef3670baa Time: 0.450656
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x64x64_stage1_warpsize2x2x1_g1_tensor8x8x16 Tactic: 0x81994a658cdf908d
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x81994a658cdf908d Time: 0.452651
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize256x128x64_stage1_warpsize4x2x1_g1_tensor8x8x16 Tactic: 0x85047b8e34ed27fa
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x85047b8e34ed27fa Time: 0.343371
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize256x64x64_stage1_warpsize4x1x1_g1_tensor8x8x16_t1r3s3 Tactic: 0x8a60cb2150513f2e
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0x8a60cb2150513f2e Time: 0.340192
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x64x64_stage1_warpsize2x2x1_g1_tensor8x8x16_t1r3s3 Tactic: 0xa792e2a2dcc5e78f
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0xa792e2a2dcc5e78f Time: 0.367264
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize256x64x64_stage1_warpsize4x1x1_g1_tensor8x8x16 Tactic: 0xb81aeaba4cbc0d97
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0xb81aeaba4cbc0d97 Time: 0.338603
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x128x64_stage1_warpsize2x2x1_g1_tensor8x8x16 Tactic: 0xdd517393a24bd0f4
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0xdd517393a24bd0f4 Time: 0.381312
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x64x64_stage1_warpsize2x2x1_g1_tensor8x8x16 Tactic: 0xdfb027065697c23b
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0xdfb027065697c23b Time: 0.392992
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x32x64_stage1_warpsize2x1x1_g1_tensor8x8x16 Tactic: 0xfaea3ed8eff52856
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0xfaea3ed8eff52856 Time: 0.636875
[12/08/2022-21:54:08] [TRT] [V] basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 Set Tactic Name: sm75_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize128x256x64_stage1_warpsize2x4x1_g1_tensor8x8x16 Tactic: 0xfb1f0c938b867bc9
[12/08/2022-21:54:08] [TRT] [V] Tactic: 0xfb1f0c938b867bc9 Time: 0.666251
[12/08/2022-21:54:08] [TRT] [V] Fastest Tactic: 0x3a7df5a005634aca Time: 0.276848
[12/08/2022-21:54:08] [TRT] [V] >>>>>>>>>>>>>>> Chose Runner Type: CaskConvolution Tactic: 0x3a7df5a005634aca
[12/08/2022-21:54:08] [TRT] [V] =============== Computing costs for
[12/08/2022-21:54:08] [TRT] [V] *************** Autotuning format combination: Int8(4718592,147456:4,288,1) -> Float(18874368,147456,288,1) ***************
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.10.weight + QuantizeLinear_44 + Conv_46 (CudaDepthwiseConvolution)
[12/08/2022-21:54:08] [TRT] [V] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.10.weight + QuantizeLinear_44 + Conv_46 (CaskConvolution)
[12/08/2022-21:54:08] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [V] *************** Autotuning format combination: Int8(589824,147456:32,288,1) -> Float(18874368,147456,288,1) ***************
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.10.weight + QuantizeLinear_44 + Conv_46 (CaskConvolution)
[12/08/2022-21:54:08] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [V] *************** Autotuning format combination: Int8(589824,147456:32,288,1) -> Float(589824,147456:32,288,1) ***************
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.10.weight + QuantizeLinear_44 + Conv_46 (CaskConvolution)
[12/08/2022-21:54:08] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [V] *************** Autotuning format combination: Int8(589824,147456:32,288,1) -> Half(589824,147456:32,288,1) ***************
[12/08/2022-21:54:08] [TRT] [V] --------------- Timing Runner: basenet.slice1.10.weight + QuantizeLinear_44 + Conv_46 (CaskConvolution)
[12/08/2022-21:54:08] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[12/08/2022-21:54:08] [TRT] [E] 10: [optimizer.cpp::computeCosts::3626] Error Code 10: Internal Error (Could not find any implementation for node basenet.slice1.10.weight + QuantizeLinear_44 + Conv_46.)
[12/08/2022-21:54:08] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
```
When I disable the quantized Conv2d layer "basenet.slice1.10", however, the conversion still fails, this time because no implementation can be found for the node "basenet.slice1.7". It also appears to explore fewer tactics:
```
[12/08/2022-21:51:58] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (CudaDepthwiseConvolution)
[12/08/2022-21:51:58] [TRT] [V] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[12/08/2022-21:51:58] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (CaskConvolution)
[12/08/2022-21:51:58] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[12/08/2022-21:51:58] [TRT] [V] *************** Autotuning format combination: Int8(294912,147456:32,288,1) -> Float(18874368,147456,288,1) ***************
[12/08/2022-21:51:58] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (CaskConvolution)
[12/08/2022-21:51:58] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[12/08/2022-21:51:58] [TRT] [V] *************** Autotuning format combination: Int8(294912,147456:32,288,1) -> Float(589824,147456:32,288,1) ***************
[12/08/2022-21:51:58] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (CaskConvolution)
[12/08/2022-21:51:58] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[12/08/2022-21:51:58] [TRT] [V] *************** Autotuning format combination: Int8(294912,147456:32,288,1) -> Half(589824,147456:32,288,1) ***************
[12/08/2022-21:51:58] [TRT] [V] --------------- Timing Runner: basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34 (CaskConvolution)
[12/08/2022-21:51:58] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[12/08/2022-21:51:59] [TRT] [E] 10: [optimizer.cpp::computeCosts::3626] Error Code 10: Internal Error (Could not find any implementation for node basenet.slice1.7.weight + QuantizeLinear_32 + Conv_34.)
[12/08/2022-21:51:59] [TRT] [E] 2: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
```
What I've tried:
- Validated that the model converts successfully from PyTorch -> ONNX -> TRT without quantized layers
- Replicated the conversion on an Ampere A10 GPU, increased the workspace size, and reduced the height and width dimensions of the inputs, per the suggestions in https://github.com/NVIDIA/TensorRT/issues/1768
- Converted inside the Docker container nvcr.io/nvidia/tensorrt:22.07-py3, per the suggestion in https://github.com/NVIDIA/TensorRT/issues/2240
- Built the engine without the "OBEY_PRECISION_CONSTRAINTS" or "PREFER_PRECISION_CONSTRAINTS" builder flags

I see the same errors despite these modifications. I can also successfully perform inference with my quantized model, though there is some performance difference compared to the quantized PyTorch model; I'm not sure whether that could be related. Is there something obvious I'm missing in the logs, or a known way to remedy this issue?
## Environment
**TensorRT Version**: 8.4.2.1
**NVIDIA GPU**: T4
**NVIDIA Driver Version**: 510.47.03
**CUDA Version**: 11.5
**CUDNN Version**: 8.4
**Operating System**: Ubuntu 20.04
**Python Version (if applicable)**: 3.9
**Tensorflow Version (if applicable)**:
**PyTorch Version (if applicable)**: 1.11.0
**Baremetal or Container (if so, version)**:
## Relevant Files
A zip file containing the quantized ONNX model I'm attempting to convert can be downloaded here: https://drive.google.com/file/d/1PfO7JWONrX4JHMxCCjxmr6-f4tNbUW5g/view