Concat in Caffe parser is wrong when working with int8 calibration


I am converting my Caffe model to TensorRT. When using INT8 calibration, the output of the Concat layer is wrong. For example:

layer {
  name: "concat_all_values"
  type: "Concat"
  bottom: "values1"
  bottom: "values2"
  bottom: "values3"
  bottom: "values4"
  bottom: "values5"
  top: "all_values"
  concat_param {
    axis: 1
  }
}

layer {
  name: "DoSomething"
  type: "IPlugin"
  bottom: "all_values"
  bottom: "values1"
  top: "output"
}

where values1 through values5 are batch_size x 1 x 1 tensors.

When I read the inputs in the DoSomething layer, the input “all_values” is wrong, while “values1” has the correct value.


TensorRT Version: 7.1.3-1
GPU Type: 2080ti
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 8.0.1
Operating System + Version: ubuntu16.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Hi @hl2997,
Caffe to TRT conversion is deprecated from TensorRT 7 onwards.
We recommend trying Caffe -> ONNX -> TRT instead.

If the issue persists, please share your model so that we can try to reproduce it.

I’m using TensorRT and I’ve observed the same issue with darknet (YOLOv4) -> ONNX -> TensorRT. In the SPP module, 4 tensors from previous layers are concat’ed together. The incorrect computation of INT8 “concat” results in very bad detection outputs.

If I use the same code to convert a YOLOv3 model to TensorRT INT8, the result is good. mAP of the INT8 engine is less than 1% different from the FP16 and FP32 engines. YOLOv3 does not have the SPP (concat) module.

Input tensors to the “concat” usually have different dynamic ranges, so they cannot be concatenated directly as INT8 values. I guess the current TensorRT release does not handle that correctly.
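To illustrate the concern, here is a minimal pure-Python sketch of symmetric per-tensor INT8 quantization. It is not TensorRT code: the helper names, the tensor values, and the dynamic ranges are made up for illustration. It shows that two tensors with different dynamic ranges can end up with identical INT8 codes, so joining the raw INT8 buffers and decoding them with a single scale gives wrong values unless both inputs are first quantized with a shared scale.

```python
def quantize(values, dynamic_range):
    """Symmetric INT8 quantization: map [-dynamic_range, dynamic_range] onto [-127, 127]."""
    scale = dynamic_range / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate real values from INT8 codes."""
    return [q * scale for q in codes]


# Two hypothetical concat inputs with very different dynamic ranges.
a = [1.0, -0.25]    # dynamic range assumed to be 1.0
b = [40.0, -10.0]   # dynamic range assumed to be 40.0

qa, scale_a = quantize(a, 1.0)
qb, scale_b = quantize(b, 40.0)
# Both tensors map to the same INT8 codes although the real values differ 40x.

# Naive concat: join the INT8 buffers and decode everything with b's scale.
naive = dequantize(qa + qb, scale_b)

# Shared-scale concat: quantize both inputs with one common dynamic range first.
shared_range = 40.0
qa_shared, shared_scale = quantize(a, shared_range)
qb_shared, _ = quantize(b, shared_range)
correct = dequantize(qa_shared + qb_shared, shared_scale)

print("naive  :", naive)
print("correct:", correct)
```

With the shared scale, every element is recovered to within half a quantization step, at the cost of coarser resolution for the small-range input.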

@AakankshaS, could you help to check again? Thanks.

@AakankshaS Sorry, I further checked the INT8 calibration cache file of the YOLOv4 model. I saw that all inputs and output of a particular “Concat” layer have the same calibration value (or dynamic range). I think that’s the correct behavior.
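The check described above can be sketched roughly as follows. This assumes the commonly observed text layout of a TensorRT calibration cache: a header line followed by `tensor_name: hex` lines, where the hex string is read as a big-endian IEEE-754 float32 scale. That layout is an assumption here, not a documented contract, and the sample cache contents and tensor names below are made up for illustration.

```python
import struct


def parse_calibration_cache(text):
    """Return {tensor_name: scale} from calibration-cache text (assumed layout)."""
    scales = {}
    for line in text.splitlines()[1:]:  # skip the header line
        name, sep, hex_value = line.rpartition(":")
        hex_value = hex_value.strip()
        if not sep or not hex_value:
            continue
        # Assumption: the value is the hex encoding of a big-endian float32.
        scales[name.strip()] = struct.unpack("!f", bytes.fromhex(hex_value))[0]
    return scales


def concat_scales_match(scales, inputs, output):
    """True if every concat input shares the output tensor's scale."""
    return all(scales[name] == scales[output] for name in inputs)


# Hypothetical cache contents; a real file comes from the INT8 calibrator.
sample_cache = """TRT-7103-EntropyCalibration2
values1: 3c010a14
values2: 3c010a14
all_values: 3c010a14
output: 3d1c8f5c
"""

scales = parse_calibration_cache(sample_cache)
print(concat_scales_match(scales, ["values1", "values2"], "all_values"))
```

If the inputs and the output of a Concat layer report the same scale, as in this made-up sample, the concatenation itself does not need any requantization.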

So the problem I saw was not caused by “Concat” layers in an INT8 TensorRT engine. It should be something else…

@jkjung13 Have you solved the problem yet? I still think there’s something wrong with concat. Below is the snippet of the prototxt I use

layer {
  name: "input"
  type: "Concat"
  bottom: "input1"
  bottom: "input2"
  bottom: "input3"
  bottom: "input4"
  bottom: "input5"
  top: "input"
  concat_param {
    axis: 1
  }
}

layer {
  name: "SomeCustomOperation"
  type: "IPlugin"
  bottom: "input"
  top: "output"
}

where each input[number] is just a scalar.
While doing calibration, I print out “input” in SomeCustomOperation, and the result is wrong. However, if I replace “input” with one of “input[number]”, the result is correct. @AakankshaS Could you please help me?

@hl2997 I still have the problem of INT8 (model accuracy) performing much worse than FP16 on one of the models I use. But as stated earlier, I carefully checked inputs/outputs of the “Concat” layers in the calibration cache file. I think TensorRT behaves correctly in that part. So I think the problem is not due to INT8 Concat.

More specifically, I still have a problem with my INT8 TensorRT engine for the “yolov4-608” model. The original model is in darknet format. The model is first converted to ONNX and then optimized with TensorRT. I shared all my source code at jkjung-avt/tensorrt_demos. If you refer to Demo #6: Using INT8 and DLA core, you can see that the “yolov4-608” INT8 engine has a much lower mAP (0.317 / 0.507) than FP16 (0.488 / 0.736).

Any progress on this? I get the same behaviour with yolov4-736.

The UFF and Caffe parsers have been deprecated from TensorRT 7 onwards, so we request that you try the ONNX parser.
Please check the link below.


Hey, sorry I wasn’t clear enough.
I’m using an ONNX model for the calibration, not UFF or Caffe. I attached the calibration code in case it helps.
Also, I’m creating the engine using DeepStream nvinfer. I’m not really sure what it does behind the scenes, but wouldn’t it be the same as the link you sent?
I’m getting a 5+% loss in both recall and precision with INT8 quantisation compared to FP16.
The general flow is: use TensorRT to generate the calibration cache, then give the same ONNX model and the calibration cache to DeepStream and let it create the engine.
DeepStream creates the engine in both the INT8 and FP16 cases.