I am converting my caffe model to tensorrt. When using int8 calibration, the output from concat layer is wrong. For example,
…
layer {
  name: "concat_all_values"
  type: "Concat"
  bottom: "values1"
  bottom: "values2"
  bottom: "values3"
  bottom: "values4"
  bottom: "values5"
  top: "all_values"
  concat_param {
    axis: 1
  }
}
layer {
  name: "DoSomething"
  type: "IPlugin"
  bottom: "all_values"
  bottom: "values1"
  top: "output"
}
where values<1-5> are batch_size x 1 x 1 tensors.
When I read the inputs in the DoSomething layer, the input "all_values" is wrong, while "values1" has the correct value.
Environment
TensorRT Version: 7.1.3-1
GPU Type: 2080ti
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 8.0.1
Operating System + Version: Ubuntu 16.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
I'm using TensorRT 7.1.3.4 and I've observed the same issue with darknet (YOLOv4) → ONNX → TensorRT. In the SPP module, 4 tensors from previous layers are concatenated together. The incorrect INT8 computation of the "Concat" results in very bad detection outputs.
If I use the same code to convert a YOLOv3 model to TensorRT INT8, the result is good. mAP of the INT8 engine is less than 1% different from the FP16 and FP32 engines. YOLOv3 does not have the SPP (concat) module.
Input tensors to the "Concat" usually have different dynamic ranges, so their INT8 values cannot be concatenated directly without requantization. I guess the current TensorRT release does not handle that correctly.
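To illustrate why naively concatenating INT8 tensors with different scales goes wrong, here is a small pure-Python sketch of symmetric INT8 quantization. The scales and values are hypothetical, not taken from the actual model; it only demonstrates what happens if a small-range input is read back using a large shared scale without requantizing:

```python
# Symmetric INT8 quantization: q = round(clamp(x / scale, -127, 127)),
# where scale = dynamic_range / 127.

def quantize(x, scale):
    q = round(x / scale)
    return max(-127, min(127, q))

def dequantize(q, scale):
    return q * scale

# Two hypothetical concat inputs with very different dynamic ranges.
a = 0.05                  # from a tensor with dynamic range ~0.1
b = 50.0                  # from a tensor with dynamic range ~100.0
scale_a = 0.1 / 127
scale_b = 100.0 / 127

# Correct: each input is quantized and dequantized with its own scale.
a_ok = dequantize(quantize(a, scale_a), scale_a)
b_ok = dequantize(quantize(b, scale_b), scale_b)

# Wrong: the raw INT8 value of the small-range input is interpreted
# with the other (larger) scale, as if both shared one scale.
a_bad = dequantize(quantize(a, scale_a), scale_b)

print(a_ok, b_ok, a_bad)  # a_bad is off by orders of magnitude
```

This is why a correct implementation must either requantize each input to a common scale or assign the same dynamic range to all tensors feeding the concat.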
@AakankshaS, could you help to check again? Thanks.
@AakankshaS Sorry, I further checked the INT8 calibration cache file of the YOLOv4 model. I saw that all inputs and the output of a particular "Concat" layer have the same calibration value (i.e. the same dynamic range). I think that's the correct behavior.
So the problem I saw was not caused by "Concat" layers in an INT8 TensorRT engine. It must be something else…
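For anyone who wants to repeat this check, a small script like the following can decode the per-tensor scales in a calibration cache. It assumes the commonly seen cache layout of a header line followed by `tensor_name: <hex>` entries, where the hex string encodes a big-endian IEEE-754 float32 scale; the tensor names and values below are made up for illustration, so verify the format against your own cache file:

```python
import struct

def parse_calib_cache(text):
    """Parse 'name: hexfloat' lines into {name: scale}; skip the header line."""
    scales = {}
    for line in text.splitlines()[1:]:  # first line is the TRT header
        if ":" not in line:
            continue
        name, hexval = line.rsplit(":", 1)
        raw = bytes.fromhex(hexval.strip())
        scales[name.strip()] = struct.unpack(">f", raw)[0]  # big-endian float32
    return scales

# Hypothetical cache contents (names and hex values invented for this example).
cache = """TRT-7103-EntropyCalibration2
values1: 3c010a14
all_values: 3c010a14
"""
scales = parse_calib_cache(cache)
print(scales)
```

If Concat is handled correctly, all its inputs and its output should show the same scale, which is what I observed.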
where each input[number] is just a scalar.
While doing calibration, I print out "input" in SomeCustomOperation, and the result is wrong. However, if I replace "input" with one of the "input[number]" tensors, the result is correct. @AakankshaS Could you please help me?
@hl2997 I still have the problem of INT8 model accuracy being much worse than FP16 on one of the models I use. But as stated earlier, I carefully checked the inputs/outputs of the "Concat" layers in the calibration cache file, and I think TensorRT behaves correctly in that part. So I don't believe the problem is due to INT8 Concat.
More specifically, I still have a problem with my INT8 TensorRT engine for the "yolov4-608" model. The original model is in darknet format; it is first converted to ONNX and then optimized with TensorRT. I shared all my source code at jkjung-avt/tensorrt_demos. If you refer to Demo #6: Using INT8 and DLA core, you can see that the "yolov4-608" INT8 engine has a much lower mAP (0.317 / 0.507) than FP16 (0.488 / 0.736).
Hey, sorry I wasn't clear enough.
I'm using an ONNX model, not UFF or Caffe, for the calibration. I attached the calibration code if it helps.
Also, I'm creating the engine using the DeepStream nvinfer plugin. I'm not really sure what it does behind the scenes, but wouldn't it be the same as the link you sent?
I'm getting a 5+% loss in both recall and precision with INT8 quantization compared to FP16.
The general flow is: use TensorRT to generate the calibration cache, then feed the same ONNX model and the calibration cache to DeepStream, and let DeepStream create the engine.
In both the INT8 and FP16 cases, DeepStream creates the engine. files.zip (7.7 KB)
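For context, the relevant part of a DeepStream nvinfer config for this flow might look like the sketch below. The file names are placeholders, and the key names should be double-checked against the nvinfer documentation for your DeepStream version:

```
[property]
# ONNX model to build the engine from (placeholder path)
onnx-file=model.onnx
# Engine file nvinfer writes/loads (placeholder name)
model-engine-file=model.onnx_b1_gpu0_int8.engine
# Calibration cache generated earlier with TensorRT (placeholder name)
int8-calib-file=calib.cache
# network-mode: 0=FP32, 1=INT8, 2=FP16
network-mode=1
```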