I am converting my caffe model to tensorrt. When using int8 calibration, the output from concat layer is wrong. For example,
…
layer {
  name: "concat_all_values"
  type: "Concat"
  bottom: "values1"
  bottom: "values2"
  bottom: "values3"
  bottom: "values4"
  bottom: "values5"
  top: "all_values"
  concat_param {
    axis: 1
  }
}
layer {
  name: "DoSomething"
  type: "IPlugin"
  bottom: "all_values"
  bottom: "values1"
  top: "output"
}
where values<1-5> are batch_size x 1 x 1 tensors.
When I read the inputs in the DoSomething layer, the input "all_values" is wrong, while "values1" has the correct value.
Environment
TensorRT Version: 7.1.3-1
GPU Type: 2080ti
Nvidia Driver Version: 440.33.01
CUDA Version: 10.2
CUDNN Version: 8.0.1
Operating System + Version: Ubuntu 16.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
I'm using TensorRT 7.1.3.4 and I've observed the same issue with darknet (YOLOv4) → ONNX → TensorRT. In the SPP module, 4 tensors from previous layers are concatenated together. The incorrect INT8 computation of the "Concat" results in very bad detection outputs.
If I use the same code to convert a YOLOv3 model to TensorRT INT8, the result is good. mAP of the INT8 engine is less than 1% different from the FP16 and FP32 engines. YOLOv3 does not have the SPP (concat) module.
Input tensors to the "Concat" usually have different dynamic ranges, so their INT8 values cannot be concatenated directly without requantization. I guess the current TensorRT release does not handle that correctly.
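To illustrate why naively concatenating INT8 tensors with different scales goes wrong, here is a small pure-Python sketch of symmetric INT8 quantization. The scales and values are hypothetical, not taken from the actual model; it only demonstrates what happens if a small-range input is read back using a large shared scale without requantizing:

```python
# Symmetric INT8 quantization: q = round(clamp(x / scale, -127, 127)),
# where scale = dynamic_range / 127.

def quantize(x, scale):
    q = round(x / scale)
    return max(-127, min(127, q))

def dequantize(q, scale):
    return q * scale

# Two hypothetical concat inputs with very different dynamic ranges.
a = 0.05                  # from a tensor with dynamic range ~0.1
b = 50.0                  # from a tensor with dynamic range ~100.0
scale_a = 0.1 / 127
scale_b = 100.0 / 127

# Correct: each input is quantized and dequantized with its own scale.
a_ok = dequantize(quantize(a, scale_a), scale_a)
b_ok = dequantize(quantize(b, scale_b), scale_b)

# Wrong: the raw INT8 value of the small-range input is interpreted
# with the other (larger) scale, as if both shared one scale.
a_bad = dequantize(quantize(a, scale_a), scale_b)

print(a_ok, b_ok, a_bad)  # a_bad is off by orders of magnitude
```

This is why a correct implementation must either requantize each input to a common scale or assign the same dynamic range to all tensors feeding the concat.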
@AakankshaS, could you help to check again? Thanks.
@AakankshaS Sorry, I further checked the INT8 calibration cache file of the YOLOv4 model. I saw that all inputs and the output of a particular "Concat" layer have the same calibration value (i.e. the same dynamic range). I think that's the correct behavior.
So the problem I saw was not caused by "Concat" layers in an INT8 TensorRT engine. It must be something else…
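For anyone who wants to repeat this check, a small script like the following can decode the per-tensor scales in a calibration cache. It assumes the commonly seen cache layout of a header line followed by `tensor_name: <hex>` entries, where the hex string encodes a big-endian IEEE-754 float32 scale; the tensor names and values below are made up for illustration, so verify the format against your own cache file:

```python
import struct

def parse_calib_cache(text):
    """Parse 'name: hexfloat' lines into {name: scale}; skip the header line."""
    scales = {}
    for line in text.splitlines()[1:]:  # first line is the TRT header
        if ":" not in line:
            continue
        name, hexval = line.rsplit(":", 1)
        raw = bytes.fromhex(hexval.strip())
        scales[name.strip()] = struct.unpack(">f", raw)[0]  # big-endian float32
    return scales

# Hypothetical cache contents (names and hex values invented for this example).
cache = """TRT-7103-EntropyCalibration2
values1: 3c010a14
all_values: 3c010a14
"""
scales = parse_calib_cache(cache)
print(scales)
```

If Concat is handled correctly, all its inputs and its output should show the same scale, which is what I observed.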
where each input[number] is just a scalar.
While doing calibration, I print out "input" in SomeCustomOperation, and the result is wrong. However, if I replace "input" with one of the "input[number]" tensors, the result is correct. @AakankshaS Could you please help me?
@hl2997 I still have the problem of INT8 model accuracy being much worse than FP16 on one of the models I use. But as stated earlier, I carefully checked the inputs/outputs of the "Concat" layers in the calibration cache file, and I think TensorRT behaves correctly in that part. So I don't believe the problem is due to INT8 Concat.
More specifically, I still have a problem with my INT8 TensorRT engine for the "yolov4-608" model. The original model is in darknet format; it is first converted to ONNX and then optimized with TensorRT. I shared all my source code at jkjung-avt/tensorrt_demos. If you refer to Demo #6: Using INT8 and DLA core, you can see that the "yolov4-608" INT8 engine has a much lower mAP (0.317 / 0.507) than FP16 (0.488 / 0.736).
Hey, sorry I wasn't clear enough.
I'm using an ONNX model, not UFF or Caffe, for the calibration. I attached the calibration code if it helps.
Also, I'm creating the engine using the DeepStream nvinfer plugin. I'm not really sure what it does behind the scenes, but wouldn't it be the same as the link you sent?
I'm getting a 5+% loss in both recall and precision with INT8 quantization compared to FP16.
The general flow is: use TensorRT to generate the calibration cache, then feed the same ONNX model and the calibration cache to DeepStream, and let DeepStream create the engine.
In both the INT8 and FP16 cases, DeepStream creates the engine. files.zip (7.7 KB)
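For context, the relevant part of a DeepStream nvinfer config for this flow might look like the sketch below. The file names are placeholders, and the key names should be double-checked against the nvinfer documentation for your DeepStream version:

```
[property]
# ONNX model to build the engine from (placeholder path)
onnx-file=model.onnx
# Engine file nvinfer writes/loads (placeholder name)
model-engine-file=model.onnx_b1_gpu0_int8.engine
# Calibration cache generated earlier with TensorRT (placeholder name)
int8-calib-file=calib.cache
# network-mode: 0=FP32, 1=INT8, 2=FP16
network-mode=1
```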