The same performance with int8 and fp16

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Jetson NX
• DeepStream Version 5.0
• JetPack Version (valid for Jetson only) 4.4 DP
• TensorRT Version 7.1
• NVIDIA GPU Driver Version (valid for GPU only)

I am running the sample in /opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_SSD
with both int8 and fp16 mode, batch = 1. DLA not used.
I use 15W 6CORE power mode.

Both of the detection results are correct. I expect the int8 performance will be higher than fp16.

However, I found that int8 and fp16 show similar performance: both run at around 30 fps.

Could you let me know why int8 has the same performance as fp16, and how to achieve higher fps with int8 than with fp16?

Hi @andy.linluo
As noted in the README under objectDetector_SSD, did you provide an INT8 calibration file? And how did you make the change for INT8 and FP16 respectively? Could you elaborate?

NOTE: To use INT8 mode, the INT8 calibration file for the SSD model needs to be
provided along with changing the network-mode to 1 in config_infer_primary_ssd.txt.
Refer to sampleUffSSD for running the sample in INT8 mode. The sample writes the
calibration cache to file "CalibrationTableSSD".

If you didn't provide the INT8 calibration file, it will fall back to FP16; you should be able to find the related log message in the output.
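One way to rule out a silent fallback before launching is to check the config file itself. The sketch below assumes the nvinfer config is a key-file with a [property] group that carries network-mode (0 = FP32, 1 = INT8, 2 = FP16) and int8-calib-file; the helper name check_int8_config and the sample contents are made up for illustration and are not part of DeepStream.

```python
import configparser
import os


def check_int8_config(config_text, base_dir="."):
    """Return a list of problems; an empty list means INT8 looks configured."""
    # interpolation=None keeps '%' in values from being treated specially.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("property"):
        return ["no [property] group found"]

    prop = parser["property"]
    problems = []

    mode = prop.get("network-mode")
    if mode is None:
        problems.append("network-mode is not set")
    elif mode.strip() != "1":
        problems.append(f"network-mode is {mode.strip()}, not 1 (INT8)")

    calib = prop.get("int8-calib-file")
    if not calib:
        problems.append("int8-calib-file is not set")
    elif not os.path.isfile(os.path.join(base_dir, calib)):
        problems.append(f"calibration file not found: {calib}")
    return problems


# Hypothetical config contents, only to exercise the helper.
sample = """
[property]
network-mode=2
uff-file=sample_ssd_relu6.uff
"""

for problem in check_int8_config(sample):
    print(problem)
```

In practice you would pass the text of config_infer_primary_ssd.txt and the directory it lives in, so that a relative calibration path resolves the same way it does for the sample.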

@mchi, I did generate int8 calib file with
./sample_uff_ssd --int8

I randomly selected 500 images from the COCO validation dataset for calibration and successfully generated the int8 calib cache. I believe it is correct because the output with the sample images is correct.

Then I used the following command to generate the int8 engine file.
trtexec --uff=sample_ssd_relu6.uff --output=NMS --uffInput=Input,3,300,300 --int8 --calib=CalibrationTableUffSSD --saveEngine=uffssd_int8_b1.engine

I found that the int8 engine file is a little larger than the fp16 engine file generated by DeepStream. Is that normal? I assumed the int8 engine file would be much smaller than the fp16 one.

Is there anything I did wrong?

Hi @andy.linluo
Sorry for the late reply!
I checked the perf on the NX + JP4.4 DP release and can reproduce the issue: fp16 has almost the same perf as int8.

$ ./trtexec --uff=sample_ssd_relu6.uff --output=NMS --uffInput=Input,3,300,300 --fp16 --workspace=450 --batch=1

[05/28/2020-11:58:33] [I] mean: 8.09466 ms (end to end 8.10616 ms)

./trtexec --uff=sample_ssd_relu6.uff --output=NMS --uffInput=Input,3,300,300 --int8 --workspace=450 --batch=1

[05/28/2020-12:04:35] [I] median: 7.68774 ms (end to end 7.69861 ms)
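To put the two latencies side by side, the short sketch below converts them to throughput at batch = 1. It only uses the two numbers quoted above; note that one is reported as a mean and the other as a median, so the comparison is approximate. The function name fps_from_latency_ms is made up for illustration.

```python
def fps_from_latency_ms(latency_ms, batch=1):
    """Inferences per second for a given per-iteration latency in milliseconds."""
    return batch * 1000.0 / latency_ms


fp16_ms = 8.09466  # "mean" from the --fp16 run above
int8_ms = 7.68774  # "median" from the --int8 run above

fp16_fps = fps_from_latency_ms(fp16_ms)
int8_fps = fps_from_latency_ms(int8_ms)
speedup = fp16_ms / int8_ms

print(f"fp16: {fp16_fps:.1f} inferences/s")
print(f"int8: {int8_fps:.1f} inferences/s")
print(f"int8 speedup over fp16: {speedup:.3f}x")
```

Both figures are well above the roughly 30 fps seen in the DeepStream pipeline, so the pipeline rate may be limited by something other than inference (for example the source frame rate). That is a guess; nothing in the logs here confirms it.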

Adding "--dumpProfile" to the trtexec command, as below, shows in the profile output that the NMS plugin is by far the largest single contributor to the inference time (22.2% of the total). Its time should be the same for FP16 and INT8, which is why the two modes show similar perf.

$ ./trtexec --uff=sample_ssd_relu6.uff --output=NMS --uffInput=Input,3,300,300 --int8 --workspace=450 --batch=1 --dumpProfile

[05/28/2020-12:12:03] [I] === Profile (401 iterations ) ===
[05/28/2020-12:12:03] [I] Layer Time (ms) Avg. Time (ms) Time %
[05/28/2020-12:12:03] [I] GridAnchor 20.39 0.05 0.6
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Conv2d_1a_7x7/separable_conv2d/depthwise 176.52 0.44 5.6
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Conv2d_1a_7x7/separable_conv2d + FeatureExtractor/InceptionV2/InceptionV2/Conv2d_1a_7x7/Relu6 input reformatter 0 24.71 0.06 0.8
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Conv2d_1a_7x7/separable_conv2d + FeatureExtractor/InceptionV2/InceptionV2/Conv2d_1a_7x7/Relu6 35.42 0.09 1.1
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/MaxPool_2a_3x3/MaxPool 23.15 0.06 0.7
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Conv2d_2b_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Conv2d_2b_1x1/Relu6 17.23 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Conv2d_2c_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Conv2d_2c_3x3/Relu6 61.92 0.15 2.0
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/MaxPool_3a_3x3/MaxPool 21.36 0.05 0.7
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_3/AvgPool_0a_3x3/AvgPool input reformatter 0 20.81 0.05 0.7
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_3/AvgPool_0a_3x3/AvgPool 12.54 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_3/Conv2d_0b_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_3/Conv2d_0b_1x1/Relu6 10.57 0.03 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_2/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_2/Conv2d_0a_1x1/Relu6 10.99 0.03 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_1/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_1/Conv2d_0a_1x1/Relu6 6.61 0.02 0.2
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_0/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_0/Conv2d_0a_1x1/Relu6 9.36 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_1/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_1/Conv2d_0b_3x3/Relu6 9.31 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_2/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_2/Conv2d_0b_3x3/Relu6 14.86 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_2/Conv2d_0c_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/Branch_2/Conv2d_0c_3x3/Relu6 18.05 0.05 0.6
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_3/AvgPool_0a_3x3/AvgPool input reformatter 0 28.37 0.07 0.9
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_3/AvgPool_0a_3x3/AvgPool 15.86 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_3/Conv2d_0b_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_3/Conv2d_0b_1x1/Relu6 13.24 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_2/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_2/Conv2d_0a_1x1/Relu6 12.42 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_1/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_1/Conv2d_0a_1x1/Relu6 8.58 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_0/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_0/Conv2d_0a_1x1/Relu6 10.66 0.03 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_1/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_1/Conv2d_0b_3x3/Relu6 14.57 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_2/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_2/Conv2d_0b_3x3/Relu6 12.55 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_2/Conv2d_0c_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_3c/Branch_2/Conv2d_0c_3x3/Relu6 17.80 0.04 0.6
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4a/Branch_2/MaxPool_1a_3x3/MaxPool 10.78 0.03 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4a/Branch_1/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4a/Branch_1/Conv2d_0a_1x1/Relu6 10.96 0.03 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4a/Branch_0/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4a/Branch_0/Conv2d_0a_1x1/Relu6 10.78 0.03 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4a/Branch_0/Conv2d_1a_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4a/Branch_0/Conv2d_1a_3x3/Relu6 13.94 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4a/Branch_1/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4a/Branch_1/Conv2d_0b_3x3/Relu6 14.24 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4a/Branch_1/Conv2d_1a_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4a/Branch_1/Conv2d_1a_3x3/Relu6 11.65 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_3/AvgPool_0a_3x3/AvgPool input reformatter 0 16.71 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_3/AvgPool_0a_3x3/AvgPool 11.46 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_3/Conv2d_0b_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_3/Conv2d_0b_1x1/Relu6 16.10 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_2/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_2/Conv2d_0a_1x1/Relu6 10.31 0.03 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_1/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_1/Conv2d_0a_1x1/Relu6 8.93 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_0/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_0/Conv2d_0a_1x1/Relu6 9.96 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_1/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_1/Conv2d_0b_3x3/Relu6 11.98 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_2/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_2/Conv2d_0b_3x3/Relu6 11.60 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_2/Conv2d_0c_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4b/Branch_2/Conv2d_0c_3x3/Relu6 13.63 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_3/AvgPool_0a_3x3/AvgPool input reformatter 0 16.54 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_3/AvgPool_0a_3x3/AvgPool 11.48 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_3/Conv2d_0b_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_3/Conv2d_0b_1x1/Relu6 16.01 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_2/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_2/Conv2d_0a_1x1/Relu6 10.22 0.03 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_1/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_1/Conv2d_0a_1x1/Relu6 9.30 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_0/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_0/Conv2d_0a_1x1/Relu6 9.71 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_1/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_1/Conv2d_0b_3x3/Relu6 14.03 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_2/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_2/Conv2d_0b_3x3/Relu6 11.64 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_2/Conv2d_0c_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4c/Branch_2/Conv2d_0c_3x3/Relu6 13.69 0.03 0.4
[05/28/2020-12:12:03] [I] BoxPredictor_0/ClassPredictor/Conv2D || BoxPredictor_0/BoxEncodingPredictor/Conv2D input reformatter 0 16.51 0.04 0.5
[05/28/2020-12:12:03] [I] BoxPredictor_0/ClassPredictor/Conv2D || BoxPredictor_0/BoxEncodingPredictor/Conv2D 193.88 0.48 6.2
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_3/AvgPool_0a_3x3/AvgPool input reformatter 0 16.51 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_3/AvgPool_0a_3x3/AvgPool 11.76 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_3/Conv2d_0b_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_3/Conv2d_0b_1x1/Relu6 15.92 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_2/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_2/Conv2d_0a_1x1/Relu6 13.02 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_1/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_1/Conv2d_0a_1x1/Relu6 9.73 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_0/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_0/Conv2d_0a_1x1/Relu6 9.27 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_1/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_1/Conv2d_0b_3x3/Relu6 14.26 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_2/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_2/Conv2d_0b_3x3/Relu6 16.83 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_2/Conv2d_0c_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4d/Branch_2/Conv2d_0c_3x3/Relu6 16.80 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_3/AvgPool_0a_3x3/AvgPool input reformatter 0 17.19 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_3/AvgPool_0a_3x3/AvgPool 11.75 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_3/Conv2d_0b_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_3/Conv2d_0b_1x1/Relu6 15.56 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_2/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_2/Conv2d_0a_1x1/Relu6 10.11 0.03 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_1/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_1/Conv2d_0a_1x1/Relu6 10.11 0.03 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_0/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_0/Conv2d_0a_1x1/Relu6 8.91 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_1/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_1/Conv2d_0b_3x3/Relu6 14.28 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_2/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_2/Conv2d_0b_3x3/Relu6 19.41 0.05 0.6
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_2/Conv2d_0c_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_4e/Branch_2/Conv2d_0c_3x3/Relu6 19.48 0.05 0.6
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5a/Branch_2/MaxPool_1a_3x3/MaxPool 7.34 0.02 0.2
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5a/Branch_1/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5a/Branch_1/Conv2d_0a_1x1/Relu6 9.23 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5a/Branch_0/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5a/Branch_0/Conv2d_0a_1x1/Relu6 9.56 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5a/Branch_0/Conv2d_1a_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5a/Branch_0/Conv2d_1a_3x3/Relu6 13.94 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5a/Branch_1/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5a/Branch_1/Conv2d_0b_3x3/Relu6 19.38 0.05 0.6
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5a/Branch_1/Conv2d_1a_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5a/Branch_1/Conv2d_1a_3x3/Relu6 23.69 0.06 0.8
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_3/AvgPool_0a_3x3/AvgPool input reformatter 0 9.78 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_3/AvgPool_0a_3x3/AvgPool 8.93 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_3/Conv2d_0b_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_3/Conv2d_0b_1x1/Relu6 14.03 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_3/Conv2d_0b_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_3/Conv2d_0b_1x1/Relu6 output reformatter 0 2.84 0.01 0.1
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_2/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_2/Conv2d_0a_1x1/Relu6 13.31 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_1/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_1/Conv2d_0a_1x1/Relu6 16.29 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_0/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_0/Conv2d_0a_1x1/Relu6 15.76 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_1/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_1/Conv2d_0b_3x3/Relu6 21.31 0.05 0.7
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_2/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_2/Conv2d_0b_3x3/Relu6 16.29 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_2/Conv2d_0c_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5b/Branch_2/Conv2d_0c_3x3/Relu6 21.49 0.05 0.7
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_3/MaxPool_0a_3x3/MaxPool 8.43 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_3/Conv2d_0b_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_3/Conv2d_0b_1x1/Relu6 16.66 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_2/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_2/Conv2d_0a_1x1/Relu6 15.32 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_1/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_1/Conv2d_0a_1x1/Relu6 14.15 0.04 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_0/Conv2d_0a_1x1/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_0/Conv2d_0a_1x1/Relu6 13.45 0.03 0.4
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_1/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_1/Conv2d_0b_3x3/Relu6 21.46 0.05 0.7
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_2/Conv2d_0b_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_2/Conv2d_0b_3x3/Relu6 19.43 0.05 0.6
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_2/Conv2d_0c_3x3/Conv2D + FeatureExtractor/InceptionV2/InceptionV2/Mixed_5c/Branch_2/Conv2d_0c_3x3/Relu6 21.73 0.05 0.7
[05/28/2020-12:12:03] [I] BoxPredictor_1/ClassPredictor/Conv2D || BoxPredictor_1/BoxEncodingPredictor/Conv2D input reformatter 0 9.76 0.02 0.3
[05/28/2020-12:12:03] [I] BoxPredictor_1/ClassPredictor/Conv2D || BoxPredictor_1/BoxEncodingPredictor/Conv2D 225.43 0.56 7.2
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/Mixed_5c_1_Conv2d_2_1x1_256/Conv2D + FeatureExtractor/InceptionV2/Mixed_5c_1_Conv2d_2_1x1_256/Relu6 15.23 0.04 0.5
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/Mixed_5c_2_Conv2d_2_3x3_s2_512/Conv2D + FeatureExtractor/InceptionV2/Mixed_5c_2_Conv2d_2_3x3_s2_512/Relu6 26.72 0.07 0.8
[05/28/2020-12:12:03] [I] BoxPredictor_2/ClassPredictor/Conv2D || BoxPredictor_2/BoxEncodingPredictor/Conv2D input reformatter 0 4.26 0.01 0.1
[05/28/2020-12:12:03] [I] BoxPredictor_2/ClassPredictor/Conv2D || BoxPredictor_2/BoxEncodingPredictor/Conv2D 116.15 0.29 3.7
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/Mixed_5c_1_Conv2d_3_1x1_128/Conv2D + FeatureExtractor/InceptionV2/Mixed_5c_1_Conv2d_3_1x1_128/Relu6 9.17 0.02 0.3
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/Mixed_5c_2_Conv2d_3_3x3_s2_256/Conv2D + FeatureExtractor/InceptionV2/Mixed_5c_2_Conv2d_3_3x3_s2_256/Relu6 16.25 0.04 0.5
[05/28/2020-12:12:03] [I] BoxPredictor_3/ClassPredictor/Conv2D || BoxPredictor_3/BoxEncodingPredictor/Conv2D input reformatter 0 2.59 0.01 0.1
[05/28/2020-12:12:03] [I] BoxPredictor_3/ClassPredictor/Conv2D || BoxPredictor_3/BoxEncodingPredictor/Conv2D 55.22 0.14 1.8
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/Mixed_5c_1_Conv2d_4_1x1_128/Conv2D + FeatureExtractor/InceptionV2/Mixed_5c_1_Conv2d_4_1x1_128/Relu6 5.28 0.01 0.2
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/Mixed_5c_2_Conv2d_4_3x3_s2_256/Conv2D + FeatureExtractor/InceptionV2/Mixed_5c_2_Conv2d_4_3x3_s2_256/Relu6 16.45 0.04 0.5
[05/28/2020-12:12:03] [I] BoxPredictor_4/ClassPredictor/Conv2D || BoxPredictor_4/BoxEncodingPredictor/Conv2D input reformatter 0 2.72 0.01 0.1
[05/28/2020-12:12:03] [I] BoxPredictor_4/ClassPredictor/Conv2D || BoxPredictor_4/BoxEncodingPredictor/Conv2D 54.06 0.13 1.7
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/Mixed_5c_1_Conv2d_5_1x1_64/Conv2D + FeatureExtractor/InceptionV2/Mixed_5c_1_Conv2d_5_1x1_64/Relu6 4.60 0.01 0.1
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/Mixed_5c_2_Conv2d_5_3x3_s2_128/Conv2D + FeatureExtractor/InceptionV2/Mixed_5c_2_Conv2d_5_3x3_s2_128/Relu6 input reformatter 0 2.53 0.01 0.1
[05/28/2020-12:12:03] [I] FeatureExtractor/InceptionV2/Mixed_5c_2_Conv2d_5_3x3_s2_128/Conv2D + FeatureExtractor/InceptionV2/Mixed_5c_2_Conv2d_5_3x3_s2_128/Relu6 18.81 0.05 0.6
[05/28/2020-12:12:03] [I] BoxPredictor_5/ClassPredictor/Conv2D || BoxPredictor_5/BoxEncodingPredictor/Conv2D 29.66 0.07 0.9
[05/28/2020-12:12:03] [I] (Unnamed Layer* 496) [Shuffle] + BoxPredictor_0/Reshape_1 11.98 0.03 0.4
[05/28/2020-12:12:03] [I] (Unnamed Layer* 504) [Shuffle] + BoxPredictor_1/Reshape_1 8.39 0.02 0.3
[05/28/2020-12:12:03] [I] (Unnamed Layer* 512) [Shuffle] + BoxPredictor_2/Reshape_1 4.29 0.01 0.1
[05/28/2020-12:12:03] [I] (Unnamed Layer* 520) [Shuffle] + BoxPredictor_3/Reshape_1 4.09 0.01 0.1
[05/28/2020-12:12:03] [I] (Unnamed Layer* 528) [Shuffle] + BoxPredictor_4/Reshape_1 2.45 0.01 0.1
[05/28/2020-12:12:03] [I] (Unnamed Layer* 536) [Shuffle] + BoxPredictor_5/Reshape_1 2.21 0.01 0.1
[05/28/2020-12:12:03] [I] (Unnamed Layer* 204) [Shuffle] + Squeeze 2.51 0.01 0.1
[05/28/2020-12:12:03] [I] (Unnamed Layer* 391) [Shuffle] + Squeeze_1 2.42 0.01 0.1
[05/28/2020-12:12:03] [I] (Unnamed Layer* 413) [Shuffle] + Squeeze_2 2.21 0.01 0.1
[05/28/2020-12:12:03] [I] (Unnamed Layer* 435) [Shuffle] + Squeeze_3 2.61 0.01 0.1
[05/28/2020-12:12:03] [I] (Unnamed Layer* 457) [Shuffle] + Squeeze_4 2.29 0.01 0.1
[05/28/2020-12:12:03] [I] (Unnamed Layer* 479) [Shuffle] + Squeeze_5 2.13 0.01 0.1
[05/28/2020-12:12:03] [I] concat_box_loc 11.32 0.03 0.4
[05/28/2020-12:12:03] [I] concat_box_conf 22.41 0.06 0.7
[05/28/2020-12:12:03] [I] GridAnchor copy 2.94 0.01 0.1
[05/28/2020-12:12:03] [I] GridAnchor_1 copy 2.76 0.01 0.1
[05/28/2020-12:12:03] [I] GridAnchor_2 copy 2.26 0.01 0.1
[05/28/2020-12:12:03] [I] GridAnchor_3 copy 2.22 0.01 0.1
[05/28/2020-12:12:03] [I] GridAnchor_4 copy 2.15 0.01 0.1
[05/28/2020-12:12:03] [I] GridAnchor_5 copy 1.94 0.00 0.1
[05/28/2020-12:12:03] [I] NMS 699.11 1.74 22.2
[05/28/2020-12:12:03] [I] Total 3148.12 7.85 100.0
[05/28/2020-12:12:03] [I]
&&&& PASSED TensorRT.trtexec # ./trtexec --uff=sample_ssd_relu6.uff --output=NMS --uffInput=Input,3,300,300 --int8 --workspace=450 --batch=1 --dumpProfile
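A profile this long is easier to read once it is ranked. The sketch below parses rows of the form shown above (layer name followed by total ms, average ms and percent), sorts them by share, and then applies Amdahl's law to estimate how much the end-to-end time could still improve if everything except NMS got faster. The regex, the function names and the 2x factor are illustrative assumptions, and only three rows copied from the log are used as input.

```python
import re

# "<layer name> <total ms> <avg ms> <percent>" after the "[I] " prefix.
ROW = re.compile(
    r"\[I\]\s+(?P<name>.+?)\s+(?P<total>\d+\.\d+)\s+(?P<avg>\d+\.\d+)\s+(?P<pct>\d+\.\d+)\s*$"
)


def parse_profile(lines):
    """Return (name, avg_ms, percent) tuples, skipping headers and the Total row."""
    rows = []
    for line in lines:
        match = ROW.search(line)
        if match and match.group("name") != "Total":
            rows.append(
                (match.group("name"), float(match.group("avg")), float(match.group("pct")))
            )
    return rows


def max_overall_speedup(fixed_fraction, speedup_of_rest):
    """Amdahl's law: only the non-fixed share of the time gets faster."""
    return 1.0 / (fixed_fraction + (1.0 - fixed_fraction) / speedup_of_rest)


log = [
    "[05/28/2020-12:12:03] [I] Layer Time (ms) Avg. Time (ms) Time %",
    "[05/28/2020-12:12:03] [I] GridAnchor 20.39 0.05 0.6",
    "[05/28/2020-12:12:03] [I] BoxPredictor_1/ClassPredictor/Conv2D || BoxPredictor_1/BoxEncodingPredictor/Conv2D 225.43 0.56 7.2",
    "[05/28/2020-12:12:03] [I] NMS 699.11 1.74 22.2",
    "[05/28/2020-12:12:03] [I] Total 3148.12 7.85 100.0",
]

rows = sorted(parse_profile(log), key=lambda row: row[2], reverse=True)
for name, avg_ms, pct in rows:
    print(f"{pct:5.1f}%  {avg_ms:.2f} ms  {name}")

nms_fraction = rows[0][2] / 100.0  # NMS share of this INT8 run
# Hypothetical: every other layer becomes 2x faster while NMS stays fixed.
print(f"overall speedup bound: {max_overall_speedup(nms_fraction, 2.0):.2f}x")
```

Feeding the full trtexec output through parse_profile works the same way; the header and separator lines simply do not match the pattern.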

@mchi, thanks.
Is there any way to optimize the NMS plugin with INT8, or does the NMS plugin have to be fp16 or fp32?
More generally, are all plugins for custom layers supposed to be fp16 or fp32?
Does it mean that for some models, int8 may not be faster than fp16?

Since TRT 6, TRT plugins can be implemented to support INT8, FP16 and FP32.
But the NMS plugin only supports FP32 for now.

What batch size do you want to run? You could check the INT8 and FP16 perf with the real batch size you plan to use.

Thanks!

@mchi, where can I find the detailed info about trt plugins such as precision support?

You could look into the code of the TRT plugins - https://github.com/NVIDIA/TensorRT/tree/master/plugin

@andy.linluo

Platform: Tesla T4
TRT version: 7.0.0.11
Batch Size: 32
             Int8 one iteration     fp16 one iteration
total        20.18ms                27.40ms
NMS           7.22ms                 7.78ms
Without NMS  12.96ms                19.62ms

I have verified large-batch scenarios such as batch=32 and found that, excluding NMS, the FPS ratio is about int8/fp16 = 1.5, i.e. int8 is about 50% faster.
However, there is still no significant difference in NMS time between fp16 and int8.
The NMS plugin will be improved to consider fp16 and int8 in the future.

So, consider using larger batch size if possible.
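For reference, the ratios behind the table above can be reproduced with a few lines of arithmetic. The sketch uses only the four measured values from the T4 table (total and NMS time per iteration at batch 32); the variable and function names are made up for illustration.

```python
BATCH = 32

# Per-iteration times in ms, copied from the T4 table above.
int8 = {"total": 20.18, "nms": 7.22}
fp16 = {"total": 27.40, "nms": 7.78}


def without_nms(times):
    """Time spent outside the NMS plugin."""
    return times["total"] - times["nms"]


overall_speedup = fp16["total"] / int8["total"]
network_speedup = without_nms(fp16) / without_nms(int8)
nms_share_int8 = int8["nms"] / int8["total"]

print(f"int8 vs fp16, end to end:    {overall_speedup:.2f}x")
print(f"int8 vs fp16, excluding NMS: {network_speedup:.2f}x")
print(f"NMS share of the int8 run:   {nms_share_int8:.0%}")
print(f"int8 throughput: {BATCH * 1000.0 / int8['total']:.0f} images/s")
```

The gap between the two speedups is the cost of the FP32-only NMS step: it takes over a third of the int8 iteration, so the end-to-end gain is smaller than the gain on the rest of the network.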
