Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 5.1
• JetPack Version (valid for Jetson only)
• TensorRT Version 7.2
• NVIDIA GPU Driver Version (valid for GPU only) 460.80
• Issue Type( questions, new requirements, bugs) Question
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
I am running PeopleNet directly out of the 5.1-21.02-devel container using the included deepstream_app_source1_peoplenet.txt. On our test videos, the int8 quantized model only yields about a 10% FPS increase over fp16 on a single stream. This seems low, but perhaps my expectations are incorrect - I was expecting a much larger speedup from int8. I have the following questions:
- Is a ~10% gain the expected difference between fp16 and int8 on a single stream? If not, how much should I expect?
- If it is expected, can a larger gain be had from batching multiple streams? If so, roughly how much?
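For reference, my understanding of how batching would be enabled is sketched below (the 4-stream values are hypothetical; my assumption is that batch-size must agree between the app config and the pgie config, and that the engine has to be rebuilt for the new max batch size):

```
# deepstream_app_source1_peoplenet.txt (app config) - hypothetical 4-stream setup
[streammux]
batch-size=4      # frames from 4 sources batched into one buffer

[primary-gie]
batch-size=4      # must match the pgie config and the engine's max batch size

# config_infer_primary_peoplenet.txt (pgie config)
[property]
batch-size=4
# point model-engine-file at a *_b4_* engine (or delete the old *_b1_* engine
# so DeepStream rebuilds one at the new batch size)
```

Please correct me if batching is configured differently for the TLT PeopleNet sample.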
I am including my int8 pgie configuration below for diagnostic purposes. For fp16, I am using the default config_infer_primary_peoplenet.txt included in the container with the fp16 pruned model; for int8, I am using the config below with the quantized pruned model.
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=tlt_encode
tlt-encoded-model=../../../models/tlt_pretrained_models/peoplenet-int8/resnet34_peoplenet_pruned_int8.etlt
labelfile-path=../../../models/tlt_pretrained_models/peoplenet-int8/labels.txt
model-engine-file=../../../models/tlt_pretrained_models/peoplenet-int8/resnet34_peoplenet_pruned_int8.etlt_b1_gpu0_int8.engine
int8-calib-file=../../../models/tlt_pretrained_models/peoplenet-int8/resnet34_peoplenet_pruned_int8_gpu.txt
input-dims=3;544;960;0
uff-input-blob-name=input_1
batch-size=1
process-mode=1
model-color-format=0
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
num-detected-classes=3
cluster-mode=1
interval=0
gie-unique-id=1
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid

[class-attrs-all]
pre-cluster-threshold=0.4
## Set eps=0.7 and minBoxes for cluster-mode=1(DBSCAN)
eps=0.7
minBoxes=1
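To rule out the rest of the pipeline, I believe the two serialized engines can also be benchmarked directly with trtexec (command sketch; the engine paths are the ones DeepStream generated in my setup, and the iteration counts are arbitrary):

```
# benchmark the int8 engine; repeat with the fp16 *_fp16.engine to compare
trtexec --loadEngine=resnet34_peoplenet_pruned_int8.etlt_b1_gpu0_int8.engine \
        --batch=1 --iterations=100 --avgRuns=10
```

If the trtexec throughput gap is also ~10%, that would suggest the limit is in the model itself rather than in the DeepStream pipeline.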