Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) GPU
• DeepStream Version 5.1
• JetPack Version (valid for Jetson only)
• TensorRT Version 7.2
• NVIDIA GPU Driver Version (valid for GPU only) 460.80
• Issue Type (questions, new requirements, bugs) Question
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the content of the configuration files, the command line used, and other details for reproducing it.)
I am running PeopleNet directly out of the 5.1-21.02-devel container using the included deepstream_app_source1_peoplenet.txt. On our test videos, the INT8-quantized model yields only about a 10% FPS increase over FP16 on a single stream. This seems low, but perhaps my expectations are off; I was expecting a much larger speedup from INT8. I have the following questions:
- Is this the expected performance increase for INT8 over FP16 on a single stream? If not, how much should be expected?
- If it is expected, can a larger performance increase be gained from batching, and if so, roughly how much?
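One possible explanation I have been considering, sketched as a back-of-envelope calculation (the millisecond figures below are hypothetical, not measured on this setup): if fixed per-frame pipeline costs (decode, preprocessing, OSD, sink) dominate, then even a 2x faster engine moves end-to-end FPS only slightly.

```python
# Back-of-envelope: why a faster engine may barely move end-to-end FPS.
# All timing numbers are hypothetical illustrations, not measurements.

def end_to_end_fps(infer_ms: float, overhead_ms: float) -> float:
    """Overall FPS when per-frame time = inference + fixed pipeline
    overhead (decode, preprocessing, OSD, sink), processed serially."""
    return 1000.0 / (infer_ms + overhead_ms)

# Suppose FP16 inference takes 6 ms/frame and INT8 halves it to 3 ms,
# while the rest of the pipeline costs a fixed 20 ms/frame.
fp16 = end_to_end_fps(infer_ms=6.0, overhead_ms=20.0)
int8 = end_to_end_fps(infer_ms=3.0, overhead_ms=20.0)
print(f"speedup: {int8 / fp16:.2f}x")  # ~1.13x despite 2x faster inference
```

If something like this is what is happening, a roughly 10% end-to-end gain would be consistent with INT8 itself working correctly.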
I am including my PGIE configuration for INT8 below for diagnostic purposes. For FP16, I use the default config_infer_primary_peoplenet.txt included in the container with the pruned FP16 model; for INT8, I use the config below with the pruned, quantized model.
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
tlt-model-key=tlt_encode
tlt-encoded-model=../../../models/tlt_pretrained_models/peoplenet-int8/resnet34_peoplenet_pruned_int8.etlt
labelfile-path=../../../models/tlt_pretrained_models/peoplenet-int8/labels.txt
model-engine-file=../../../models/tlt_pretrained_models/peoplenet-int8/resnet34_peoplenet_pruned_int8.etlt_b1_gpu0_int8.engine
int8-calib-file=../../../models/tlt_pretrained_models/peoplenet-int8/resnet34_peoplenet_pruned_int8_gpu.txt
input-dims=3;544;960;0
uff-input-blob-name=input_1
batch-size=1
process-mode=1
model-color-format=0
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
num-detected-classes=3
cluster-mode=1
interval=0
gie-unique-id=1
output-blob-names=output_bbox/BiasAdd;output_cov/Sigmoid
[class-attrs-all]
pre-cluster-threshold=0.4
## Set eps=0.7 and minBoxes for cluster-mode=1 (DBSCAN)
eps=0.7
minBoxes=1
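For reference, this is the direction I would try for batching across multiple sources, sketched under my own assumptions (the b4 engine-file name just mirrors the naming convention above; the engine would need to be rebuilt for the new batch size):

```ini
# Hypothetical batching sketch, not a verified config.
# [streammux] lives in the deepstream-app config; [property] in the PGIE config.
[streammux]
batch-size=4
batched-push-timeout=40000

[property]
batch-size=4
# Engine name assumed by analogy with the b1 engine above; DeepStream will
# rebuild the engine if this file does not exist for the requested batch size.
model-engine-file=../../../models/tlt_pretrained_models/peoplenet-int8/resnet34_peoplenet_pruned_int8.etlt_b4_gpu0_int8.engine
```

My understanding is that the streammux and PGIE batch sizes should match the number of streams, but I would appreciate confirmation of whether batching meaningfully widens the INT8-vs-FP16 gap.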