Understanding the effect of classifier_async_mode on performance

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU): RTX 2080
• DeepStream Version: 5.0-dp
• TensorRT Version: 7.0.0.11
• NVIDIA GPU Driver Version (valid for GPU only): 440.64.00

So far in my tests I haven’t been able to observe a consistent gain or loss in performance from setting classifier_async_mode=1 for the secondary GIEs.

I’m running the deepstream-app example with the source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt config, except that I modified the [source0] and [sink0] sections as follows:

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=2
uri=file://../../streams/sample_1080p_h264.mp4
num-sources=1
#drop-frame-interval=2
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0

I toggle classifier_async_mode directly in the config_infer_secondary_carcolor.txt, config_infer_secondary_carmake.txt and config_infer_secondary_vehicletypes.txt config files.
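For reference, the toggle in each of those secondary GIE configs looks roughly like this (a sketch based on the DeepStream 5.0 sample configs, where the key is spelled classifier-async-mode; double-check the exact key name in your files):

[property]
# ... model and network settings unchanged ...
# run as a secondary (classifier) GIE operating on primary detections
process-mode=2
# 1 = attach classification results asynchronously using tracked object IDs
# 0 = synchronous inline classification
classifier-async-mode=1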

Performance measurements are as reported by the sample app in the terminal. Throughput remains very similar, sometimes increasing from 200 FPS to 210 FPS. Frame latency stays roughly the same, while per-component latency decreases for the secondary GIEs and increases for src_bin_muxer.
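For context on how I collect these numbers: the FPS reporting is the perf measurement built into deepstream-app, enabled in the [application] section of the app config, and (as far as I understand) the frame and per-component latency figures require the environment variables NVDS_ENABLE_LATENCY_MEASUREMENT=1 and NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 to be set before launching the app:

[application]
# print aggregate FPS per stream to the terminal at a fixed interval
enable-perf-measurement=1
perf-measurement-interval-sec=5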

So my question is: what is the benefit of classifier_async_mode? Can I expect performance gains from it, especially in throughput, and in what scenarios (I understand that it needs tracking IDs, but perhaps other pipeline elements are required as well)?

Hi @abramov_ov,
Is there a tracker in your pipeline? Did you see a log message like “Untracked objects in metadata. Cannot infer on untracked objects in asynchronous mode.”? The classifier cannot infer on untracked objects in asynchronous mode.
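Asynchronous classification relies on object IDs assigned by the tracker, so the [tracker] section in the app config must be enabled. In the DS 5.0 sample config it looks roughly like this (library path from a default x86 install, adjust for your setup):

[tracker]
enable=1
tracker-width=640
tracker-height=384
# low-level tracker library; KLT here, other trackers also need ll-config-file
ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so
#ll-config-file=tracker_config.yml
gpu-id=0
enable-batch-process=1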

Also, I think that if the bottleneck is the primary GIE and the classifier can already complete its classification in time, classifier_async_mode will not help improve the FPS.

To measure the FPS of the primary GIE alone, maybe you could refer to GitHub - NVIDIA-AI-IOT/deepstream_tao_apps (sample apps that demonstrate how to deploy models trained with TAO on DeepStream).
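Alternatively, a quick check in your current setup: disable the secondary GIEs in the app config (section names per the default sample config) and compare the reported FPS:

# in source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt
[secondary-gie0]
enable=0

[secondary-gie1]
enable=0

[secondary-gie2]
enable=0

If the FPS hardly changes with all three secondary GIEs disabled, the classifiers are not the bottleneck, and classifier_async_mode has little room to improve throughput.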

Thanks!