Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU) RTX 2080
• DeepStream Version 5.0-dp
• TensorRT Version 18.104.22.168
• NVIDIA GPU Driver Version (valid for GPU only) 440.64.00
So far in my tests I haven't seen any consistent performance gain or loss from using classifier_async_mode=1 for secondary GIEs.
I ran the deepstream-app example with the source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt config, modifying only the [source0] and [sink0] groups as follows:
[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=2
uri=file://../../streams/sample_1080p_h264.mp4
num-sources=1
#drop-frame-interval=2
gpu-id=0
# (0): memtype_device - Memory type Device
# (1): memtype_pinned - Memory type Host Pinned
# (2): memtype_unified - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0
Between runs I toggle classifier_async_mode directly in the secondary-GIE config files, e.g. config_infer_secondary_vehicletypes.txt.
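Concretely, the toggle looks like this in the [property] group (the process-mode line is shown only for context and reflects my understanding of the sample secondary configs; treat it as illustrative):

```ini
[property]
# ... other secondary-classifier properties unchanged ...
process-mode=2            # 2 = secondary mode, operate on detected objects
classifier-async-mode=1   # switched between 0 (sync) and 1 (async) per run
```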
Performance measurements are as reported by the sample app in the terminal. Throughput stays very similar, occasionally rising from about 200 FPS to 210 FPS. Frame latency is essentially unchanged, while per-component latency decreases for the secondary GIEs and increases for
So my question is: what is the benefit of classifier_async_mode? Can I expect performance gains from it, especially in throughput, and in what scenarios? (I understand it needs track IDs, but perhaps some other pipeline elements are also needed.)
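For context on where I would expect a win: my mental model (an assumption, not something I've confirmed in the docs or source) is that in async mode the frame is not held up waiting for the classifier, and the result is attached to a later frame of the same track ID once inference completes. A toy sketch of that idea:

```python
# Toy model of async secondary classification. This is my reading of
# classifier_async_mode, NOT DeepStream's actual implementation: frames
# are never blocked on the classifier; labels appear on later frames of
# the same track id once the (simulated) inference finishes.
from collections import deque

def run_async(frames, classify, delay=1):
    """frames: per-frame lists of track ids; classify: per-object model.
    delay: number of frames the simulated inference takes to complete."""
    cache = {}        # track id -> label, once available
    pending = deque() # (frame index when result is ready, track id)
    out = []
    for i, objects in enumerate(frames):
        # Results that finished by this frame become visible downstream.
        while pending and pending[0][0] <= i:
            _, tid = pending.popleft()
            cache[tid] = classify(tid)
        # Submit new objects without blocking; label is missing this frame.
        for tid in objects:
            if tid not in cache and all(t != tid for _, t in pending):
                pending.append((i + delay, tid))
        out.append({tid: cache.get(tid) for tid in objects})
    return out

# One tracked object over three frames: the label shows up one frame late,
# but no frame ever waited on the classifier.
frames = [[1], [1], [1]]
print(run_async(frames, lambda t: "sedan"))
# [{1: None}, {1: 'sedan'}, {1: 'sedan'}]
```

If that model is right, the gain would show up mainly when secondary inference is a pipeline bottleneck, which might explain why I see little change here.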