Low GPU Utilization during inference

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) GPU (GTX1070)
• DeepStream Version 5.0.1-20.09-triton
• TensorRT Version 7.0.0
• NVIDIA GPU Driver Version (valid for GPU only) 460.32.03
• Issue Type( questions, new requirements, bugs) questions

During inference with deepstream-app I only reach 60 FPS with my TensorRT-optimized model (MobileNetV2, 300x300) and a GPU utilization of 30%.

In my config file, under [sink0], I changed the option sync=1 to sync=0 to get the full computing power. The frame rate jumps from 30 to 60 FPS, but the nvidia-smi tool shows that GPU utilization is still only 30%.
When I additionally change the type option under [sink0] from 2 (EglSink) to 1 (FakeSink), I get nearly 300 FPS and a GPU utilization of 95-100%.
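For reference, the relevant part of my [sink0] group now looks like this (a sketch of the group in my attached config; the exact remaining keys depend on the sample config used):

```
[sink0]
enable=1
# 1=FakeSink (~300 FPS, 95-100% GPU), 2=EglSink (~60 FPS, ~30% GPU)
type=1
# 0=render as fast as possible, 1=sync rendering to the stream rate
sync=0
```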

I tried the tips from the official troubleshooting documentation (https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_troubleshooting.html), but with no luck.

Can someone help me figure out the cause of the low GPU utilization?

source_1080p_dec_infer_mobilenetv2_tf.txt (4.3 KB)
config_infer_primary_ssd.txt (3.3 KB)

Thanks

Hey, have you measured the latency and checked each component's latency?

Thank you for the fast response @bcao.
No, I haven't. I thought latency measurement with the Latency Measurement API was only possible for live sources and RTSP streaming, not for file sinks (type=3 in [sink0]).
See the following link: Latency measurement issue - #3 by amycao.

Or is it now possible to use NVDS_ENABLE_LATENCY_MEASUREMENT and NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT for file sinks as well?
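For reference, this is how I enable the measurement (a sketch; the environment variables are read by deepstream-app at startup, and the config file name is from my attachments above):

```shell
# Enable whole-pipeline and per-component latency logging; the per-frame
# numbers are printed to stdout while the app runs.
export NVDS_ENABLE_LATENCY_MEASUREMENT=1
export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
# Launch only where deepstream-app is installed.
command -v deepstream-app >/dev/null && \
  deepstream-app -c source_1080p_dec_infer_mobilenetv2_tf.txt || true
```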

Update

This post describes the same issue: Understanding When/Why DeepStream 5.0 caps the performance

  1. Results for NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 and EglSink

BATCH-NUM = 4384
Comp name = nvv4l2decoder0
component latency= 49.979004
Comp name = src_bin_muxer source_id = 0 pad_index = 0 frame_num = 4384 component_latency = 0.192871
Comp name = primary_gie
component latency= 55.223145
Comp name = tiled_display_tiler
component latency= 0.349121
Comp name = osd_conv
component latency= 1.545166
Comp name = nvosd0
component latency= 15.479004
Source id = 0 Frame_num = 4384 Frame latency = 134.345947 (ms)

  2. Results for NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 and FakeSink

BATCH-NUM = 4384
Comp name = nvv4l2decoder0
component latency= 8.173096
Comp name = src_bin_muxer source_id = 0 pad_index = 0 frame_num = 4384 component_latency = 2.907959
Comp name = primary_gie
component latency= 10.752197
Comp name = tiled_display_tiler
component latency= 0.229980
Comp name = osd_conv
component latency= 0.218018
Comp name = nvosd0
component latency= 1.289795
Source id = 0 Frame_num = 4384 Frame latency = 23.814941 (ms)

It seems the FPS difference is due to the nvv4l2decoder, primary_gie and nvosd0 components. To be honest, I don't know what to do next to reduce these numbers.

It should be a limitation of your display hardware. Can you try a simple pipeline using eglsink to see if the FPS can exceed 60?
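Something like this should work (a sketch; nveglglessink is the x86 EGL sink that sink type=2 uses, and fpsdisplaysink prints the measured FPS, with no decode or inference involved):

```shell
# Test-source pipeline to measure raw display-sink throughput.
# sync=false lets the sink render as fast as it can; text-overlay=false
# avoids the FPS text overlay itself costing time.
PIPELINE="videotestsrc num-buffers=2000 ! video/x-raw,width=1920,height=1080 \
  ! fpsdisplaysink video-sink=nveglglessink text-overlay=false sync=false"
# Run only where GStreamer is available; -v prints the fps-measurements.
command -v gst-launch-1.0 >/dev/null && gst-launch-1.0 -v $PIPELINE || true
```

If this pipeline is also capped near your display refresh rate, the cap comes from the render path rather than from DeepStream.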