Why does the fps not go above 35 even though free GPU memory is available?

with a single stream - GPU usage 1158 MB/5000 MB, fps 35
with 2 streams - GPU usage 1580 MB/5000 MB, fps 16 each.

What am I doing wrong? How can I make full use of the GPU?

• Hardware Platform (GPU)
• DeepStream Version 5.0.1
• TensorRT Version 7.0
• NVIDIA GPU Driver Version (valid for GPU only) 440.10
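
As a rough arithmetic check on the figures reported above, the sketch below compares total throughput for one and two streams and shows how much GPU memory is in use. The variable names are illustrative only. Total throughput stays nearly flat (35 vs. 32 frames per second) while about two thirds of the memory remains free, which suggests, without proving it, that memory is not the limiting resource.

```python
# Figures reported in the post above.
single_stream_fps = 35
two_stream_fps_each = 16
num_streams = 2

# Total frames per second across both streams.
aggregate_fps = two_stream_fps_each * num_streams

# Fraction of GPU memory in use with two streams.
gpu_mem_used_mb, gpu_mem_total_mb = 1580, 5000
mem_fraction = gpu_mem_used_mb / gpu_mem_total_mb

print(f"single stream: {single_stream_fps} fps")
print(f"{num_streams} streams: {aggregate_fps} fps in total")
print(f"GPU memory in use: {mem_fraction:.0%}")
```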

The performance of the whole pipeline depends on the actual load of every element in it: the characteristics of the model, how the model is used, the performance of the video sources, …

So it is hard for us to tell you anything without any details about your pipeline.

Let me know what details you are expecting from me.

I used the detectnet_v2 Jupyter notebook to train a resnet18-based object detection model.
The input image size is 1920x1072.

For DeepStream, I followed the configurations from the face_mask detection repo.

We need the whole application, the video you use, the model you use, the configuration files, and the platform information (HW and SW).

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
• DeepStream Version
• JetPack Version (valid for Jetson only)
• TensorRT Version
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type (questions, new requirements, bugs)
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the content of the configuration files, the command line used, and other details for reproducing.)
• Requirement details (This is for new requirements. Include the module name, i.e. for which plugin or which sample application, and the function description.)

• Hardware Platform (GPU) GTX 1660
• DeepStream Version 5.0.1
• TensorRT Version 7.0.0
• NVIDIA GPU Driver Version (valid for GPU only) 440.100
The config files are attached here:
source1_video_barnet_gpu.txt (2.6 KB) config_infer_primary_barnet_gpu.txt (1.6 KB)
The TLT config files:
tlt_configs.zip (103.4 KB)
I can’t share the trained model and the video on an open platform.

Please do not choose “live-source=1” for testing local video files.
For performance testing, please use fakesink instead of eglsink.
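
The two suggestions above could be applied to a deepstream-app style config roughly as follows. This is a minimal sketch: the helper name `apply_perf_test_settings` and the trimmed sample text are made up for illustration (the sample is not the poster's actual file), and the sketch assumes the config parses as a plain INI file with `#` comment lines.

```python
import configparser


def apply_perf_test_settings(config_text):
    """Return a parsed config with the two suggested changes applied."""
    # interpolation=None keeps any '%' characters in values literal.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(config_text)

    for section in cfg.sections():
        # Switch every enabled [sinkN] group to a fakesink (type=1).
        if section.startswith("sink") and cfg[section].get("enable") == "1":
            cfg[section]["type"] = "1"

    # Local video files are not live sources.
    if cfg.has_section("streammux"):
        cfg["streammux"]["live-source"] = "0"
    return cfg


# Trimmed, illustrative config fragment (hypothetical values).
sample = """
[streammux]
batch-size=2
live-source=1

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=2
"""

cfg = apply_perf_test_settings(sample)
print(cfg["streammux"]["live-source"], cfg["sink0"]["type"])
```

Note that writing the result back with `configparser` would drop the `#` comment lines, so for a real file it may be simpler to edit the two keys by hand.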

I made the changes that you have suggested. The fps improved from 28 to 20.

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=1

[tiled-display]
enable=1
rows=1
columns=2
width=1280 #640
height=960 #480
gpu-id=0


[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
num-sources=1
uri=file:/opt/nvidia/deepstream/deepstream-5.0/samples/streams/barcode_video.mp4
gpu-id=0

[source1]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
num-sources=1
uri=file:/opt/nvidia/deepstream/deepstream-5.0/samples/streams/barcode_video.mp4
gpu-id=0

[streammux]
gpu-id=0
batch-size=2
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1072
buffer-pool-size=400
#nvbuf-memory-type: 0-4
nvbuf-memory-type=1
#live-source: 1-live 0-default
#live-source=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
source-id=0
gpu-id=0
container=2
codec=1
bitrate=2000000
output-file=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/barcode_video_output.MP4

[sink1]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=4
sync=1
source-id=1
gpu-id=0
container=2
codec=1
bitrate=2000000
#output-file=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/barcode_video_output.MP4
#rtsp-port=8554
#udp-port=5400

Since you have enabled [tiled-display], please disable [sink1].

Nothing improved. Here is the updated configuration:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=1

[tiled-display]
enable=1
rows=1
columns=2
width=1280 #640
height=960 #480
gpu-id=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=1
source-id=0
gpu-id=0
container=2
codec=1
bitrate=2000000
output-file=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/barcode_video_output.MP4

[sink1]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=1
source-id=1
gpu-id=0
container=2
codec=1
bitrate=2000000
#output-file=/opt/nvidia/deepstream/deepstream-5.0/samples/streams/barcode_video_output.MP4
#rtsp-port=8554
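
One way to double-check a posted configuration against the suggestions made earlier in the thread is a small script like the sketch below. The function name `review_sinks` is hypothetical, and the sample text is a trimmed copy of the settings posted just above. The `sync` check goes beyond what the thread has discussed so far: according to the DeepStream sink group documentation, `sync=1` renders synchronously instead of as fast as possible. Whether that matters in this case is not established here.

```python
import configparser
from typing import List


def review_sinks(config_text: str) -> List[str]:
    """Return human-readable notes about the enabled [sinkN] groups."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(config_text)
    notes = []

    enabled = [s for s in cfg.sections()
               if s.startswith("sink") and cfg[s].get("enable") == "1"]
    tiled = (cfg.has_section("tiled-display")
             and cfg["tiled-display"].get("enable") == "1")

    # With the tiler enabled, the thread suggests keeping a single sink.
    if tiled and len(enabled) > 1:
        notes.append("tiled-display is enabled but more than one sink is enabled")

    for s in enabled:
        if cfg[s].get("type") != "1":
            notes.append(f"[{s}] type={cfg[s].get('type')} is not a fakesink")
        # Assumption: sync=1 paces output to the clock (per the sink group docs).
        if cfg[s].get("sync") == "1":
            notes.append(f"[{s}] sync=1 renders synchronously, not as fast as possible")
    return notes


# Trimmed copy of the relevant keys from the config posted above.
posted = """
[tiled-display]
enable=1

[sink0]
enable=1
type=1
sync=1

[sink1]
enable=0
type=1
sync=1
"""

notes = review_sinks(posted)
for note in notes:
    print(note)
```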

What do you want to improve? The FPS? What are the current GPU load and CPU load?

The current GPU loading is 1158MB/5400MB. I want to know how I can use the whole GPU when processing a video. In other words, I would like to see the maximum fps my model can reach with DeepStream.

This is only the GPU memory usage, not the GPU load.

You can get the monitoring data with “nvidia-smi dmon”.
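
To turn the dmon output into a single utilization figure, something like the following sketch may help. It reads the column names from the `# gpu ...` header line and averages the `sm` column. The function names are illustrative, the sample text uses made-up numbers (it is not output from the poster's machine), and the column layout is assumed from the default dmon header, which may differ between driver versions.

```python
import subprocess
from statistics import mean


def parse_dmon(text):
    """Parse `nvidia-smi dmon` text into a list of {column: value} dicts."""
    columns, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#":
            # "# gpu pwr ..." names the columns; "# Idx W ..." only gives units.
            if fields[1:2] == ["gpu"]:
                columns = fields[1:]
            continue
        if columns is not None:
            rows.append(dict(zip(columns, fields)))
    return rows


def average_sm_utilization(rows):
    """Average the 'sm' (GPU compute utilization, %) column, skipping '-' entries."""
    values = [int(r["sm"]) for r in rows if r.get("sm", "-").isdigit()]
    return mean(values) if values else None


def sample_gpu(count=5):
    """Collect `count` samples from a live GPU (requires nvidia-smi on PATH)."""
    out = subprocess.run(["nvidia-smi", "dmon", "-c", str(count)],
                         capture_output=True, text=True, check=True).stdout
    return parse_dmon(out)


# Made-up sample in the default dmon layout, for illustration only.
sample_text = """\
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    60    55     -    40    20     0    30  4000  1500
    0    62    56     -    60    25     0    35  4000  1500
"""

rows = parse_dmon(sample_text)
print("average sm utilization:", average_sm_utilization(rows))
```

On a machine with the pipeline running, `average_sm_utilization(sample_gpu())` would give the same figure from live data. The `dec` column is also worth a look, since it reports video decoder utilization.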

I ran with “nvidia-smi dmon”