I’m running DeepStream app on RTX 2080Ti as well as on T4 and I have observed for the same configuration setting, the performance on T4 is over 3x slower. Following is the config file:
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
[tiled-display]
enable=1
rows=1
columns=1
width=1920
height=1080
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0
[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
uri=file:///root/apps/video.mp4
num-sources=40
gpu-id=0
# (0): memtype_device - Memory type Device
# (1): memtype_pinned - Memory type Host Pinned
# (2): memtype_unified - Memory type Unified
cudadec-memtype=0
[sink0]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0
[sink1]
enable=0
type=3
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
sync=0
#iframeinterval=200
bitrate=4000000
output-file=out.mp4
source-id=0
[osd]
enable=0
gpu-id=0
border-width=3
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0
[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=40
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1920
height=1080
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0
# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
labelfile-path=/root/apps/model.txt
batch-size=40
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=1
gie-unique-id=1
nvbuf-memory-type=0
config-file=model.txt
[tracker]
enable=1
tracker-width=640
tracker-height=384
gpu-id=0
ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_nvdcf.so
ll-config-file=/root/apps/tracker.yml
enable-batch-process=1
[tests]
file-loop=0
In the above screenshot, we can see RTX on the left side is about 17 FPS and on the right is T4 which is about 4-5 FPS. They are running the exact same configuration, video, and other settings. I have spent a lot of time trying to debug the cause, even if RTS is slightly powerful than T4 but I don’t think it is 3x slower. The GPU utilization on RTX is about 89-92% while that on T4 is 96-100%. Any reason why this might be happening?
• Hardware Platform: RTX 2080Ti and T4
• DeepStream Version: 5
• TensorRT Version: 7.0
• NVIDIA GPU Driver Version (valid for GPU only): 440.33.01 on both the cards