How to use DeepStream on multiple GPUs

  • Ubuntu Server 18.04.2 LTS
  • nvcr.io/nvidia/deepstream:5.1-21.02-devel
  • RTX 2080 Ti × 3
  • deepstream-app version 5.1.0
  • DeepStreamSDK 5.1.0
  • CUDA Driver Version: 11.2
  • CUDA Runtime Version: 11.1
  • TensorRT Version: 7.2
  • cuDNN Version: 8.0
  • libNVWarp360 Version: 2.0.1d3

Hi everyone,
I am using yolov4_deepstream, but 400 FPS (H.264) is already the NVDEC limit of a single GPU, so I want to use multiple GPUs to decode. However, I get an error.
The following is my config.txt:

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=0
rows=2
columns=4
width=1280
height=720
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
uri=file:/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4
#uri=file:/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_720p.h264
num-sources=4
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0


[source1]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
uri=file:/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4
#uri=file:/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_720p.h264
num-sources=4
gpu-id=1
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0
#1=mp4 2=mkv
container=1
#1=h264 2=h265
codec=1
output-file=yolov4.mp4



[osd]
enable=1
gpu-id=0
border-width=1
text-size=12
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=0
batch-size=8
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=1280
height=720
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
model-engine-file=yolov4-uniform-dynamic-max16.engine
labelfile-path=labels.txt
batch-size=8

#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV4.txt

[tracker]
enable=0
tracker-width=512
tracker-height=320
ll-lib-file=/opt/nvidia/deepstream/deepstream-5.0/lib/libnvds_mot_klt.so

[tests]
file-loop=1

Here is the log I get:

Unknown or legacy key specified ‘is-classifier’ for group [property]
X11 connection rejected because of wrong authentication.
0:00:02.308254301 1322 0x7fb114002240 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1702> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-5.1/sources/yolov4_deepstream/deepstream_yolov4/yolov4-uniform-dynamic-max16.engine
INFO: …/nvdsinfer/nvdsinfer_model_builder.cpp:685 [FullDims Engine Info]: layers num: 3
0 INPUT kFLOAT input 3x416x416 min: 1x3x416x416 opt: 16x3x416x416 Max: 16x3x416x416
1 OUTPUT kFLOAT boxes 10647x1x4 min: 0 opt: 0 Max: 0
2 OUTPUT kFLOAT confs 10647x80 min: 0 opt: 0 Max: 0

0:00:02.308352745 1322 0x7fb114002240 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1806> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-5.1/sources/yolov4_deepstream/deepstream_yolov4/yolov4-uniform-dynamic-max16.engine
0:00:02.347222330 1322 0x7fb114002240 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus:<primary_gie> [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-5.1/sources/yolov4_deepstream/deepstream_yolov4/config_infer_primary_yoloV4.txt sucessfully

Runtime commands:
h: Print this help
q: Quit

    p: Pause
    r: Resume

** INFO: <bus_callback:181>: Pipeline ready

ERROR from src_bin_muxer: Memory Compatibility Error:Input surface gpu-id doesnt match with configured gpu-id for element, please allocate input using unified memory, or use same gpu-ids OR, if same gpu-ids are used ensure appropriate Cuda memories are used
Debug info: gstnvstreammux.c(263): blit_buffer (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstNvStreamMux:src_bin_muxer:
surface-gpu-id=1,src_bin_muxer-gpu-id=0
ERROR from qtdemux0: Internal data stream error.
Debug info: qtdemux.c(6073): gst_qtdemux_loop (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin7/GstURIDecodeBin:src_elem/GstDecodeBin:decodebin7/GstQTDemux:qtdemux0:
streaming stopped, reason error (-5)
ERROR from src_bin_muxer: Memory Compatibility Error:Input surface gpu-id doesnt match with configured gpu-id for element, please allocate input using unified memory, or use same gpu-ids OR, if same gpu-ids are used ensure appropriate Cuda memories are used
Debug info: gstnvstreammux.c(263): blit_buffer (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstNvStreamMux:src_bin_muxer:
surface-gpu-id=1,src_bin_muxer-gpu-id=0
ERROR from qtdemux1: Internal data stream error.
Debug info: qtdemux.c(6073): gst_qtdemux_loop (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin1/GstURIDecodeBin:src_elem/GstDecodeBin:decodebin1/GstQTDemux:qtdemux1:
streaming stopped, reason error (-5)
ERROR from src_bin_muxer: Memory Compatibility Error:Input surface gpu-id doesnt match with configured gpu-id for element, please allocate input using unified memory, or use same gpu-ids OR, if same gpu-ids are used ensure appropriate Cuda memories are used
Debug info: gstnvstreammux.c(263): blit_buffer (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstNvStreamMux:src_bin_muxer:
surface-gpu-id=1,src_bin_muxer-gpu-id=0
ERROR from src_bin_muxer: Memory Compatibility Error:Input surface gpu-id doesnt match with configured gpu-id for element, please allocate input using unified memory, or use same gpu-ids OR, if same gpu-ids are used ensure appropriate Cuda memories are used
Debug info: gstnvstreammux.c(263): blit_buffer (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstNvStreamMux:src_bin_muxer:
surface-gpu-id=1,src_bin_muxer-gpu-id=0
ERROR from qtdemux7: Internal data stream error.
Debug info: qtdemux.c(6073): gst_qtdemux_loop (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin5/GstURIDecodeBin:src_elem/GstDecodeBin:decodebin5/GstQTDemux:qtdemux7:
streaming stopped, reason error (-5)
ERROR from qtdemux6: Internal data stream error.
Debug info: qtdemux.c(6073): gst_qtdemux_loop (): /GstPipeline:pipeline/GstBin:multi_src_bin/GstBin:src_sub_bin6/GstURIDecodeBin:src_elem/GstDecodeBin:decodebin6/GstQTDemux:qtdemux6:
streaming stopped, reason error (-5)
Quitting
App run failed

I am trying to use the NVDEC engines of GPU01 and GPU02 to decode.

(attached image: diagram of the intended multi-GPU pipeline)
How should I set this up? Please help me.

If a pipeline can't cross GPUs, how can I run multiple pipelines in DeepStream at the same time?
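The streammux error above ("Input surface gpu-id doesnt match with configured gpu-id for element") itself suggests two options: use the same gpu-id everywhere, or allocate unified CUDA memory. One thing to try, sketched below from that hint (I have not verified that it resolves the cross-GPU case), is switching the cross-GPU source's decoder output and the muxer's buffers to unified memory, using the memory-type enums already listed in the config comments:

```ini
[source1]
gpu-id=1
# (2): memtype_unified - Memory type Unified
cudadec-memtype=2

[streammux]
gpu-id=0
# (3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
nvbuf-memory-type=3
```

If unified memory still fails, that would point to the muxer genuinely not supporting inputs decoded on a different GPU in this DeepStream version.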

I remember there was an issue about using the decoder across GPUs in DeepStream; I will find it and get back to you.

But if you do as your diagram shows, don't you waste the ML compute resources of GPU02?
Also, you could use CPU decoding; AFAIK, some Intel CPUs have powerful software decoding capability.
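If a single pipeline cannot span GPUs, a common workaround is to run one independent deepstream-app process per GPU instead. A minimal sketch (the config file names `config_gpu0.txt` etc. are hypothetical; each config keeps `gpu-id=0` everywhere, and the `CUDA_VISIBLE_DEVICES` mask decides which physical GPU that maps to):

```shell
# One deepstream-app process per physical GPU; each process only
# sees the single GPU exposed by its CUDA_VISIBLE_DEVICES mask,
# so every per-process config can simply use gpu-id=0.
CUDA_VISIBLE_DEVICES=0 deepstream-app -c config_gpu0.txt &
CUDA_VISIBLE_DEVICES=1 deepstream-app -c config_gpu1.txt &
CUDA_VISIBLE_DEVICES=2 deepstream-app -c config_gpu2.txt &
wait   # block until all three pipelines exit
```

Each process then decodes and infers entirely on its own GPU, which also avoids the unified-memory copies a cross-GPU pipeline would need.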
