Not able to use nvdsmetamux correctly

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) Tesla T4
• DeepStream Version 7.1
• JetPack Version (valid for Jetson only)
• TensorRT Version 10.3.0.26-1+cuda12.5
• NVIDIA GPU Driver Version (valid for GPU only) 566.36 (NVCC Version 12.6)
• Issue Type( questions, new requirements, bugs)
Getting a stream error when running the following pipeline
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
nvstreammux name=nvstreammux0 batch-size=16 width=1920 height=1080 live-source=False sync-inputs=True batched-push-timeout=40000
! queue
! tee name=t

t. ! nvinfer batch-size=16 config-file-path="/code/processor/video/DeepStream-Yolo-Face/config_infer_primary_yoloV8_face.txt" model-engine-file="/code/processor/video/DeepStream-Yolo-Face/yolov8n-face-2-arsh.onnx_b16_gpu0_fp32.engine"
! queue ! meta.sink_0

t. ! nvinfer batch-size=16 config-file-path="/code/processor/video/DeepStream-Yolo-Face/config_infer_primary_yoloV8_face.txt" model-engine-file="/code/processor/video/DeepStream-Yolo-Face/yolov8n-face-2-arsh.onnx_b16_gpu0_fp32.engine"
! queue ! meta.sink_1

uridecodebin3 uri="file:///videos/classroom.mp4" name=decodebin0 ! queue ! nvstreammux0.sink_0
uridecodebin3 uri="file:///videos/fruit-and-vegetable-detection.mp4" name=decodebin1 ! queue ! nvstreammux0.sink_1

nvdsmetamux config-file=/code/config_files/metamux/config_metamux0.txt name=meta
! queue
! nvdsosd process-mode=1
! queue
! queue ! nvvideoconvert ! “video/x-raw(memory:NVMM),format=NV12” ! nvv4l2h264enc ! h264parse ! rtspclientsink location=rtsp://gstreamer-rtsp:8554/stream/0
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

What is your goal? nvdsmetamux is used for parallel inference and for merging the metadata from the parallel branches.

If you want to build a parallel inference pipeline, please refer to this repository

or use the command line below.

gst-launch-1.0 uridecodebin uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 ! mux.sink_0 \
               uridecodebin uri=file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4 ! mux.sink_1 \
               nvstreammux name=mux gpu-id=0 batch-size=2 width=1920 height=1080 ! queue ! tee name=t \
               t.src_0 ! queue ! meta.sink_0 \
               t.src_1 ! queue ! nvstreamdemux name=demux per-stream-eos=true  \
               demux.src_0 ! queue ! tee name=b0_t \
               demux.src_1 ! queue ! tee name=b1_t \
               b0_t.src_0 ! queue ! b0_m.sink_0 nvstreammux name=b0_m batch-size=1 width=1920 height=1080 ! queue ! nvinfer batch-size=1 config-file-path=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt ! queue ! meta.sink_1 \
               b1_t.src_0 ! queue ! b1_m.sink_1 nvstreammux name=b1_m batch-size=1 width=1920 height=1080 ! queue ! nvinfer batch-size=1 config-file-path=/opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app/config_infer_primary.txt ! queue ! meta.sink_2 \
               nvdsmetamux name=meta config-file=config_metamux.txt ! queue ! nvmultistreamtiler width=1920 height=1080 ! queue ! nvvideoconvert ! nvdsosd ! nv3dsink
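
The contents of config_metamux.txt are not shown in this thread; as a rough sketch, assuming the format used by the deepstream_parallel_inference_app sample (the active-pad, pts-tolerance and src-ids-model-1 values below are only illustrative and must be adapted to your own branches), it would look something like:

[property]
enable=1
# sink pad whose buffers are passed through to the src pad
active-pad=sink_0
# pts tolerance used when matching metadata from the parallel branches (value from the sample)
pts-tolerance=60000

[user-configs]

[group-0]
# src-ids-model-<nvinfer unique-id>=<source ids whose metadata is kept from that branch>
# if not set, metadata from all sources of that branch is merged
src-ids-model-1=0;1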

Following your reply, we are now using this pipeline:

gst-launch-1.0 uridecodebin uri=file:///videos/classroom.mp4 ! mux.sink_0 \
    uridecodebin uri=file:///videos/classroom.mp4 ! mux.sink_1 \
    nvstreammux name=mux gpu-id=0 batch-size=2 width=1920 height=1080 ! queue ! tee name=t \
    t.src_0 ! queue ! meta.sink_0 \
    t.src_1 ! queue ! nvstreamdemux name=demux per-stream-eos=true \
    demux.src_0 ! queue ! tee name=b0_t \
    demux.src_1 ! queue ! tee name=b1_t \
    b0_t.src_0 ! queue ! b0_m.sink_0 nvstreammux name=b0_m batch-size=1 width=1920 height=1080 ! queue ! nvinfer batch-size=1 config-file-path=/code/processor/video/DeepStream-Yolo-Face/config_infer_primary_yoloV8_face.txt ! queue ! meta.sink_1 \
    b1_t.src_0 ! queue ! b1_m.sink_1 nvstreammux name=b1_m batch-size=1 width=1920 height=1080 ! queue ! nvinfer batch-size=1 config-file-path=/code/processor/video/DeepStream-Yolo-Face/config_infer_primary_yoloV8_face.txt ! queue ! meta.sink_2 \
    nvdsmetamux name=meta config-file=/code/config_files/metamux/config_metamux0.txt ! queue ! nvmultistreamtiler width=1920 height=1080 ! queue ! nvvideoconvert ! nvdsosd ! \
    queue ! nvvideoconvert ! "video/x-raw(memory:NVMM),format=NV12" ! nvv4l2h264enc ! h264parse \
    ! rtspclientsink location=rtsp://gstreamer-rtsp:8554/stream0

We are getting an error from CUDA:

Setting pipeline to PAUSED ...
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:327 [FullDims Engine Info]: layers num: 4
0   INPUT  kFLOAT input           3x640x640       min: 1x3x640x640     opt: 16x3x640x640    Max: 16x3x640x640    
1   OUTPUT kFLOAT boxes           8400x4          min: 0               opt: 0               Max: 0               
2   OUTPUT kFLOAT scores          8400x1          min: 0               opt: 0               Max: 0               
3   OUTPUT kFLOAT landmarks       8400x15         min: 0               opt: 0               Max: 0               

INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:327 [FullDims Engine Info]: layers num: 4
0   INPUT  kFLOAT input           3x640x640       min: 1x3x640x640     opt: 16x3x640x640    Max: 16x3x640x640    
1   OUTPUT kFLOAT boxes           8400x4          min: 0               opt: 0               Max: 0               
2   OUTPUT kFLOAT scores          8400x1          min: 0               opt: 0               Max: 0               
3   OUTPUT kFLOAT landmarks       8400x15         min: 0               opt: 0               Max: 0               

Pipeline is PREROLLING ...
Progress: (open) Opening Stream
Progress: (connect) Connecting to rtsp://gstreamer-rtsp:8554/stream0
Progress: (open) Retrieving server options
Progress: (open) Opened Stream
INFO:
gstnvtiler.cpp(253): gst_nvmultistreamtiler_sink_event (): /GstPipeline:pipeline0/GstNvMultiStreamTiler:nvmultistreamtiler0:
Configuration 1x2

Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
Progress: (request) Sending RECORD request
Warning: Not increasing the pool size as CUDA memory is already at 99.000000 utilisation
Warning: Not increasing the pool size as CUDA memory is already at 99.000000 utilisation
Redistribute latency...
Cuda failure: status=2
Error(-1) in buffer allocation
Execution ended after 0:00:00.255649589
Setting pipeline to NULL ...
Cuda failure: status=2
Error(-1) in buffer allocation
Cuda failure: status=2
Error(-1) in buffer allocation
Cuda failure: status=2
Error(-1) in buffer allocation
Freeing pipeline ...
cudaErrorMemoryAllocation = 2
The API call failed because it was unable to allocate enough memory or other resources to perform the requested operation.

The T4 only has 16 GB of video memory. Parallel inference with two instances of the yolo-face model leads to insufficient video memory. You need to switch to a GPU with more memory.

Is there a better way to do parallel inferencing, then? We have seen that a single instance of the model doesn't take more than 1 GB of vRAM, so it's confusing why it needs more than 16 GB when running two instances.

Using the vehicle detection model from the DeepStream SDK for parallel inferencing, only a little over 1 GB of vRAM is used.

Will this model use a lot of vRAM? Is there another program using the GPU? Use nvidia-smi -l to monitor vRAM usage.
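
For example (a 1-second refresh interval is just one choice):

nvidia-smi -l 1

or, to watch just the memory counters:

nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1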