Proper usage of nvdsmetamux in sequential pipeline with multiple PGIEs

Please provide complete information as applicable to your setup.

• Hardware Platform: Jetson Orin NX 16 GB
• DeepStream Version: 7.0
• JetPack Version: 6.0
• TensorRT Version: 8.6.2.3
• Issue Type: question

Hi everyone,

I’m experimenting with nvdsmetamux and would like to confirm if I’m using it correctly.

My baseline pipeline is:

filesrc -> h265parse -> nvv4l2decoder -> nvstreammux -> pgie1 -> pgie2 -> fakesink
  • pgie1 inference time: ~60 ms
  • pgie2 inference time: ~100 ms

If I attach a probe on the fakesink sink pad, I only see results from pgie1 (unique-id=1).
But I need both PGIE results together, since the models depend on each other.

When I change the pipeline to:

filesrc -> h265parse -> nvv4l2decoder -> nvstreammux -> pgie1 -> pgie2 -> nvdsmetamux -> fakesink

…I can see results from both pgie1 and pgie2 in the probe.
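For reference, this is roughly how I separate the two models' results in the probe. The real code iterates batch/frame/object metadata via pyds; here I use plain dicts in place of NvDsObjectMeta so the logic is self-contained (the field name unique_component_id is the metadata field populated from each PGIE's unique-id property):

```python
# Group object metas by the PGIE that produced them.
# In the real probe, obj.unique_component_id comes from each
# nvinfer element's unique-id property (1 for pgie1, 2 for pgie2).
def split_by_component(obj_metas):
    by_component = {}
    for obj in obj_metas:
        by_component.setdefault(obj["unique_component_id"], []).append(obj)
    return by_component

# Stand-ins for the object metas seen in one frame:
frame_objs = [
    {"unique_component_id": 1, "class_id": 0},
    {"unique_component_id": 2, "class_id": 1},
    {"unique_component_id": 1, "class_id": 3},
]
grouped = split_by_component(frame_objs)
print(len(grouped[1]), len(grouped[2]))  # → 2 1
```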

My questions:

  1. Is this a valid/correct usage of nvdsmetamux? Most examples I’ve seen use it only in parallel pipelines with multiple branches feeding different sink pads.
  2. Can nvdsmetamux be safely used as a synchronization mechanism for faster vs. slower models in the same branch?

Note: I tested the parallel pipeline approach and it works, but the performance is the same as the sequential version, and the structure becomes unnecessarily complex.

Thanks in advance!

Can you explain what your two models detect? Can you upload your nvinfer configuration files for the two models? If you remove pgie1, are the objects detected by pgie2 available in the fakesink sink pad probe? How do you read out the objects from pgie1 and pgie2 in the probe function?

No, it’s incorrect.

No. nvdsmetamux is not a synchronization mechanism.

Your pipeline needs neither a parallel structure nor nvdsmetamux.

Please debug with your first pipeline to find out the root cause of the missing pgie2 objects.


Hi Fiona,

I have no idea what the issue was, but it seems to have resolved itself, thanks!

I’m sharing one of the pgie configuration files anyway, since I noticed another unexpected behavior. Specifically, when I access the numpy segmentation masks:

masks = pyds.get_segmentation_masks(segmeta)
masks = np.array(masks, copy=True, order='C')

I found that some pixel values are set to -1 (possible classes are only 0 for background and 1 for objects of interest). While reviewing the configuration file, I came across the following parameter and its comment:

# Confidence threshold for the segmentation model to output a valid class for a pixel.

# If confidence is less than this threshold, class output for that pixel is -1

segmentation-threshold=0.0

However, since the threshold is set to 0.0, in theory I shouldn’t be getting any -1 outputs.

Do you know how I can resolve this?

seg0_pgie_config.txt (4.9 KB)

Yes. The segmentation mask generation algorithm is open source; please refer to SegmentPostprocessor::fillSegmentationOutput() in /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp.

You may need to debug with your app and model to find out why the “-1” is generated.


Ok, thanks.

Could you suggest the best way to debug this? My pipeline is currently implemented in Python. Should I re-implement it in C++ and set a breakpoint in SegmentPostprocessor::fillSegmentationOutput(), or are there alternative approaches you would recommend?

Please read the code of SegmentPostprocessor::fillSegmentationOutput() in /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp. The only case that outputs the "-1" mask value is when the probabilities for every class are "-1". It seems the model output is not correct.
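For illustration, the per-pixel selection in that function works roughly like this (a Python paraphrase of the C++ logic, not the actual implementation): a pixel only receives a class when some class score exceeds segmentation-threshold, so raw scores that are all negative can still produce -1 even with a 0.0 threshold.

```python
def pick_class(probs, threshold):
    # Paraphrase of the fillSegmentationOutput() idea: start from class -1
    # and only accept a class whose score exceeds the threshold.
    best_class = -1
    best_prob = threshold
    for c, p in enumerate(probs):
        if p > best_prob:
            best_prob = p
            best_class = c
    return best_class

print(pick_class([0.1, 0.9], 0.0))    # → 1
print(pick_class([-3.2, -0.7], 0.0))  # → -1: all scores below the threshold
```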

You may need to dump the original model output to check.

You may modify the InferPostprocessor::postProcessHost() function in /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_context_impl.cpp as shown in the following code, rebuild and replace the libnvds_infer.so library, and set "dump-output-tensor=1" in the nvinfer configuration file. Then you will get the dumped output tensor data.

NvDsInferStatus
InferPostprocessor::postProcessHost(NvDsInferBatch& batch,
        NvDsInferContextBatchOutput& batchOutput)
{
    batchOutput.frames = new NvDsInferFrameOutput[batch.m_BatchSize];
    batchOutput.numFrames = batch.m_BatchSize;

    /* For each frame in the current batch, parse the output and add the frame
     * output to the batch output. The number of frames output in one batch
     * will be equal to the number of frames present in the batch during queuing
     * at the input.
     */
    for (unsigned int index = 0; index < batch.m_BatchSize; index++)
    {
        NvDsInferFrameOutput& frameOutput = batchOutput.frames[index];
        frameOutput.outputType = NvDsInferNetworkType_Other;

        /* Calculate the pointer to the output for each frame in the batch for
         * each output layer buffer. The NvDsInferLayerInfo vector for output
         * layers is passed to the output parsing function. */
        for (unsigned int i = 0; i < m_OutputLayerInfo.size(); i++)
        {
            NvDsInferLayerInfo& info = m_OutputLayerInfo[i];
            if(needOutputCopyB4Processing()){
                info.buffer =
                    (void*)(batch.m_HostBuffers[info.bindingIndex]->ptr<uint8_t>() +
                            info.inferDims.numElements *
                                getElementSize(info.dataType) * index);
            }
            else{
                info.buffer =
                    (void*)((uint8_t *)(batch.m_DeviceBuffers[info.bindingIndex]) +
                            info.inferDims.numElements *
                                getElementSize(info.dataType) * index);
            }
            if (m_DumpOpTensor) {
                uint32_t dump_size = info.inferDims.numElements * getElementSize(info.dataType);
                std::string file_path, layer_name;
                std::cout << "info.inferDims.numElements " << info.inferDims.numElements << " info.inferDims.numDims " << info.inferDims.numDims << " info.dataType " << info.dataType << std::endl;

                layer_name = info.layerName;
                for (auto & element : m_DumpOpTensorFiles)
                {
                    if (layer_name == element.first) {
                        file_path = element.second;
                        break;
                    }
                }
                if (file_path.empty()) {
                    std::pair<std::string, std::string> file_pair;
                    file_path = layer_name + "_op_tensor.bin";
                    std::replace(file_path.begin(), file_path.end(), '/', '-');
                    file_pair = std::make_pair(layer_name, file_path);
                    m_DumpOpTensorFiles.push_back(file_pair);
                }
                std::ofstream dump_op_file(file_path, std::ios_base::app);
                dump_op_file.write((char*) info.buffer, dump_size);
                dump_op_file.close();
            }
            if (m_OverwriteOpTensor) {
                uint32_t dump_size = m_NetworkInfo.width*m_NetworkInfo.height*m_NetworkInfo.channels;
                std::string layer_name = info.layerName;
                for (auto & element:m_OverwriteOpTensorFilePairs)
                {
                    std::replace(element.first.begin(), element.first.end(), '-', '/');
                    std::string sub_file_name = element.first.substr(0, layer_name.size());
                    if (layer_name == sub_file_name) {
                        int index = element.second;
                        m_OverwriteOpTensorFiles[index]->read((char *) info.buffer, dump_size*4);
                        break;
                    }
                }
            }
        }

        RETURN_NVINFER_ERROR(parseEachBatch(m_OutputLayerInfo, frameOutput),
            "Infer context initialize inference info failed");
    }

    /* Fill the host buffers information in the output. */
    batchOutput.numHostBuffers = m_AllLayerInfo.size();
    batchOutput.hostBuffers = new void*[m_AllLayerInfo.size()];
    for (size_t i = 0; i < batchOutput.numHostBuffers; i++)
    {
        batchOutput.hostBuffers[i] =
            batch.m_HostBuffers[i] ? batch.m_HostBuffers[i]->ptr() : nullptr;
    }

    batchOutput.numOutputDeviceBuffers = m_OutputLayerInfo.size();
    batchOutput.outputDeviceBuffers = new void*[m_OutputLayerInfo.size()];
    for (size_t i = 0; i < batchOutput.numOutputDeviceBuffers; i++)
    {
        batchOutput.outputDeviceBuffers[i] =
            batch.m_DeviceBuffers[m_OutputLayerInfo[i].bindingIndex];
    }

    /* Mark the set of host buffers as not with the context. */
    batch.m_BuffersWithContext = false;

    return NVDSINFER_SUCCESS;
}
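Once the _op_tensor.bin files are written, you can load them for inspection with a small helper like the following (a sketch: it assumes the output tensor is float32; adjust if your model uses another data type):

```python
import array

def load_float32_tensor(path):
    # Read the raw dump written by the modified postProcessHost()
    # and return it as a flat list of float values.
    values = array.array("f")
    with open(path, "rb") as f:
        values.frombytes(f.read())
    return list(values)

# Example: write a small fake dump, then read it back.
with open("demo_op_tensor.bin", "wb") as f:
    array.array("f", [-4.0, 0.5, 4.5]).tofile(f)
print(load_float32_tensor("demo_op_tensor.bin"))  # → [-4.0, 0.5, 4.5]
```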

Hi, thanks for the suggestion! I will try that method as well. Meanwhile, I tried two alternative debugging approaches:

  1. I added a Python binding to access class_probabilities_map.
m.def("get_segmentation_class_probabilities_map",
      [](void *data) {
          auto *META = (NvDsInferSegmentationMeta *) data;
          int channel = META->classes; 
          int width = META->width;
          int height = META->height;
          auto dtype = py::dtype(py::format_descriptor<float>::format());
          return py::array(dtype, {channel, height, width},
                           {height * width * sizeof(float), width * sizeof(float), sizeof(float)}, // strides
                           META->class_probabilities_map);
      },
      "data"_a,
      pydsdoc::methodsDoc::get_segmentation_class_probabilities_map);
  2. I added debug prints inside nvdsinfer_context_impl_output_parsing.cpp to check the per-class probabilities.

Both approaches show that my network outputs raw logits in a range like [-4.0488, 4.4727], similar to what I get in PyTorch [-4.1382, 4.4711]. These are raw logits; I usually apply Softmax in PyTorch to get probabilities before Argmax.
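As a sanity check, applying softmax to logits in that range gives strictly positive probabilities summing to 1, so the per-pixel maximum always clears a 0.0 threshold while the argmax class stays the same (plain-Python sketch, values chosen to match the ranges above):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-4.0488, 4.4727]  # per-pixel scores for background / object
probs = softmax(logits)
print(all(p > 0 for p in probs))  # → True: no value can fall below a 0.0 threshold
print(probs.index(max(probs)) == logits.index(max(logits)))  # → True: argmax preserved
```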

Questions:

  • Can you confirm that NvInfer does not apply any automatic post-processing like Softmax? That is, SegmentPostprocessor::fillSegmentationOutput() receives raw logits?
  • To avoid -1 masks, I added a Softmax layer inside the PyTorch model, then exported it to ONNX and built the engine using trtexec. This works. Is this the correct approach?
  • If I don’t want to modify the PyTorch model, is there a DeepStream plugin that can apply Softmax for me? I saw nvdspostprocess, but the docs say it currently supports only detection and classification models (I’m using DeepStream 7.0).

Thanks for your help!

No, the default postprocessing does not include a softmax step. Please refer to the code of SegmentPostprocessor::fillSegmentationOutput() in /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp.

This is one of the solutions.

No. If you want to use nvdspostprocess, you need to customize your own postprocessing algorithm. nvdspostprocess is a template plugin; /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvdspostprocess/postprocesslib_impl provides some samples, and you can customize your own /opt/nvidia/deepstream/deepstream/sources/gst-plugins/gst-nvdspostprocess/postprocesslib_impl/post_processor_segmentation.cpp

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.