Could not find output coverage layer for parsing objects

I’m running Deepstream-test3 in the sample apps (v1.1.8) in Docker image deepstream:6.3-triton-multiarch on a Jetson Orin Nano. I confirmed that the sample app runs with the included demo model. Now I’m trying to modify it to use my own model.

I received the error “Could not find output coverage layer for parsing objects”, which from searching suggests that the output format of my model is different from that of the demo model. Other posts on this topic were met with replies like “obviously you need to write your own output parser”, but what if I want to re-use the same output parser - what format does my model need to return? Is there any documentation on this default object detection parser?

The default postprocessing for detection models in gst-nvinfer assumes a model that outputs a coverage (sigmoid) layer and a bounding-box (BiasAdd) layer. The name of the coverage output layer should contain the string “cov”, and the name of the bounding-box output layer should contain the string “bbox”. The default postprocessing will only work with models that follow this convention.

The source code of the default postprocessing is open source: /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/nvdsinfer_context_impl_output_parsing.cpp
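
For reference, a quick way to check whether an exported model follows that naming convention is to list its output tensor names before building the engine. The snippet below is a minimal sketch, assuming an ONNX export and the onnx Python package; the file name model.onnx is a placeholder:

    import onnx

    # Load the exported model (placeholder path) and list its output tensor names.
    model = onnx.load("model.onnx")
    output_names = [o.name for o in model.graph.output]
    print(output_names)

    # The default gst-nvinfer detection parser looks for one output whose name
    # contains "cov" (coverage) and one whose name contains "bbox".
    print("coverage layer found:", any("cov" in n for n in output_names))
    print("bbox layer found:", any("bbox" in n for n in output_names))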

Thanks for pointing me to the code. Now that I can see the code, it looks like the parsing function decodes the bounding box predictions and is separate from non-max suppression.

My model already decodes the predictions (as this can be exported to TensorRT along with the rest of the model) and I only need NMS after. Is it possible to bypass the parsing by specifying this in the config file?

There’s an example (deepstream-ssd-parser) in the sample apps, but its config file format is completely different and undocumented - i.e. the deepstream-ssd-parser config uses a different format with different arguments than the deepstream-test3 config. Is one of these formats deprecated, or are they interchangeable? If they’re interchangeable, can you please link to documentation or give a concise example of how to convert between them? I have attempted to translate the idea from deepstream-ssd-parser to my example, but it either fails to parse the config or ignores the new entries entirely.

Yes. If you are working with gst-nvinfer, you can implement your own parsing function by overriding the callback function in NVIDIA DeepStream SDK API Reference: nvdsinfer_custom_impl.h File Reference | NVIDIA Docs. If you are working with gst-nvinferserver, you can implement the extra custom processor NVIDIA DeepStream SDK API Reference: nvdsinferserver::InferExtraProcessor Class Reference | NVIDIA Docs. The clustering algorithms will not be impacted.
There is a customized postprocessing bbox parsing sample in /opt/nvidia/deepstream/deepstream/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo.

Are you working with python? Will you use nvinferserver or nvinfer for inferencing?

Yes, I would like to do this entirely in python if that’s possible. Is it possible to override that callback in python?

As for nvinferserver vs nvinfer, it’s unclear to me why I would choose one over the other, but I’m currently trying to use nvinfer since I was working from Deepstream-test3 as my starting point (due to having an example with no display attached). I’m open to suggestions if one is more flexible/easier at this point for prototyping. In case it affects the choice, I’m planning on adding more models to the pipeline after I get this initial one running (i.e. passing the cropped object to another model).

The config file format inconsistencies are still throwing me off here: there’s an example for nvinferserver showing how to skip the decoding function (referenced in my previous post), and there are plenty of examples for nvinfer showing how to specify the engine file, but it’s unclear how to translate between the two formats or whether they’re entirely separate systems with different arguments. Help with either of those (or with whichever of nvinfer vs nvinferserver you’d recommend) would be greatly appreciated.

Since there is no python binding for the interface, it has to be implemented in C/C++.

You can choose either nvinfer or nvinferserver; it depends on your own requirements. Please get more information from the documents Gst-nvinfer — DeepStream documentation 6.4 documentation and Gst-nvinferserver — DeepStream documentation 6.4 documentation.

So it’s not possible to skip the parsing simply by specifying that in the config file? If that’s the case, what would the function that I’d have to implement in C/C++ actually be required to do? Nothing?

I thought I was getting close by running it as an nvinferserver and specifying “other {}” under “postprocess” in the config file - I got to the point where it runs, but there’s no output in the probe function (copied from deepstream-test3) - i.e. frame_meta.frame_user_meta_list is None

No.

What is your model’s output tensor data? You said “My model already decodes the predictions (as this can be exported to TensorRT along with the rest of the model)”. The output tensor data needs to be parsed into the data structure that the NMS algorithm needs. Please read the sample in /opt/nvidia/deepstream/deepstream/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo

The model outputs bounding boxes and confidence scores. The decoding that I included in the tensorrt export is very similar to the decode function in /opt/nvidia/deepstream/deepstream/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/NvDsInferParseCustomYolov3_cuda.cu

Can you please be more specific about which aspect of /opt/nvidia/deepstream/deepstream/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo that I need to implement? i.e. what is this “data structure which is needed by the NMS algorithm”? Is there documentation anywhere?

Take the YoloV3 implementation in /opt/nvidia/deepstream/deepstream/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo as the example: the NvDsInferParseCustomYoloV3_cuda() function overrides the NvDsInferParseCustomFunc() interface (NVIDIA DeepStream SDK API Reference: nvdsinfer_custom_impl.h File Reference | NVIDIA Docs) to fill the std::vector<NvDsInferObjectDetectionInfo> &objectList data structure, which is what the NMS algorithm needs. Please read the API document and the sample source code for details.

@jason.taylor - Please advise whether the latest information shared has unblocked you.

Thanks

@carlosgs not really. I’m not going to learn to implement custom CUDA kernels for something that should be trivial, like running a typical object detection architecture.

This post in the forum indicates that it can be done in python (https://forums.developer.nvidia.com/t/skip-postprocessing-when-using-nvinfer/203531), but the user never provided their solution.

I’ve been trying to extrapolate from deepstream-ssd-parser in the demo apps. That example specifies custom_lib { path: "/opt/nvidia/deepstream/deepstream/lib/libnvds_infercustomparser.so" } in the config, but I don’t see any references to that in the python code and it looks like it parses the network output entirely in python.

This is what I have so far in my probe function, which successfully converts the deepstream model output to numpy arrays that I can then pass to NMS:

    # (assumes `import ctypes`, `import numpy as np`, and `import pyds` at module level)
    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break

        # Tensor meta is attached as frame user meta when output tensor meta is enabled.
        l_user = frame_meta.frame_user_meta_list
        user_meta = pyds.NvDsUserMeta.cast(l_user.data)
        tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)

        # Collect the output layers and pick the two I care about by name.
        layers_info = []
        for i in range(tensor_meta.num_output_layers):
            layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
            layers_info.append(layer)
        cls_scores = next(layer for layer in layers_info if layer.layerName == 'cls_scores')
        bbox_preds = next(layer for layer in layers_info if layer.layerName == 'bbox_preds')

        # Wrap the raw output buffers as numpy arrays (no copy).
        ptr = ctypes.cast(pyds.get_ptr(bbox_preds.buffer), ctypes.POINTER(ctypes.c_float))
        bbox_preds = np.ctypeslib.as_array(ptr, shape=CAR_BBOX_PREDS_SHAPE)
        ptr = ctypes.cast(pyds.get_ptr(cls_scores.buffer), ctypes.POINTER(ctypes.c_float))
        cls_scores = np.ctypeslib.as_array(ptr, shape=CAR_SCORE_SHAPE)

        # ... NMS and object meta creation go here (described below) ...

        try:
            l_frame = l_frame.next
        except StopIteration:
            break

I then create a DeepStream detector output object for each detection with pyds.NvDsInferObjectDetectionInfo(), set the relevant properties (detectionConfidence, classId, left, top, width, height), and run non-max suppression as in the SSD example.
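
For concreteness, here is a minimal sketch of that step. The score threshold, the IoU threshold, and the greedy NMS are placeholders of my own rather than the SSD sample’s code, and I’m assuming bbox_preds has shape (1, N, 4) in (x1, y1, x2, y2) pixel coordinates and cls_scores has shape (1, N, num_classes):

    SCORE_THRESHOLD = 0.3  # placeholder thresholds
    IOU_THRESHOLD = 0.5

    def iou(a, b):
        """IoU of two pyds.NvDsInferObjectDetectionInfo boxes."""
        x1, y1 = max(a.left, b.left), max(a.top, b.top)
        x2 = min(a.left + a.width, b.left + b.width)
        y2 = min(a.top + a.height, b.top + b.height)
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        union = a.width * a.height + b.width * b.height - inter
        return inter / union if union > 0 else 0.0

    def simple_nms(dets, iou_thr):
        """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
        dets = sorted(dets, key=lambda d: d.detectionConfidence, reverse=True)
        kept = []
        for d in dets:
            if all(iou(d, k) < iou_thr for k in kept):
                kept.append(d)
        return kept

    detections = []
    for box, scores in zip(bbox_preds[0], cls_scores[0]):  # assumes batch size 1
        class_id = int(np.argmax(scores))
        score = float(scores[class_id])
        if score < SCORE_THRESHOLD:
            continue
        obj = pyds.NvDsInferObjectDetectionInfo()
        obj.classId = class_id
        obj.detectionConfidence = score
        obj.left, obj.top = float(box[0]), float(box[1])
        obj.width, obj.height = float(box[2] - box[0]), float(box[3] - box[1])
        detections.append(obj)

    detections = simple_nms(detections, IOU_THRESHOLD)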

Where I’m stuck now is how to use this detection output from the probe function to take a crop of the original frame and pass that to my next model in the pipeline. Is this possible?

That is different from your use case. You need to keep the nvinfer internal NMS clustering. The sample in that topic is for the case where “output-tensor-meta” is enabled, which skips all postprocessing inside nvinfer (including the clustering algorithms).

You can use an SGIE configuration for your second model. nvinfer will crop the objects and send them to the model internally; you don’t need to do anything. There is a PGIE+SGIE sample at deepstream_python_apps/apps/deepstream-test2 at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub.
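
For orientation, the PGIE → SGIE wiring looks roughly like the following. This is a sketch following the deepstream-test2 pattern for nvinfer; the element names and config file paths are placeholders, and the rest of the pipeline (source, streammux, sink) is assumed to exist from the deepstream-test3 boilerplate:

    # (assumes the usual `from gi.repository import Gst`, plus an existing
    #  pipeline, streammux, and nvvidconv from the deepstream-test3 setup)

    # Primary detector (PGIE) followed by a secondary model (SGIE).
    pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
    pgie.set_property("config-file-path", "pgie_config.txt")   # placeholder path

    sgie = Gst.ElementFactory.make("nvinfer", "secondary-inference")
    sgie.set_property("config-file-path", "sgie_config.txt")   # placeholder path
    # In sgie_config.txt, operate-on-gie-id should match the PGIE's gie-unique-id
    # so the SGIE knows which detections to crop and process.

    pipeline.add(pgie)
    pipeline.add(sgie)

    # ... streammux -> pgie -> sgie -> downstream elements ...
    streammux.link(pgie)
    pgie.link(sgie)
    sgie.link(nvvidconv)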

I’ll simply use another NMS implementation in that case, just like in the SSD example in the demo apps.

How does it get the crop from the probe function? What do I have to set for that cropping to automatically happen?

To enable SGIE mode, follow this sample: deepstream_python_apps/apps/deepstream-test2/dstest2_sgie1_config.txt at master · NVIDIA-AI-IOT/deepstream_python_apps (github.com), where the VehicleMake model works in SGIE mode. “process-mode=2” is set in the configuration file to enable “secondary GIE” mode. The other preprocess and postprocess parameters should be configured as in the instructions: DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums.

deepstream_python_apps/apps/deepstream-test2 at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub is the sample for the PGIE+SGIE use case.

My case differs from those examples in that I’m doing the postprocessing of my first model in the probe function, as explained above.

How does the second model get the crop from the probe function? What do I have to set for that cropping to automatically happen?

You mentioned that you refer to deepstream_python_apps/apps/deepstream-ssd-parser/deepstream_ssd_parser.py at master · NVIDIA-AI-IOT/deepstream_python_apps (github.com) to customize the postprocessing of your PGIE. The “add_obj_meta_to_frame” function in that script adds the bboxes’ coordinates to the object meta. The object meta is passed to the downstream elements such as SGIEs and nvmultistreamtiler, etc. The SGIE gets the object coordinates from the object meta and does the cropping and processing by itself automatically.
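
In code, that step looks roughly like the sketch below. It is modelled on the add_obj_meta_to_frame pattern from the SSD sample rather than copied from it; obj is one pyds.NvDsInferObjectDetectionInfo result after NMS, and frame_meta/batch_meta are the ones from the probe above:

    # Acquire an object meta from the batch pool and fill in the detection.
    obj_meta = pyds.nvds_acquire_obj_meta_from_pool(batch_meta)
    obj_meta.class_id = obj.classId
    obj_meta.confidence = obj.detectionConfidence

    # Bounding box in pixel coordinates of the full frame.
    rect = obj_meta.rect_params
    rect.left = obj.left
    rect.top = obj.top
    rect.width = obj.width
    rect.height = obj.height

    # Depending on the SGIE's operate-on-gie-id setting, the object's
    # unique_component_id may also need to match the PGIE's gie-unique-id
    # (worth verifying against your own configs).
    # obj_meta.unique_component_id = 1

    # Attach to the frame; downstream elements (SGIE, tiler, OSD) will see it,
    # and the SGIE will crop this region and run inference on it.
    pyds.nvds_add_obj_meta_to_frame(frame_meta, obj_meta, None)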

Please refer to MetaData in the DeepStream SDK — DeepStream documentation 6.4 documentation for the metadata of DeepStream SDK.

Thanks. I got the second model running on crops from the first model. It requires a combination of add_obj_meta_to_frame as you mentioned, (obviously) connecting them in the pipeline, and also setting the operate_on_gie_id property under input_control in the config file for the second model.

Now I need to pass the crop from the second model to the third/final model. I assume I need to rescale the bounding boxes of the second model to the original video size, just as I did for the first model. This was trivial for the first model because I know the video size, but the second model is running on a resized crop from the first model - how do I get that original crop size in the probe function for my second model?

Can you describe the relationship between the models? E.g., as in NVIDIA-AI-IOT/deepstream_lpr_app: Sample app code for LPR deployment on DeepStream (github.com), the PGIE is a car detection model, the first SGIE is a plate detector which detects car plates in car objects, and the second SGIE is a plate text recognition model which identifies the text on car plates.

It seems you have some misunderstanding of the SGIE mechanism. You can find the “gie-unique-id” parameter in every nvinfer configuration file (deepstream_python_apps/apps/deepstream-test2/dstest2_pgie_config.txt at master · NVIDIA-AI-IOT/deepstream_python_apps (github.com)); with this parameter, every GIE has its own ID. In every SGIE configuration file there is an “operate-on-gie-id” parameter (deepstream_python_apps/apps/deepstream-test2/dstest2_sgie1_config.txt at master · NVIDIA-AI-IOT/deepstream_python_apps (github.com)), which tells the SGIE which GIE’s bboxes to operate on. So you don’t have to do anything extra except configure the parameters correctly.