Caffe Model (and others) Output-Blob-Name Options

I am currently working on getting an SSD Caffe model running in DeepStream. It has been converted to a TensorRT engine for Tegra-based platforms.

• Hardware Platform: Jetson TX2/Xavier (Currently working on TX2)
• DeepStream Version: 5.0
• JetPack Version: 4.4
• TensorRT Version: 7.1.3
• Issue Type: Error when trying to run a TensorRT engine in DeepStream on the TX2 platform.

Steps Completed:
• Converted the Caffe SSD model into a TensorRT engine
• Compiled a new, updated version of “” and replaced the old one
• Compiled and linked it in the config file “”

Current Error:

Mismatch in the number of output buffers.Expected 2 output buffers, detected in the network :1
0:00:09.304585054 25 0x559e8cb680 ERROR nvinfer gstnvinfer.cpp:613:gst_nvinfer_logger:<primary_gie> NvDsInferContext[UID 1]: Error in NvDsInferContextImpl::fillDetectionOutput() <nvdsinfer_context_impl_output_parsing.cpp:725> [UID = 1]: Failed to parse bboxes using custom parse function

SSD Output Layer (Caffe prototxt):

layer {
  name: "detection_out"
  type: "DetectionOutput"
  bottom: "mbox_loc"
  bottom: "mbox_conf_flatten"
  bottom: "mbox_priorbox"
  top: "detection_out"
  include {
    phase: TEST
  }
  detection_output_param {
    num_classes: 21
    share_location: true
    background_label_id: 0
    nms_param {
      nms_threshold: 0.45
      top_k: 100
    }
    code_type: CENTER_SIZE
    keep_top_k: 100
    confidence_threshold: 0.25
  }
}

This is all running in a customized DeepStream container on the TX2 platform. We currently have no problems running DetectNet models on the platform, and we have completed similar steps to run YOLO on a dGPU setup. I do not see the last layer as having a “BatchedNMS” or “NMS” output like those referenced in the YOLO and SSD DeepStream app config files. Is there a list of available output blob names, or a way to find the appropriate one to use in this case?
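For what it's worth, the output blob name comes straight from the model itself: for a Caffe SSD it is the top: of the final DetectionOutput layer (here, detection_out), whereas "NMS"/"NMS_1" in the sample configs come from the TensorRT NMS plugin used by those models. A hedged sketch of the relevant gst-nvinfer config entries (the file paths, library name, and function name below are placeholders, not values from this thread):

```
[property]
# Engine built from the Caffe SSD model (placeholder path)
model-engine-file=ssd_tx2_fp16.engine
# Must match the top: blob name(s) of the model's output layer(s)
output-blob-names=detection_out
# Custom bbox parser compiled into a shared library (placeholder names)
parse-bbox-func-name=NvDsInferParseCustomSSD
custom-lib-path=./libnvdsinfer_custom_impl_ssd.so
```

If the custom parser expects two output layers (as the "Expected 2 output buffers" error suggests), output-blob-names and the parser have to agree on both the count and the names.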

You need to customize the post-processor for your SSD.
You can refer to the two links below about how to add your own post-processor.

Thanks for the update. I have decided to approach this a different way, and use the “sample_ssd” program to generate a new engine file with the required prototxt changes to include a second output tensor. I am however at a loss as to how to export the engine file once created. I’ve searched through the TensorRT C++ API docs and could not find a function to export or save the created engine file to disk. Would you have any insight as to how to go about doing that?

Sorry for the delay! You can refer to the code below, from How to Speed Up Deep Learning Inference Using TensorRT | NVIDIA Developer Blog, which shows how to serialize() the engine and save the host buffer to a file, and conversely how to read the file back and deserializeCudaEngine() it into an engine.

ICudaEngine* getCudaEngine(string const& onnxModelPath, int batchSize)
{
    string enginePath{getBasename(onnxModelPath) + "_batch" + to_string(batchSize) + ".engine"};
    ICudaEngine* engine{nullptr};

    string buffer = readBuffer(enginePath);
    if (buffer.size())
    {
        // Try to deserialize a previously saved engine.
        unique_ptr<IRuntime, Destroy> runtime{createInferRuntime(gLogger)};
        engine = runtime->deserializeCudaEngine(buffer.data(), buffer.size(), nullptr);
    }

    if (!engine)
    {
        // Fallback to creating engine from scratch.
        engine = createCudaEngine(onnxModelPath, batchSize);

        if (engine)
        {
            unique_ptr<IHostMemory, Destroy> engine_plan{engine->serialize()};
            // Try to save engine for future uses.
            writeBuffer(engine_plan->data(), engine_plan->size(), enginePath);
        }
    }
    return engine;
}
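The snippet above assumes readBuffer()/writeBuffer() helpers from the same blog post. A possible implementation, sketched here from the way they are used (thin fstream wrappers; not copied from the blog):

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Read an entire file into a string; returns "" if the file doesn't exist.
std::string readBuffer(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    std::ostringstream ss;
    ss << in.rdbuf();   // slurp the whole file
    return ss.str();
}

// Write a raw buffer to a file. Binary mode matters: a serialized
// engine plan is binary data, not text.
void writeBuffer(const void* data, std::size_t size, const std::string& path)
{
    std::ofstream out(path, std::ios::binary);
    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
}
```

The empty-string return doubles as the "no cached engine yet" signal, which is why the caller checks buffer.size() before attempting to deserialize.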

No worries!
I actually solved it last week using the sample_ssd C++ example. I added

nvinfer1::IHostMemory *trtModelStream = mEngine->serialize();
std::ofstream p("SSD_engine.engine", std::ios::binary);
p.write((const char*)trtModelStream->data(), trtModelStream->size());

to the build engine function. This was added towards the end, before the engine was returned, and saves the engine to disk before continuing on to testing.
Thanks for the update!