How to run Nvidia's example torch SSD net on Deepstream-App with objectDetector_SSD's custom plugin

Please provide complete information as applicable to your setup.

**• Hardware Platform (Jetson / GPU)** NVIDIA GPU
(net trained on a Jetson Xavier NX; deepstream-app running on a server GPU)
• DeepStream Version
5.0.1
• JetPack Version (valid for Jetson only)
• TensorRT Version
7.0.0
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs)
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
Follow the steps described below.
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hello everyone!
I’m trying to run custom models on deepstream app. My current goal is to do it with torch models.
This is what I have done so far.

  • Firstly I followed this guide to launch the dustynv/jetson-inference container on a Jetson-Xavier NX.
  • Then I followed this guide to train a SSD model on the container, then I generated the ONNX file (I used all 6500 images).
  • Then I moved the ONNX file to a server running a container made from nvcr.io/nvidia/deepstream:5.0.1-20.09-devel image, where I made a copy of objectDetector_SSD example.
  • The next step was to modify the config files and run the app. The fp16 engine file was generated successfully, but then, at runtime, I got a segmentation fault.
  • Later, based on this forum answer, I modified nvdsinfer_custom_impl_ssd/nvdsparsebbox_ssd.cpp, replacing ‘NMS’ and ‘NMS1’ with ‘scores’ and ‘boxes’ respectively.

I can run the app, but the bounding boxes are all wrong (very small, in the top-left corner), and the app often breaks with a segmentation fault I haven’t been able to figure out.
To avoid these segmentation faults, I limited the for loop on line 96 of nvdsinfer_custom_impl_ssd/nvdsparsebbox_ssd.cpp, but the fault reappears if I increase the net-scale-factor.
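
For context on why I suspect the parsing itself is wrong: as far as I understand, the stock parser expects the TensorRT NMS plugin output (7 floats per kept detection), while the pytorch-ssd ONNX seems to export raw per-anchor tensors. This toy comparison (made-up numbers, and the raw layout is my assumption about the export) shows why reading one layout as the other produces nonsense boxes:

```cpp
#include <cassert>
#include <cmath>

// Layout A: TensorRT NMS plugin record, 7 floats per kept detection:
// [imageId, classId, confidence, x1, y1, x2, y2], coords in [0, 1].
struct NmsRecord { int classId; float conf; float x1, y1, x2, y2; };

NmsRecord parseNmsRecord(const float *det) {
  // The stock objectDetector_SSD parser walks records of this shape.
  return { (int) det[1], det[2], det[3], det[4], det[5], det[6] };
}

// Layout B (assumed for pytorch-ssd): 'scores' holds per-anchor class
// probabilities and 'boxes' holds per-anchor coordinates separately --
// no classId/confidence is interleaved with the coordinates at all, so
// striding through 'scores' in 7-float steps yields garbage detections.
int bestNonBackgroundClass(const float *scoresRow, int numClasses) {
  int best = 1;  // class 0 assumed to be background
  for (int c = 2; c < numClasses; ++c)
    if (scoresRow[c] > scoresRow[best]) best = c;
  return best;
}
```
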

So, how can I run the torch net I trained on Deepstream-App?
Thank you.

This is my app’s config file.
################################################################################
# Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the “Software”),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
################################################################################

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=1
gie-kitti-output-dir=streamscl

[tiled-display]
enable=0
rows=1
columns=1
width=1280
height=720
gpu-id=0
nvbuf-memory-type=0

[source0]
enable=0
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
num-sources=1
uri=file:/home/<user>/dev/nvidia/samples/streams/sample_720p.mp4
gpu-id=0
cudadec-memtype=0

[source1]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI 4=RTSP
type=4
uri=rtsp://<rtsp stream from local camera>
#drop-frame-interval=2
gpu-id=0
# (0): memtype_device   - Memory type Device
# (1): memtype_pinned   - Memory type Host Pinned
# (2): memtype_unified  - Memory type Unified
cudadec-memtype=0

[streammux]
gpu-id=0
batch-size=1
batched-push-timeout=-1
## Set muxer output width and height
width=1920
height=1080
nvbuf-memory-type=0

[sink0]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File
type=1
sync=1
source-id=0
gpu-id=0

[sink1]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming
type=4
#1=h264 2=h265
codec=1
#encoder type 0=Hardware 1=Software
enc-type=0
sync=1
bitrate=3000000
#H264 Profile - 0=Baseline 2=Main 4=High
#H265 Profile - 0=Main 1=Main10
profile=0
# set below properties in case of RTSPStreaming  
rtsp-port=8555
udp-port=5400

[sink2]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File
type=3
sync=1
source-id=0
gpu-id=0
qos=0
nvbuf-memory-type=0
overlay-id=1
container=1
codec=1
output-file=output.mp4

[osd]
enable=1
gpu-id=0
border-width=3
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[primary-gie]
enable=1
gpu-id=0
batch-size=1
gie-unique-id=1
interval=0
labelfile-path=/home/<user>/dev/nvidia/proyectos/ejemplos/ssd_fruit/labels.txt
model-engine-file=/home/<user>/dev/nvidia/proyectos/ejemplos/ssd_fruit/ssd-mobilenet.onnx_b1_gpu0_fp16.engine
config-file=config_infer_primary_ssd.txt
nvbuf-memory-type=0

[tests]
file-loop=0

and this is the inference config-file

################################################################################
# Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
################################################################################

# Following properties are mandatory when engine files are not specified:
#   int8-calib-file(Only in INT8), model-file-format
#   Caffemodel mandatory properties: model-file, proto-file, output-blob-names
#   UFF: uff-file, input-dims, uff-input-blob-name, output-blob-names
#   ONNX: onnx-file
#
# Mandatory properties for detectors:
#   num-detected-classes,
#   custom-lib-path,
#   parse-bbox-func-name
#
# Optional properties for detectors:
#   cluster-mode(Default=Group Rectangles), interval(Primary mode only, Default=0)
#
# Mandatory properties for classifiers:
#   classifier-threshold, is-classifier
#
# Optional properties for classifiers:
#   classifier-async-mode(Secondary mode only, Default=false)
#
# Optional properties in secondary mode:
#   operate-on-gie-id(Default=0), operate-on-class-ids(Defaults to all classes),
#   input-object-min-width, input-object-min-height, input-object-max-width,
#   input-object-max-height
#
# Following properties are always recommended:
#   batch-size(Default=1)
#
# Other optional properties:
#   net-scale-factor(Default=1), network-mode(Default=0 i.e FP32),
#   model-color-format(Default=0 i.e. RGB) model-engine-file, labelfile-path,
#   mean-file, gie-unique-id(Default=0), offsets, process-mode (Default=1 i.e. primary),
#   custom-lib-path, network-mode(Default=0 i.e FP32)
#
# The values in the config file are overridden by values set through GObject
# properties.

[property]
gpu-id=0
net-scale-factor=0.0078431372
offsets=127.5;127.5;127.5
model-color-format=0
model-engine-file=/home/<user>/dev/nvidia/proyectos/ejemplos/ssd_fruit/ssd-mobilenet.onnx_b1_gpu0_fp16.engine
labelfile-path=/home/<user>/dev/nvidia/proyectos/ejemplos/ssd_fruit/labels.txt
#uff-file=sample_ssd_relu6.uff
infer-dims=3;300;300
#uff-input-order=0
#uff-input-blob-name=Input
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=9
interval=0
gie-unique-id=1
is-classifier=0
#output-blob-names=MarkOutput_0
parse-bbox-func-name=NvDsInferParseCustomSSD
custom-lib-path=nvdsinfer_custom_impl_ssd/libnvdsinfer_custom_impl_ssd.so
#scaling-filter=0
#scaling-compute-hw=0

[class-attrs-all]
pre-cluster-threshold=0.95
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

## Per class configuration
#[class-attrs-2]
#threshold=0.6
#roi-top-offset=20
#roi-bottom-offset=10
#detected-min-w=40
#detected-min-h=40
#detected-max-w=400
#detected-max-h=800

and this is the nvdsparsebbox_ssd.cpp file with a few changes

  • NUM_CLASSES_SSD = 9 instead of 91
  • layer names are compared to ‘scores’ and ‘boxes’
  • added “|| i<10” to the for loop
  • added some printf functions

the rest is the same as the original example file.

/*
    * Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
    *
    * Permission is hereby granted, free of charge, to any person obtaining a
    * copy of this software and associated documentation files (the "Software"),
    * to deal in the Software without restriction, including without limitation
    * the rights to use, copy, modify, merge, publish, distribute, sublicense,
    * and/or sell copies of the Software, and to permit persons to whom the
    * Software is furnished to do so, subject to the following conditions:
    *
    * The above copyright notice and this permission notice shall be included in
    * all copies or substantial portions of the Software.
    *
    * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
    * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
    * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    * DEALINGS IN THE SOFTWARE.
    */
    #include <cstring>
    #include <iostream>
    #include "nvdsinfer_custom_impl.h"
    #define MIN(a,b) ((a) < (b) ? (a) : (b))
    #define MAX(a,b) ((a) > (b) ? (a) : (b))
    #define CLIP(a,min,max) (MAX(MIN(a, max), min))
    /* This is a sample bounding box parsing function for the sample SSD UFF
    * detector model provided with the TensorRT samples. */
    extern "C"
    bool NvDsInferParseCustomSSD (std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
            NvDsInferNetworkInfo  const &networkInfo,
            NvDsInferParseDetectionParams const &detectionParams,
            std::vector<NvDsInferObjectDetectionInfo> &objectList);
    /* C-linkage to prevent name-mangling */
    extern "C"
    bool NvDsInferParseCustomSSD (std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
            NvDsInferNetworkInfo  const &networkInfo,
            NvDsInferParseDetectionParams const &detectionParams,
            std::vector<NvDsInferObjectDetectionInfo> &objectList)
    {
      static int nmsLayerIndex = -1;
      static int nms1LayerIndex = -1;
      static bool classMismatchWarn = false;
      int numClassesToParse;
      static const int NUM_CLASSES_SSD = 9;
      if (nmsLayerIndex == -1) {
        for (unsigned int i = 0; i < outputLayersInfo.size(); i++) {
          if (strcmp(outputLayersInfo[i].layerName, "scores") == 0) {
            nmsLayerIndex = i;
            break;
          }
        }
        if (nmsLayerIndex == -1) {
          std::cerr << "Could not find scores layer buffer while parsing" << std::endl;
          return false;
        }
      }
      if (nms1LayerIndex == -1) {
        for (unsigned int i = 0; i < outputLayersInfo.size(); i++) {
          if (strcmp(outputLayersInfo[i].layerName, "boxes") == 0) {
            nms1LayerIndex = i;
            break;
          }
        }
        if (nms1LayerIndex == -1) {
          std::cerr << "Could not find boxes layer buffer while parsing" << std::endl;
          return false;
        }
      }
      if (!classMismatchWarn) {
        if (NUM_CLASSES_SSD !=
            detectionParams.numClassesConfigured) {
          std::cerr << "WARNING: Num classes mismatch. Configured:" <<
            detectionParams.numClassesConfigured << ", detected by network: " <<
            NUM_CLASSES_SSD << std::endl;
        }
        classMismatchWarn = true;
      }
      
      numClassesToParse = MIN (NUM_CLASSES_SSD,
          detectionParams.numClassesConfigured);
          
      int keepCount = *((int *) outputLayersInfo[nms1LayerIndex].buffer);
      float *detectionOut = (float *) outputLayersInfo[nmsLayerIndex].buffer;
      
      for (int i = 0; i < keepCount || i < 10; ++i)
      {
        //printf("paso1: ");
        float* det = detectionOut + i * 7;
        //printf("%f ",*det);
        int classId = det[1];
        if (classId >= numClassesToParse)
        {
          continue;
        }
        float threshold = detectionParams.perClassPreclusterThreshold[classId];
        if (det[2] < threshold)
        {
          continue;
        }
        printf("threshold: %f\n",threshold);
        unsigned int rectx1, recty1, rectx2, recty2;
        NvDsInferObjectDetectionInfo object;
        
        rectx1 = det[3] * networkInfo.width;
        recty1 = det[4] * networkInfo.height;
        rectx2 = det[5] * networkInfo.width;
        recty2 = det[6] * networkInfo.height;
        
        object.classId = classId;
        object.detectionConfidence = det[2];
        
        /* Clip object box co-ordinates to network resolution */
        object.left = CLIP(rectx1, 0, networkInfo.width - 1);
        object.top = CLIP(recty1, 0, networkInfo.height - 1);
        object.width = CLIP(rectx2, 0, networkInfo.width - 1) - object.left + 1;
        object.height = CLIP(recty2, 0, networkInfo.height - 1) - object.top + 1;
        printf("CLASS ID=%d\n",classId);
        printf("CONFIDENCE=%f\n",det[2]);
        printf("BBOX=(%f ,%f); %fx%f\n",object.top,object.left,object.width,object.height);
        printf("-------------\n");
        objectList.push_back(object);
      }
      return true;
    }
    /* Check that the custom function has been defined correctly */
    CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomSSD);
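
In case it helps frame the question: my guess is that the loop would need to treat the buffers as raw per-anchor tensors instead of NMS records. Here is a self-contained sketch of what I imagine the parsing could look like (simplified stand-in types instead of the NvDsInfer structs, and the shapes/layout of the ‘scores’ and ‘boxes’ tensors are assumptions on my part):

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for NvDsInferObjectDetectionInfo (sketch only).
struct Detection { int classId; float conf; float left, top, width, height; };

// Assumed layout: 'scores' is [numAnchors x numClasses] with class 0 as
// background, 'boxes' is [numAnchors x 4] as (x1, y1, x2, y2) in [0, 1].
std::vector<Detection> parseRawSSD(const float *scores, const float *boxes,
                                   int numAnchors, int numClasses,
                                   float threshold, int netW, int netH) {
  std::vector<Detection> out;
  for (int a = 0; a < numAnchors; ++a) {
    // Best non-background class for this anchor.
    int best = 1;
    for (int c = 2; c < numClasses; ++c)
      if (scores[a * numClasses + c] > scores[a * numClasses + best]) best = c;
    float conf = scores[a * numClasses + best];
    if (conf < threshold) continue;
    const float *b = &boxes[a * 4];
    Detection d;
    d.classId = best;
    d.conf = conf;
    // Scale normalized corners to network resolution.
    d.left = b[0] * netW;
    d.top = b[1] * netH;
    d.width = (b[2] - b[0]) * netW;
    d.height = (b[3] - b[1]) * netH;
    out.push_back(d);
  }
  return out;
}
```

(This keeps every anchor above the threshold; duplicates would still need NMS afterwards, since it is apparently not inside the model.)
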

@dusty_nv, sorry for tagging you here, but I was hoping you might know how to do this, since the nets were trained with your material.
Any help would be deeply appreciated.

Hi @ai12, I haven’t used the pytorch-ssd models with DeepStream before, sorry about that. I’m not sure what code changes would be needed to support it. These pytorch-ssd models do not have NMS clustering inside them; that is implemented manually in the post-processing of jetson-inference.

If possible, you may want to look into training your detection model with TLT (Transfer Learning Toolkit), which is interoperable with DeepStream.
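
For reference, the manual clustering I mentioned is essentially per-class confidence thresholding followed by greedy IoU-based suppression. A simplified sketch of that idea (not the actual jetson-inference code):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Box { float x1, y1, x2, y2; };
struct Det { float conf; Box box; };

// Intersection-over-union of two axis-aligned boxes.
float iou(const Box &a, const Box &b) {
  float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
  float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
  float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
  float areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
  float areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
  float uni = areaA + areaB - inter;
  return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: keep the highest-confidence box, drop overlapping ones.
std::vector<Det> nms(std::vector<Det> dets, float iouThresh = 0.5f) {
  std::sort(dets.begin(), dets.end(),
            [](const Det &a, const Det &b) { return a.conf > b.conf; });
  std::vector<Det> kept;
  for (const Det &d : dets) {
    bool suppressed = false;
    for (const Det &k : kept)
      if (iou(d.box, k.box) >= iouThresh) { suppressed = true; break; }
    if (!suppressed) kept.push_back(d);
  }
  return kept;
}
```
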

Hi,

Would you mind sharing the ONNX file with us so we can check it directly?

Thanks.

Hi @AastaLLL, thank you.

I don’t mind at all. These are my files. labels.txt (63 Bytes) ssd-mobilenet.onnx (29.3 MB) .

This is the generated engine file: ssd-mobilenet.onnx_b1_gpu0_fp16.engine (15.1 MB)

And this is the torch checkpoint from the last training epoch: mb1-ssd-Epoch-29-Loss-4.058936367864194.pth (29.3 MB)

By the way, the model is the same one you would get by following dusty_nv’s steps in this guide. I ran everything as-is, since my first goal is to validate an example model on DeepStream before training something of my own.

Thank you again.

I would also like to know how to test a PyTorch-trained network on DeepStream in general: perhaps there are some tips, layers to add to adjust the boxes, or a tutorial worth looking at related to CNNs or R-CNNs.

Best Regards