OK, so first my setup: I have a well-running inference solution in Python for three USB cams on a Jetson Nano. While I'm pretty happy with the inference frame rate (30 fps per camera with optimized four-class networks), I was trying to achieve the same with a Raspberry Pi/Coral TPU solution. Since I could not get Google's transfer learning example to run on either GPU, I switched to the very nice tutorials from @dusty_nv, who provides detailed and clean tutorials for transfer learning on a Jetson Nano.
So I followed his "reduce an SSD-MobileNetV1 to 9 fruits" sample and was able to create a stripped-down model. It was also possible to run this in the context of Dusty's jetson-inference project.
Before diving into the problems of converting ONNX to TensorFlow to TensorFlow Lite to compiled Edge TPU code, I thought about using my newly created model in the aforementioned context of an already well-running DeepStream SDK app.
OK, so I took the SSD network and created a PGIE configuration file for it. This is what I have at the moment, and it works:
[property]
workspace-size=800
gpu-id=0
model-color-format=0
net-scale-factor=0.003921569790691137
onnx-file=/home/ubuntu/dragonfly-safety/jetson-inference/models/primary-detector-nano/ssd-mobilenet.onnx
labelfile-path=/home/ubuntu/dragonfly-safety/jetson-inference/models/primary-detector-nano/labels_onnx.txt
model-engine-file=/home/ubuntu/dragonfly-safety/jetson-inference/models/primary-detector-nano/ssd-mobilenet.onnx_b1_gpu0_fp16.engine
batch-size=1
network-mode=2
num-detected-classes=9
maintain-aspect-ratio=1
gie-unique-id=1
is-classifier=0
output-blob-names=boxes;scores
parse-bbox-func-name=NvDsInferParseCustomSSD
custom-lib-path=/home/ubuntu/dragonfly-safety/jetson-inference/models/primary-detector-nano/libnvdsinfer_custom_impl_ssd.so
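One note on the config above: the per-class thresholds that later show up in the custom parser as detectionParams.perClassThreshold are set via [class-attrs-all] (or per-class [class-attrs-&lt;class-id&gt;]) sections, which this file does not have yet, so nvinfer's default is used. A sketch of what such a section could look like (key name as I understand the DeepStream 5.x nvinfer docs; please verify against your version):

```
[class-attrs-all]
# Confidence threshold handed to the bbox parser (0.3 is just an example value)
pre-cluster-threshold=0.3
```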
You see, I already messed with the same custom bbox lib as you did and got exactly the same result: a segmentation fault.
I found this sample on the net: retinanet-examples/nvdsparsebbox_retinanet.cpp at main · NVIDIA/retinanet-examples · GitHub. Judging by the naming alone, its bbox parser comes much closer to what is in the SSD net.
I was not able to find a "classes" element, which is strange, since at least the class_id should be determined somewhere. But "boxes" and "scores" have been found. For now I see my camera image, always claiming to see "apples" (since I hardcoded it to class_id 1), but no visible bounding boxes on the screen. I don't know whether those need a configuration somewhere as well; surely there is one.
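By the way, in case the export really ships only "boxes" and "scores", with "scores" laid out as [numBoxes, numClasses] (that is my assumption; please verify against the engine's reported layer dims), the class could be recovered by an arg-max over each box's score row instead of reading a separate "classes" layer. A minimal sketch of that idea:

```cpp
#include <cstddef>
#include <utility>

/* Sketch: derive (classId, score) for one detection from a flat scores
 * buffer assumed to be laid out as [numBoxes, numClasses]. Class 0 is
 * assumed to be the background class and is skipped, as in the TensorRT
 * SSD samples. */
std::pair<int, float> argmaxClass(const float *scores, int boxIndex, int numClasses)
{
    int bestClass = 0;
    float bestScore = 0.0f;
    const float *row = scores + static_cast<std::size_t>(boxIndex) * numClasses;
    for (int c = 1; c < numClasses; c++) {  /* skip background (class 0) */
        if (row[c] > bestScore) {
            bestScore = row[c];
            bestClass = c;
        }
    }
    return {bestClass, bestScore};
}
```

If that layout assumption holds, numClasses should be recoverable from the scores layer dims (scoresLayerDims.h in the code below), and the hardcoded class_id 1 would no longer be needed.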
So for now I took the /opt/nvidia/deepstream/deepstream-5.1/sources/objectDetector_SSD/nvdsinfer_custom_impl_ssd project for the make, using this content for nvdsparsebbox_ssd.cpp:
/*
* Copyright (c) 2018-2019, NVIDIA CORPORATION. All rights reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a
* copy of this software and associated documentation files (the "Software"),
* to deal in the Software without restriction, including without limitation
* the rights to use, copy, modify, merge, publish, distribute, sublicense,
* and/or sell copies of the Software, and to permit persons to whom the
* Software is furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
* DEALINGS IN THE SOFTWARE.
*/
#include <cstring>
#include <iostream>
#include "nvdsinfer_custom_impl.h"
#define MIN(a,b) ((a) < (b) ? (a) : (b))
#define MAX(a,b) ((a) > (b) ? (a) : (b))
#define CLIP(a,min,max) (MAX(MIN(a, max), min))
/* This is a sample bounding box parsing function for the sample SSD UFF
* detector model provided with the TensorRT samples. */
extern "C"
bool NvDsInferParseCustomSSD (std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
NvDsInferNetworkInfo const &networkInfo,
NvDsInferParseDetectionParams const &detectionParams,
std::vector<NvDsInferObjectDetectionInfo> &objectList);
/* C-linkage to prevent name-mangling */
extern "C"
bool NvDsInferParseCustomSSD (std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
NvDsInferNetworkInfo const &networkInfo,
NvDsInferParseDetectionParams const &detectionParams,
std::vector<NvDsInferObjectDetectionInfo> &objectList)
{
    static int bboxLayerIndex = -1;
    static int classesLayerIndex = -1;
    static int scoresLayerIndex = -1;
    static NvDsInferDimsCHW scoresLayerDims;
    int numDetsToParse;

    /* Find the bbox layer */
    if (bboxLayerIndex == -1) {
        for (unsigned int i = 0; i < outputLayersInfo.size(); i++) {
            if (strcmp(outputLayersInfo[i].layerName, "boxes") == 0) {
                bboxLayerIndex = i;
                break;
            }
        }
        if (bboxLayerIndex == -1) {
            std::cerr << "Could not find bbox layer buffer while parsing" << std::endl;
            return false;
        }
    }

    /* Find the scores layer */
    if (scoresLayerIndex == -1) {
        for (unsigned int i = 0; i < outputLayersInfo.size(); i++) {
            if (strcmp(outputLayersInfo[i].layerName, "scores") == 0) {
                scoresLayerIndex = i;
                getDimsCHWFromDims(scoresLayerDims, outputLayersInfo[i].dims);
                break;
            }
        }
        if (scoresLayerIndex == -1) {
            std::cerr << "Could not find scores layer buffer while parsing" << std::endl;
            return false;
        }
    }

    /* Find the classes layer (this model does not seem to have one,
     * so a missing layer is not treated as an error here) */
    if (classesLayerIndex == -1) {
        for (unsigned int i = 0; i < outputLayersInfo.size(); i++) {
            if (strcmp(outputLayersInfo[i].layerName, "classes") == 0) {
                classesLayerIndex = i;
                break;
            }
        }
        // if (classesLayerIndex == -1) {
        //     std::cerr << "Could not find classes layer buffer while parsing" << std::endl;
        //     return false;
        // }
    }

    std::cout << "bboxLayerIndex " << bboxLayerIndex << " classesLayerIndex "
              << classesLayerIndex << " scoresLayerIndex " << scoresLayerIndex << std::endl;

    /* Calculate the number of detections to parse */
    numDetsToParse = scoresLayerDims.c;

    float *bboxes = (float *) outputLayersInfo[bboxLayerIndex].buffer;
    //float *classes = (float *) outputLayersInfo[classesLayerIndex].buffer;
    float *scores = (float *) outputLayersInfo[scoresLayerIndex].buffer;

    for (int indx = 0; indx < numDetsToParse; indx++)
    {
        float outputX1 = bboxes[indx * 4];
        float outputY1 = bboxes[indx * 4 + 1];
        float outputX2 = bboxes[indx * 4 + 2];
        float outputY2 = bboxes[indx * 4 + 3];
        int this_class = 0;        /* int, not float: it is used as an index below */
        float this_score = scores[indx];
        float threshold = detectionParams.perClassThreshold[this_class];

        if (this_score >= threshold)
        {
            NvDsInferParseObjectInfo object;
            object.classId = 1;    /* hardcoded for now, see text above */
            object.detectionConfidence = this_score;
            /* Note: if the model emits coordinates normalized to [0,1], these
             * values would have to be scaled by networkInfo.width/height
             * before any visible overlay can be drawn. */
            object.left = outputX1;
            object.top = outputY1;
            object.width = outputX2 - outputX1;
            object.height = outputY2 - outputY1;
            objectList.push_back(object);
        }
    }
    return true;
}
/* Check that the custom function has been defined correctly */
CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomSSD);
I find it a bit clearer, also because the number of detections is derived rather than hardcoded, but I didn't compare it to the original version in order to find the cause of the segfault.
This works: my extra trace is visible (there is no "classes" element, sigh), an apple is detected every now and then, but no bbox is drawn as an overlay yet.
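One suspicion about the missing overlays: if the ONNX export emits box coordinates normalized to [0, 1] (an assumption on my side; worth checking by printing a few raw values from the boxes buffer), the parser would hand DeepStream sub-pixel-sized boxes and nothing visible gets drawn. A minimal sketch of the scaling that would then be needed before filling the NvDsInferParseObjectInfo fields:

```cpp
#include <algorithm>

/* Minimal stand-in for the NvDsInferParseObjectInfo fields we fill. */
struct Box { float left, top, width, height; };

/* Sketch: convert one normalized [x1, y1, x2, y2] box into a pixel-space
 * left/top/width/height box, clipped to the network input resolution.
 * Assumes the model outputs coordinates normalized to [0, 1]. */
Box scaleBox(const float *xyxy, unsigned int netW, unsigned int netH)
{
    float x1 = std::clamp(xyxy[0] * netW, 0.0f, (float) netW - 1);
    float y1 = std::clamp(xyxy[1] * netH, 0.0f, (float) netH - 1);
    float x2 = std::clamp(xyxy[2] * netW, 0.0f, (float) netW - 1);
    float y2 = std::clamp(xyxy[3] * netH, 0.0f, (float) netH - 1);
    return { x1, y1, x2 - x1, y2 - y1 };
}
```

This mirrors what the objectDetector_SSD sample does with its CLIP macro, just written with std::clamp (C++17).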
It would be great if you could give me your opinion on this changed file.