Dear NVIDIA Developer Team,
We are currently developing a DeepStream app for Personal Protective Equipment (PPE) detection using a YOLOv8 object detection model. However, we have encountered an issue with the bounding boxes and the number of detected objects when the app runs. Below is a detailed description of our process, the issue we are facing, and the steps we have already taken to troubleshoot it.
Environment:
• Hardware Platform: Jetson Orin NX
• JetPack Version: 6.2
• DeepStream Version: 7.1
• TensorRT Version: 10.3.0
• CUDA Version: 12.6
• Operating System: Ubuntu 22.04
• Python Version: 3.10.12
- Model and Dataset
We have trained a YOLOv8 object detection model on our PPE dataset, which includes the following classes:
- Safety goggles
- Non-safety goggles
- Toe guards
- Non-safety shoes
- Safety shoes
- Model Conversion
After training the model, we converted it into a .engine file using the trtexec command; an example invocation is shown below.
The model is now being used in our DeepStream application.
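Roughly, the conversion looks like this (a sketch only; file names are placeholders, and we assume the trained model was first exported to ONNX):
# Build the TensorRT engine from the ONNX export; --fp16 is optional.
trtexec --onnx=yolov8_ppe.onnx --saveEngine=yolov8_ppe.engine --fp16
# Sanity check: reload the engine to confirm it deserializes and to see its I/O tensor shapes.
trtexec --loadEngine=yolov8_ppe.engine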
- Problem Description
When we run the app, we encounter two main issues:
- Bounding boxes: the boxes are drawn at the corners of the image rather than around the detected objects.
- Object count: the number of detected objects is also incorrect.
- Debugging Steps Taken
To identify the root cause, we performed the following debugging steps:
- Model Check: We ran the .engine model outside of DeepStream to see if the issue persisted. We observed the same bounding box issue in this case.
Terminal output:
Class ID: 4, Label: Non Safety Goggles: 4.66
Warning: class_id 62 is out of bounds, skipping detection.
Warning: class_id 9 is out of bounds, skipping detection.
Warning: class_id 35 is out of bounds, skipping detection.
Class ID: 4, Label: Non Safety Goggles: 4.18
Warning: class_id 60 is out of bounds, skipping detection.
Warning: class_id 8 is out of bounds, skipping detection.
Warning: class_id 35 is out of bounds, skipping detection.
Class ID: 4, Label: Non Safety Goggles: 4.39
Warning: class_id 60 is out of bounds, skipping detection.
Warning: class_id 9 is out of bounds, skipping detection.
Warning: class_id 35 is out of bounds, skipping detection.
Class ID: 4, Label: Non Safety Goggles: 4.04
Warning: class_id 61 is out of bounds, skipping detection.
Warning: class_id 8 is out of bounds, skipping detection.
- Parser Check: Since our model has only five classes, warnings about class IDs such as 60-62 suggested that the parser might be reading class scores from the wrong offsets, so we suspected the parser. To test it, we ran the pretrained YOLOv8 object detection model through the same parser, restricted to the "person" class: we put person as the only class in labels.txt and changed the number of detected classes in the parser accordingly (see below). However, we encountered the same bounding box problem.
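For this test, labels.txt contained the single line:
person
and NUM_CLASSES_YOLO in the parser was set to 1 to match.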
The following image illustrates our issue:
Our parser script for this person-only test is:
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstring>
#include <fstream>
#include <iostream>
#include <unordered_map>
#include "nvdsinfer_custom_impl.h"
static const int NUM_CLASSES_YOLO = 1; // Only detecting the "person" class
float clamp(const float val, const float minVal, const float maxVal)
{
assert(minVal <= maxVal);
return std::min(maxVal, std::max(minVal, val));
}
static NvDsInferParseObjectInfo convertBBoxYoloV8(const float& bx, const float& by, const float& bw,
const float& bh, const int& stride, const uint& netW,
const uint& netH)
{
NvDsInferParseObjectInfo b;
float xCenter = bx * stride;
float yCenter = by * stride;
float x0 = xCenter - bw / 2;
float y0 = yCenter - bh / 2;
float x1 = x0 + bw;
float y1 = y0 + bh;
x0 = clamp(x0, 0, netW);
y0 = clamp(y0, 0, netH);
x1 = clamp(x1, 0, netW);
y1 = clamp(y1, 0, netH);
b.left = x0;
b.width = clamp(x1 - x0, 0, netW);
b.top = y0;
b.height = clamp(y1 - y0, 0, netH);
return b;
}
static void addBBoxProposalYoloV8(const float bx, const float by, const float bw, const float bh,
const uint stride, const uint& netW, const uint& netH, const int maxIndex,
const float maxProb, std::vector<NvDsInferParseObjectInfo>& binfo)
{
NvDsInferParseObjectInfo bbi = convertBBoxYoloV8(bx, by, bw, bh, stride, netW, netH);
if (bbi.width < 1 || bbi.height < 1) return;
bbi.detectionConfidence = maxProb;
bbi.classId = maxIndex;
binfo.push_back(bbi);
}
static bool NvDsInferParseYoloV8(
std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams,
std::vector<NvDsInferParseObjectInfo>& objectList)
{
if (outputLayersInfo.empty()) {
std::cerr << "Could not find output layer in bbox parsing" << std::endl;
return false;
}
const NvDsInferLayerInfo &layer = outputLayersInfo[0];
if (NUM_CLASSES_YOLO != detectionParams.numClassesConfigured)
{
std::cerr << "WARNING: Num classes mismatch. Configured:"
<< detectionParams.numClassesConfigured
<< ", detected by network: " << NUM_CLASSES_YOLO << std::endl;
}
std::vector<NvDsInferParseObjectInfo> objects;
float* data = (float*)layer.buffer;
const int dimensions = layer.inferDims.d[1];
int rows = layer.inferDims.numElements / layer.inferDims.d[1];
for (int i = 0; i < rows; ++i) {
// Each row: x, y, w, h, then one score per class (here a single "person" score, i.e. 5 values per row)
float bx = data[0];
float by = data[1];
float bw = data[2];
float bh = data[3];
float * classes_scores = data + 4;
float maxScore = 0;
int index = 0;
// Loop through the only class (index 0 for person):
if (*classes_scores > maxScore) {
index = 0; // Only detecting person (class index 0)
maxScore = *classes_scores;
}
// Check confidence threshold for "person" class (index 0)
if (maxScore > detectionParams.perClassPreclusterThreshold[index]) {
int maxIndex = index;
data += dimensions;
addBBoxProposalYoloV8(bx, by, bw, bh, 1, networkInfo.width, networkInfo.height, maxIndex, maxScore, objects);
} else {
data += dimensions;
}
}
objectList = objects;
return true;
}
extern "C" bool NvDsInferParseCustomYoloV8(
std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams,
std::vector<NvDsInferParseObjectInfo>& objectList)
{
return NvDsInferParseYoloV8(
outputLayersInfo, networkInfo, detectionParams, objectList);
}
/* Check that the custom function has been defined correctly */
CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomYoloV8);
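For context, the custom parser is selected in the nvinfer config via parse-bbox-func-name and custom-lib-path. A minimal sketch of the relevant [property] keys for this person-only test (file names and paths are placeholders, and cluster-mode=2, i.e. NMS, is just one reasonable choice rather than necessarily our exact setting):
[property]
model-engine-file=yolov8n_person.engine
labelfile-path=labels.txt
network-type=0
num-detected-classes=1
cluster-mode=2
parse-bbox-func-name=NvDsInferParseCustomYoloV8
custom-lib-path=/path/to/libnvdsinfer_custom_impl_yolov8.so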
- Parser or Display Issue: Based on these observations, we suspect the issue lies in the parser, in the display handling, or in the model itself.
- Next Steps & Request for Assistance
- We have written the parser code based on a reference implementation (deepstream_tools/yolo_deepstream/deepstream_yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp at main · NVIDIA-AI-IOT/deepstream_tools · GitHub). Below is the parser code for your review; we can share the OSD probe function as well if that helps.
.cpp script for the five-class PPE model:
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstring>
#include <fstream>
#include <iostream>
#include <unordered_map>
#include "nvdsinfer_custom_impl.h"
static const int NUM_CLASSES_YOLO = 5;
float clamp(const float val, const float minVal, const float maxVal)
{
assert(minVal <= maxVal);
return std::min(maxVal, std::max(minVal, val));
}
static NvDsInferParseObjectInfo convertBBoxYoloV8(const float& bx, const float& by, const float& bw,
const float& bh, const int& stride, const uint& netW,
const uint& netH)
{
NvDsInferParseObjectInfo b;
// Restore coordinates to network input resolution
float xCenter = bx * stride;
float yCenter = by * stride;
float x0 = xCenter - bw / 2;
float y0 = yCenter - bh / 2;
float x1 = x0 + bw;
float y1 = y0 + bh;
x0 = clamp(x0, 0, netW);
y0 = clamp(y0, 0, netH);
x1 = clamp(x1, 0, netW);
y1 = clamp(y1, 0, netH);
b.left = x0;
b.width = clamp(x1 - x0, 0, netW);
b.top = y0;
b.height = clamp(y1 - y0, 0, netH);
return b;
}
static void addBBoxProposalYoloV8(const float bx, const float by, const float bw, const float bh,
const uint stride, const uint& netW, const uint& netH, const int maxIndex,
const float maxProb, std::vector<NvDsInferParseObjectInfo>& binfo)
{
NvDsInferParseObjectInfo bbi = convertBBoxYoloV8(bx, by, bw, bh, stride, netW, netH);
if (bbi.width < 1 || bbi.height < 1) return;
bbi.detectionConfidence = maxProb;
bbi.classId = maxIndex;
binfo.push_back(bbi);
}
static bool NvDsInferParseYoloV8(
std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams,
std::vector<NvDsInferParseObjectInfo>& objectList)
{
if (outputLayersInfo.empty()) {
std::cerr << "Could not find output layer in bbox parsing" << std::endl;
return false;
}
const NvDsInferLayerInfo &layer = outputLayersInfo[0];
if (NUM_CLASSES_YOLO != detectionParams.numClassesConfigured)
{
std::cerr << "WARNING: Num classes mismatch. Configured:"
<< detectionParams.numClassesConfigured
<< ", detected by network: " << NUM_CLASSES_YOLO << std::endl;
}
std::vector<NvDsInferParseObjectInfo> objects;
float* data = (float*)layer.buffer;
const int dimensions = layer.inferDims.d[1];
int rows = layer.inferDims.numElements / layer.inferDims.d[1];
for (int i = 0; i < rows; ++i) {
// Each row: x, y, w, h, then one score per class (5 classes, i.e. 9 values per row)
float bx = data[0];
float by = data[1];
float bw = data[2];
float bh = data[3];
float * classes_scores = data + 4;
float maxScore = 0;
int index = 0;
for (int j = 0; j < NUM_CLASSES_YOLO; j++){
if(*classes_scores > maxScore){
index = j;
maxScore = *classes_scores;
}
classes_scores++;
}
// Important: Check confidence threshold here
if (maxScore > detectionParams.perClassPreclusterThreshold[index]) {
int maxIndex = index;
data += dimensions;
// Use maxScore as confidence instead of always using 1.0
addBBoxProposalYoloV8(bx, by, bw, bh, 1, networkInfo.width, networkInfo.height, maxIndex, maxScore, objects);
} else {
data += dimensions;
}
}
objectList = objects;
return true;
}
extern "C" bool NvDsInferParseCustomYoloV8(
std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
NvDsInferNetworkInfo const& networkInfo,
NvDsInferParseDetectionParams const& detectionParams,
std::vector<NvDsInferParseObjectInfo>& objectList)
{
return NvDsInferParseYoloV8(
outputLayersInfo, networkInfo, detectionParams, objectList);
}
/* Check that the custom function has been defined correctly */
CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomYoloV8);
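For completeness, we compile the parser into the shared library loaded by nvinfer roughly as follows (a sketch; file names are placeholders, and the include paths assume a default DeepStream 7.1 / JetPack install):
# Compile the custom parser against the DeepStream and CUDA headers.
g++ -c -fPIC -std=c++14 nvdsparsebbox_yolov8.cpp \
    -I/opt/nvidia/deepstream/deepstream/sources/includes \
    -I/usr/local/cuda/include
# Link it into the shared object referenced by custom-lib-path.
g++ -shared -o libnvdsinfer_custom_impl_yolov8.so nvdsparsebbox_yolov8.o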
We would appreciate your assistance in identifying the cause of the bounding box issue and any guidance on how to resolve it. We are particularly looking for help in debugging the parser or display code, as we believe one of these components may be the source of the problem.
Please let us know if you require additional information or if we can provide any further details to assist in troubleshooting.
Thank you for your time and support!