• Hardware Platform: Jetson Xavier NX • DeepStream Version: 5.1 • Language: Python
I’ve trained a model using TAO (formerly TLT) to detect whether people are correctly wearing a helmet, safety glasses, a face mask and safety boots.
In my TAO tests the model behaved correctly: it made predictions and produced the corresponding labels just fine.
Now I’m trying to add it to my DeepStream project as an sgie, using the “Person” class from the pgie (TrafficCamNet) as input.
However, the model is not predicting correctly. I based my project on python_apps/test_1 (to add the tracker and sgie), test_3 (to read from a file or RTSP) and (to save img to file), so my final pipeline is:
streammux → pgie (People) → tracker → sgie (Custom TAO model) → nvvidconv and filter (to save img to file) → tiler → nvosd → transform → sink
The custom model was trained on 2,200 images covering 4 different classes (missing_helmet, missing_glasses, missing_mask, missing_boots), all of size 272x480.
I’ve tried running my app with MUXER_OUTPUT_WIDTH/HEIGHT and TILED_OUTPUT_WIDTH/HEIGHT varying from 420x272 (same as training size) to 649x480 without success.
These are examples of images saved by my app. I’m drawing people with a blue BB and the sgie detections with a red BB but, as you can see, in many cases the red BB is tiny and doesn’t match the detection:
I think I’m doing something wrong with the size of the input video because, like I said, the tests using tlt-evaluate and images taken from the same video show good results.
Can anyone help me or guide me on the correct configuration of my model?
FYI, this is my sgie config file: sgie_config_epp.txt (3.6 KB)
Yes, as a matter of fact. Adding this line to my code:
print('{}: left:{}, top:{}, width:{}, height:{}'.format(obj_name, left, top, width, height))
produces these lines in the output:
So for the person detection (pgie) the blue bounding box is accurate (295x99). But the sgie detection (“ojos_sin_proteccion”) is wrong and its size is extremely small (7x4 pixels).
The detections made by tlt-evaluate on frames taken from the same video, however, were accurate (in size and position). This is why I think the error is in either the training size or the input resolution, but I don’t know the appropriate values or where to configure each.
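To put a number on "extremely small", here is a minimal Python sketch that compares the area of an sgie box with the area of its parent person box. The helper names and the 0.005 threshold are hypothetical and chosen only for illustration; the 295x99 and 7x4 sizes are the ones reported above.

```python
def area_ratio(child_wh, parent_wh):
    """Return the child box area divided by the parent box area."""
    child_w, child_h = child_wh
    parent_w, parent_h = parent_wh
    if parent_w <= 0 or parent_h <= 0:
        raise ValueError("parent box must have a positive size")
    return (child_w * child_h) / (parent_w * parent_h)


def is_suspiciously_small(child_wh, parent_wh, min_ratio=0.005):
    """Flag a child box that covers less than min_ratio of its parent.

    min_ratio is an arbitrary illustrative threshold, not a DeepStream setting.
    """
    return area_ratio(child_wh, parent_wh) < min_ratio


person_wh = (295, 99)   # pgie person box (width, height) from the log above
sgie_wh = (7, 4)        # sgie box (width, height) from the log above

print("area ratio: {:.5f}".format(area_ratio(sgie_wh, person_wh)))
print("suspicious:", is_suspiciously_small(sgie_wh, person_wh))
```

Calling the same check from the probe that prints the boxes would show whether every sgie detection is tiny or only some of them.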
Hi @monita.ramirezb ,
The sgie receives the object metadata from the pgie, crops the object image based on the BBOX info in that metadata, and then scales the crop to the input resolution of the sgie network. You can’t configure the input resolution for the sgie.
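To make the crop-and-scale step concrete, here is a minimal Python sketch of the coordinate arithmetic. It is not the nvinfer implementation: it assumes the crop is stretched to the network input without aspect-ratio padding, the `Box` type and function names are hypothetical, and the 480x272 network size and the person box position are example values.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: float
    top: float
    width: float
    height: float


def crop_scale(parent: Box, net_w: int, net_h: int):
    """Scale factors applied when the parent crop is resized to the network input."""
    return net_w / parent.width, net_h / parent.height


def network_box_to_frame(det: Box, parent: Box, net_w: int, net_h: int) -> Box:
    """Map a box from network-input coordinates back to full-frame coordinates."""
    sx, sy = crop_scale(parent, net_w, net_h)
    return Box(
        left=parent.left + det.left / sx,
        top=parent.top + det.top / sy,
        width=det.width / sx,
        height=det.height / sy,
    )


# Example values only: a 295x99 person box at an assumed position.
person = Box(left=100.0, top=50.0, width=295.0, height=99.0)
# A detection that covers a quarter of the network input in each dimension.
det_in_network = Box(left=0.0, top=0.0, width=120.0, height=68.0)

print(network_box_to_frame(det_in_network, person, net_w=480, net_h=272))
```

Under these assumptions a detection can never be larger than the person box it came from, so the model sees each person at a very different scale than it saw in full training frames.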
I don’t understand which attribute of the pgie config file I should modify. According to the documentation and the LPR example, the only bbox attributes are input-object-min-width/height and input-object-max-width/height in the sgie config file. I’ve played with them a little and got either no change in the detections (tiny BB) or no detections at all (even when a person is clearly in the frame). The pgie is detecting people just fine and passing them to the sgie, but it’s the sgie that’s detecting incorrectly.
So, if it’s not the resolution of the input image, what can it be? Why does the model work fine on images but fail so badly on streamed video?
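For reference, this is roughly what those four attributes do, sketched in Python. It is an illustration rather than nvinfer code: the function name is hypothetical, and it assumes that a maximum of 0 means "no upper limit".

```python
def passes_size_filter(width, height, min_w=0, min_h=0, max_w=0, max_h=0):
    """Return True if a pgie object would be handed to the sgie.

    Mirrors the idea behind input-object-min-width/height and
    input-object-max-width/height; a max of 0 is assumed to disable the limit.
    """
    if width < min_w or height < min_h:
        return False
    if max_w > 0 and width > max_w:
        return False
    if max_h > 0 and height > max_h:
        return False
    return True


# The 295x99 person box from the log passes with no limits set...
print(passes_size_filter(295, 99))
# ...and is dropped when the minimum width is set above its width.
print(passes_size_filter(295, 99, min_w=300))
```

These attributes only decide which person boxes reach the sgie, which matches the observation that changing them either leaves the tiny boxes unchanged or removes the detections entirely.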
It’s my guess… because when I trained the model using TLT and ran tlt-infer on my test images (that were, actually, still frames from the test video) the labels and labeled images had almost perfect detections!
But when I deploy it to DeepStream and run the full sample video, the tiny bounding boxes come up.
What else could be going wrong?
From the piece of code, it’s hard to say what the root cause is.
Could you apply the change below to dump the raw output of the sgie, and then parse the output offline to check its BBOX size?
Sorry, not a good C++ user. My code is in Python.
But, maybe you can describe what the code does to see if I can replicate it on Python?
This code is to be added to the nvinfer libs. nvinfer only supports C++, so there is no need to port it to Python.
HOW to apply the code:
# cd /opt/nvidia/deepstream/deepstream/sources/libs/nvdsinfer/
==> apply above change to nvdsinfer_context_impl.cpp
# export CUDA_VER=10.2
# make
# cp /opt/nvidia/deepstream/deepstream-6.0/lib/ ~/
# cp /opt/nvidia/deepstream/deepstream-6.0/lib/
Then run your app. It will dump the raw output of the TRT inference for your models into a binary file, and you can parse the file offline to check whether the BBOX is really tiny.
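Since the app is in Python, the offline parsing can be done in Python too. The sketch below is a starting point only: it assumes the dump is a flat array of native-endian 32-bit floats, which has to be confirmed against the change applied to nvdsinfer_context_impl.cpp, and the function names and the file name in the usage comment are hypothetical.

```python
import array
import os


def load_float32_dump(path):
    """Read a raw binary file as a flat array of native-endian float32 values."""
    values = array.array("f")
    size = os.path.getsize(path)
    if size % values.itemsize != 0:
        raise ValueError("file size is not a multiple of the float32 size")
    with open(path, "rb") as f:
        values.fromfile(f, size // values.itemsize)
    return values


def summarize(values):
    """Basic statistics to sanity-check the value range of an output layer."""
    return {"count": len(values), "min": min(values), "max": max(values)}


# Usage (hypothetical file name):
#   stats = summarize(load_float32_dump("sgie_output.bin"))
#   print(stats)
```

If the bbox layer values are already tiny in the dump, the problem is in what the network is being fed or how it was exported, not in the metadata handling or drawing code downstream.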
I did the changes you mention on nvdsinfer_context_impl.cpp and my app ran without errors. But I can’t seem to find where the binary file you mention is located.
It’s not in the same folder as my app, nor in the nvdsinfer folder. (Even so, find -name gie-* returns no results)
Am I understanding wrong?
OK, so getting the image this way is not working for me, but I tried this other test:
I took and only changed the first sgie config file for mine. I tested it with a sample .h264 video and got similar results. (This image is taken directly from the playback video because, in this test, I’m not even saving the image or generating the BB.)
So the BBs really are being returned tiny by the model. It’s not caused by the process of saving the file, because I’m not doing that in this case. Any other idea what it could be?
You’re right, they’re so small you can barely see them!
Here I made them white and thick, and removed the labels. You can see them at the left border of the bounding box most of the time: