Custom object detection failed with Jetson AGX Orin: 2 classes not detected

Hello everyone,

I’m trying to train my own custom object detection model on a Jetson AGX Orin using the NVIDIA ‘Hello AI World’ tutorial.
Eventually I want to run the model on a live video stream from a USB camera, but for now I’m testing it on a recorded video.

I started by labelling about a hundred frames (from the 12th to the 114th). However, I noticed that 2 labels out of 11 are not detected: NA_15 and LE_05. It’s very strange, because I can see them labelled in the annotation files.

To explain my workflow better, the steps are as follows (a rough sketch of the commands is shown after the list):

  1. Label the video source in CVAT and export the data in PASCAL VOC format;
  2. Convert the frames from PNG to JPG;
  3. Train the model using train_ssd.py;
  4. Convert the model to ONNX;
  5. Run detectnet on the video source.
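
For reference, the commands for steps 2-5 look roughly like this on my side (the paths and file names are just examples, and I’m assuming ImageMagick for the PNG-to-JPG conversion):

    # step 2: convert the exported frames from PNG to JPG (assumes ImageMagick)
    cd data/my_dataset/JPEGImages
    mogrify -format jpg *.png && rm *.png

    # step 3: train on the PASCAL VOC export
    python3 train_ssd.py --dataset-type=voc --data=data/my_dataset --model-dir=models/my_model

    # step 4: export the trained checkpoint to ONNX
    python3 onnx_export.py --model-dir=models/my_model

    # step 5: run detectnet on the test video
    detectnet --model=models/my_model/ssd-mobilenet.onnx --labels=models/my_model/labels.txt \
              --input-blob=input_0 --output-cvg=scores --output-bbox=boxes my_video.mp4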

I attach the files I used:

- Label list:
  labels.txt (72 Bytes)

- Annotation files (XML):
  Annotations.7z (15.8 KB)

- Labels not detected:
  NA_15
  LE_05

Hi,

Apart from NA_15 and LE_05, can the other 9 labels be detected successfully?

Thanks.

Yes, they can!

Hi @domenico.depascale, I would run train_ssd.py with the --validation-mean-ap flag, and it will print out the per-class accuracies (Mean Average Precision) during the validation step. That should let you know how well the model is trained on each of the classes. You may need to add more data to your dataset for the classes that are underperforming.
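
For example (the dataset and model paths below are just placeholders):

    python3 train_ssd.py --dataset-type=voc --data=data/my_dataset --model-dir=models/my_model --validation-mean-ap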

Dear @dusty_nv and @AastaLLL, as Dusty suggested, I tried adding more data to my dataset and enabling the --validation-mean-ap flag (these are the results: eval_results.7z (277.1 KB)). I labelled the full video source in CVAT (about 1950 frames, Annotations.7z (15.8 KB)), but a similar error still appears. Now the label that isn’t detected is NA_16 rather than NA_15. This is very strange, because I labelled the same number of frames for NA_16 as for NA_15. In addition, these two labels are among those that appear most frequently in the video; in fact, I tracked them across more than a thousand frames.

Stranger still: before this, I did the same work on a cropped version of the same video source, and the whole process worked properly!

What might be the cause of this problem?

Thanks a lot.

Hi @domenico.depascale, those eval_results seem to be the raw detection bounding box outputs, so it’s hard for me to say if the model is balanced in terms of mean average precision across classes. At the end of each training epoch, it should have printed out a per-class summary when run with the --validation-mean-ap flag that will let you know the per-class accuracy over the test dataset.

Dear @dusty_nv first of all thank you for your support and sorry for the misunderstanding.

I ran the training again, and here is the per-class accuracy summary printed for the last 72 of the 80 epochs: mAP.txt (341.6 KB).
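
For reference, I extracted those per-epoch summaries from the training log with something like this (the log file name is just an example):

    # print each per-class header plus the 80 lines that follow it
    grep -A 80 "Average Precision Per-class" training.log > mAP.txt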

Hope that now you have all you need.

Thanks a lot.

Hi @domenico.depascale, it makes sense that the NA_16 class isn’t being detected during inference, because the mAP of that class is zero (along with TC_25):

2023-01-27 02:20:20 - Epoch: 79, Average Precision Per-class:
2023-01-27 02:20:20 -     NA_16: 0.0
2023-01-27 02:20:20 -     MO_13: 1.0000000000000002
2023-01-27 02:20:20 -     MO_11: 1.0000000000000002
2023-01-27 02:20:20 -     LE_04: 1.0000000000000002
2023-01-27 02:20:20 -     DR_05: 0.8405103668261562
2023-01-27 02:20:20 -     LE_03: 0.9090909090909093
2023-01-27 02:20:20 -     MO_07: 1.0000000000000002
2023-01-27 02:20:20 -     TC_01: 1.0000000000000002
2023-01-27 02:20:20 -     MO_09: 1.0000000000000002
2023-01-27 02:20:20 -     ST_02: 0.8181818181818183
2023-01-27 02:20:20 -     NA_15: 1.0000000000000002
2023-01-27 02:20:20 -     LE_05: 0.891829689298044
2023-01-27 02:20:20 -     LE_11: 1.0000000000000002
2023-01-27 02:20:20 -     TC_14: 1.0000000000000002
2023-01-27 02:20:20 -     LE_12: 0.974921630094044
2023-01-27 02:20:20 -     TC_15: 1.0000000000000002
2023-01-27 02:20:20 -     LE_13: 1.0000000000000002
2023-01-27 02:20:20 -     TC_16: 1.0000000000000002
2023-01-27 02:20:20 -     LE_14: 1.0000000000000002
2023-01-27 02:20:20 -     LE_15: 1.0000000000000002
2023-01-27 02:20:20 -     TC_17: 0.2628696604600219
2023-01-27 02:20:20 -     TC_18: 0.39040713887339656
2023-01-27 02:20:20 -     MO_08: 1.0000000000000002
2023-01-27 02:20:20 -     MO_10: 1.0000000000000002
2023-01-27 02:20:20 -     TC_19: 1.0000000000000002
2023-01-27 02:20:20 -     MO_02: 1.0000000000000002
2023-01-27 02:20:20 -     TC_20: 1.0000000000000002
2023-01-27 02:20:20 -     MO_17: 1.0000000000000002
2023-01-27 02:20:20 -     MO_05: 1.0000000000000002
2023-01-27 02:20:20 -     TC_21: 1.0000000000000002
2023-01-27 02:20:20 -     MO_22: 1.0000000000000002
2023-01-27 02:20:20 -     TC_22: 0.7272727272727274
2023-01-27 02:20:20 -     MO_20: 1.0000000000000002
2023-01-27 02:20:20 -     MO_15: 1.0000000000000002
2023-01-27 02:20:20 -     TC_23: 1.0000000000000002
2023-01-27 02:20:20 -     TC_24: 1.0000000000000002
2023-01-27 02:20:20 -     TC_25: 0.0
2023-01-27 02:20:20 -     TC_26: 1.0000000000000002
2023-01-27 02:20:20 -     TC_27: 1.0000000000000002
2023-01-27 02:20:20 -     TC_28: 1.0000000000000002
2023-01-27 02:20:20 -     TC_29: 1.0000000000000002
2023-01-27 02:20:20 -     TC_30: 1.0000000000000002
2023-01-27 02:20:20 -     TC_31: 1.0000000000000002
2023-01-27 02:20:20 -     TC_32: 0.893595041322314
2023-01-27 02:20:20 -     TC_33: 1.0000000000000002
2023-01-27 02:20:20 -     TC_34: 0.7272727272727274
2023-01-27 02:20:20 -     TC_35: 1.0000000000000002
2023-01-27 02:20:20 -     LX_01: 1.0000000000000002
2023-01-27 02:20:20 -     TC_02: 1.0000000000000002
2023-01-27 02:20:20 -     MO_01: 1.0000000000000002
2023-01-27 02:20:20 -     MO_16: 1.0000000000000002
2023-01-27 02:20:20 -     LE_01: 0.7342657342657343
2023-01-27 02:20:20 -     TC_37: 1.0000000000000002
2023-01-27 02:20:20 -     TC_03: 1.0000000000000002
2023-01-27 02:20:20 -     TC_04: 1.0000000000000002
2023-01-27 02:20:20 -     MO_04: 1.0000000000000002
2023-01-27 02:20:20 -     TC_05: 0.3994197292069632
2023-01-27 02:20:20 -     MO_06: 1.0000000000000002
2023-01-27 02:20:20 -     MO_18: 1.0000000000000002
2023-01-27 02:20:20 -     TC_06: 1.0000000000000002
2023-01-27 02:20:20 -     MO_IS: 1.0000000000000002
2023-01-27 02:20:20 -     TC_07: 1.0000000000000002
2023-01-27 02:20:20 -     MO_12: 1.0000000000000002
2023-01-27 02:20:20 -     MO_21: 1.0000000000000002
2023-01-27 02:20:20 -     TC_08: 1.0000000000000002
2023-01-27 02:20:20 -     MO_19: 1.0000000000000002
2023-01-27 02:20:20 -     TC_09: 1.0000000000000002
2023-01-27 02:20:20 -     MO_14: 1.0000000000000002
2023-01-27 02:20:20 -     TC_10: 1.0000000000000002
2023-01-27 02:20:20 -     TC_11: 1.0000000000000002
2023-01-27 02:20:20 -     LE_06: 1.0000000000000002
2023-01-27 02:20:20 -     LE_07: 0.9981818181818184
2023-01-27 02:20:20 -     LE_08: 1.0000000000000002
2023-01-27 02:20:20 -     TC_12: 0.6907674741048854
2023-01-27 02:20:20 -     LE_09: 1.0000000000000002
2023-01-27 02:20:20 -     TC_13: 1.0000000000000002
2023-01-27 02:20:20 -     LE_10: 1.0000000000000002
2023-01-27 02:20:20 - Epoch: 79, Mean Average Precision (mAP):  0.9254361878500204

Are you sure there are valid training examples for these classes in your dataset?

Hi @dusty_nv, I understand that the zero mAP of the NA_16 class is the problem (as with TC_25), but I don’t understand why. The dataset is the same one used for the other classes; in particular, the entire dataset comes from the same video source. So I can’t find a reason for this problem. I repeat that the same process works properly on a cropped version of the same video.

Do you need more information or other data? I hope you have some advice that can help me.

Thanks in advance.

PS: I attach the annotation files used for the training. As you can see, the NA_16 annotations are present from frame 12 to frame 1936.

Annotations.7z (21.6 KB)
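
For reference, I counted the NA_16 boxes in the PASCAL VOC XML files like this (assuming the archive is extracted to Annotations/):

    # total number of NA_16 bounding boxes across all annotation files
    grep -o "<name>NA_16</name>" Annotations/*.xml | wc -l

    # number of frames (files) containing at least one NA_16 box
    grep -l "<name>NA_16</name>" Annotations/*.xml | wc -l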

Is there anything printed out when you first start the training like ‘unknown class NA_16’ or ‘missing labels for NA_16’ which may indicate they are mislabeled? The fact that it’s literally 0.0000% accuracy would seem to point to some problem with the dataset linked to that class.

Is the content of the NA_16 class inherently more challenging to detect in some way? (e.g. the NA_16 objects are less discernible/identifiable, they have odd shapes/aspect ratios, they have larger variations, etc.)

Hi @dusty_nv, there isn’t anything printed out that would indicate the label is unknown. There is just this warning message:

/usr/local/lib/python3.8/dist-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))

About the second question: the entire dataset comes from the same video source. During the labelling phase in CVAT, the tool divided the video into 1952 frames, and I used only these. In addition, the NA_16 class is an up-down arrow and the NA_15 class is a down-up arrow: they are the same in size, shape, and aspect ratio, yet only NA_16 is not recognized.

I’m available if you need anything else. If you want direct contact, we can arrange a meeting.

Thank you.

Hmm okay - I wonder if perhaps the NA_16 arrows and the NA_15 arrows look so similar that the object detector has trouble telling them apart. Sometimes when in-depth classification is needed, it’s done in a two-stage process where first detection is run, and then a secondary classifier is run on the detections (i.e. the detector finds all ‘arrows’, and then the classifier picks the type of ‘arrow’).

In the earlier training epochs, NA_16 is actually being detected with an mAP of up to 90%. But then it drops to 0, presumably because the model fits to the other arrow type instead. You could try running the training with the --balance-data flag to see if that helps. However, it may be that this level of discernibility between classes is beyond the capabilities of this SSD-Mobilenet network (which is optimized for speed and performance to be deployed on embedded platforms). I’m not an expert on training, so you may need to experiment further or look to other types of detection models out there.

Ok @dusty_nv, first I will try running with the --balance-data flag. How should I set it? Currently it’s defined with action='store_true'.

If this doesn’t work, I’ll try training another model. Can you suggest one that might work properly in my case? Or perhaps another network?

Thank you.

You should just be able to run it with the --balance-data flag; you don’t need to set it to --balance-data=true or anything (since the action is already store_true).
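
For example, something like this (with your own dataset and model paths):

    python3 train_ssd.py --dataset-type=voc --data=data/my_dataset --model-dir=models/my_model \
        --validation-mean-ap --balance-data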

I’m not exactly sure, sorry. Another idea I had is to select your ssd-mobilenet model from an epoch that trained the classes evenly. You can look at the Mean Average Precision report for each epoch and find the one that is most balanced across all your classes. Then select that model checkpoint to export to ONNX.
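
If I remember correctly, onnx_export.py picks the checkpoint with the lowest loss by default, but you should be able to point it at a specific epoch’s .pth file, something like this (the checkpoint filename is just an example - check what’s in your model dir and onnx_export.py --help):

    # export a specific training checkpoint instead of the default (lowest-loss) one
    python3 onnx_export.py --model-dir=models/my_model \
        --input=models/my_model/mb1-ssd-Epoch-45-Loss-2.34.pth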

Ok @dusty_nv, thanks for your suggestions. I will let you know.
