I have pruned and retrained a Detectnet_v2 model on my own data for detecting the liquid level in a tank and ended up with the following precision for my classes:
Validation cost: 0.000003
Mean average_precision (in %): 99.2907
class name      average precision (in %)
------------    --------------------------
high            98.9247
low             99.6139
medium          99.3334
I then ran tlt-infer on the testing data set, and most of the generated images had bounding boxes for two or more of the classes even though there is only one tank in the image. I was expecting only a single bounding box to be generated. Here is an example of a label file generated for an image where the tank level is clearly high:
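Each line in the generated label file is in KITTI label format, one line per detected box. Schematically it looks something like the two lines below (the numbers are invented for illustration, not my actual output); the 2D bounding box corners are in columns 5-8, the 3D fields come out as zeros, and the value I'm asking about is appended at the end:

    high   0.00 0 0.00 412.00 136.00 608.00 349.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 14.87
    medium 0.00 0 0.00 405.00 214.00 611.00 351.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2.31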
When tlt-infer generates the label files, what is the value at the end of the line for each class referring to? It seems to be a confidence value of some sort, but I am unsure, as it varies quite a bit across the other label files I checked and I was expecting a value between 0 and 1.0. How can I configure tlt-infer so that it only generates the bounding box for the single class with the highest confidence? Here is my inference spec sheet:
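The part that seems most relevant is the per-class bbox handler block; it follows the layout of the sample inference specs, roughly like the sketch below (the values shown here are the sample defaults rather than my exact settings):

    bbox_handler_config {
      kitti_dump: true
      disable_overlay: false
      overlay_linewidth: 2
      classwise_bbox_handler_config {
        key: "high"
        value: {
          confidence_model: "aggregate_cov"
          output_map: "high"
          confidence_threshold: 0.9
          bbox_color { R: 0 G: 255 B: 0 }
          clustering_config {
            coverage_threshold: 0.005
            dbscan_eps: 0.3
            dbscan_min_samples: 0.05
            minimum_bounding_box_height: 4
          }
        }
      }
      # similar classwise_bbox_handler_config blocks follow for "medium" and "low"
    }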
Thank you for pointing to where that was in the TLT documentation. It makes a lot more sense where that value was coming from now.
I also have a question about the parameters for the cost function in the training spec sheet. In the TLT documentation, it's recommended not to change the values for the classes in the examples, but how should these parameters be modified when using our own data set with different classes, specifically the class_weight and the initial/target weights? I tried to find information on this, but the best I could come up with was to set the weight to represent each class's percentage of the data set. What is your recommendation for this?
Thank you, that link does clarify how I should define the class weights. I am still a little confused about how to choose the initial and target weights for the objectives within each class, as I'm unsure what those parameters represent. From other people's spec sheets, it seems like these are commonly used settings:
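(Reproducing this from memory of the public DetectNet_v2 example specs, so treat the exact numbers as indicative rather than authoritative:)

    cost_function_config {
      target_classes {
        name: "high"
        class_weight: 1.0
        coverage_foreground_weight: 0.05
        objectives {
          name: "cov"
          initial_weight: 1.0
          weight_target: 1.0
        }
        objectives {
          name: "bbox"
          initial_weight: 10.0
          weight_target: 10.0
        }
      }
      # target_classes blocks for "medium" and "low" with the same objective weights
      enable_autoweighting: true
      max_objective_weight: 0.9999
      min_objective_weight: 0.0001
    }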
Is it best to just stick with those settings? Also, I tried to export the model in both the fp16 and int8 data types and got this error:
AttributeError: Specified FP16 but not supported on platform.
and had to settle for fp32. I plan to deploy this to a Jetson Nano and am worried about compatibility/performance issues when using an fp32 model. Is it possible to run the tlt-export command on the Jetson Nano with the .tlt model to create the .etlt model there?
Note that the .etlt model is always in fp32 mode, even if you set another mode when you run tlt-export. You can run tlt-export on the host PC to generate the .etlt model.
The Nano supports fp32 and fp16. You can deploy the .etlt model on it.
Or you can use tlt-converter to generate a TensorRT engine for deployment.
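For example, the export on the host and the conversion on the Nano look roughly like the commands below (the key, paths, and input dimensions are placeholders you must adjust for your model, and flags can differ slightly between TLT versions):

    # on the host PC: generate the .etlt model from the trained .tlt model
    tlt-export detectnet_v2 \
      -m /workspace/experiment/weights/resnet18_detector.tlt \
      -k $KEY \
      -o /workspace/experiment/export/resnet18_detector.etlt \
      --data_type fp32

    # on the Jetson Nano: build a TensorRT engine from the .etlt model
    ./tlt-converter -k $KEY \
      -d 3,384,1248 \
      -o output_cov/Sigmoid,output_bbox/BiasAdd \
      -t fp16 \
      -e resnet18_detector.engine \
      resnet18_detector.etlt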
You are right, my GPU doesn't support either of those formats, so I've purchased a more up-to-date GPU so that I can experiment with the other two formats. In the meantime, I've gone ahead and copied the model over to the Jetson Nano and converted it into a .engine file, but I'm having trouble getting the same detections in DeepStream that I got in the Jupyter notebook. I'll move over to the DeepStream forums to raise my issues there.
I've gone back and trained another model (a ResNet18 model using the Detectnet_v2 object detection architecture), modifying the spec sheets using the information you shared, Morganh, but I'm still having problems with the inference. My new model has a similar mean average precision to my first model, but it still performs badly during inference using the mean_cov confidence model. Here is some information on my data set that may help get to the bottom of this. The data set contains images of a tank holding various levels of a liquid, captured throughout the day and night. All of the images used for training and testing are taken from the same angle on the tank. The data set is split up into the following classes:
A confidence of 0.4 is the best the model could achieve, and even then there were images with no detections at all. Is this a case of the data set being too small, or is there something I can do to improve the parameters? I'm at a loss as to why the model can't achieve better performance when the mean average precision is so high. Looking forward to hearing your perspective on this.
Also, from your training spec I can see that minimum_bounding_box_height is set to 75. What are the average width and height of the objects in your training images? If a bbox's height is lower than 75, the bbox will be ignored. So please set a smaller minimum_bounding_box_height and run tlt-evaluate directly to check what mAP your model can get.
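For example, in the training spec's postprocessing_config you can lower it per class, along these lines (the other threshold values shown are just the common sample defaults):

    postprocessing_config {
      target_class_config {
        key: "high"
        value {
          clustering_config {
            coverage_threshold: 0.005
            dbscan_eps: 0.15
            dbscan_min_samples: 0.05
            minimum_bounding_box_height: 20
          }
        }
      }
      # repeat the same change for "medium" and "low"
    }

Then run tlt-evaluate detectnet_v2 -e <training_spec> -m <model.tlt> -k <key> and check the mAP again.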