• Hardware (RTX 3080)
• Network Type (Detectnet_v2)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file: detectnet_v2_train_resnet50_kitti.txt (4.2 KB)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
Hi, I am training an object detection network on a two-class dataset. The classes are ‘healthy’ and ‘damage’, and the images originated on a bespoke imaging system with no rotational or colour variation, so those augmentation options have not been used. There is variation in image size, so ‘enable_auto_resize: True’ has been used.
When preparing the KITTI directories, the ‘healthy’ class files were given 6-digit names starting with a ‘0’, and the ‘damage’ files start with a ‘1’. The file numbers within each class are not entirely consecutive. I have read the ‘Data Annotation Format’ page and, as far as I can tell, the data follows those guidelines.
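As a quick sanity check on the label files themselves, something like the sketch below can flag malformed lines. This is purely illustrative (the field layout follows the standard KITTI object-detection label format: class, truncation, occlusion, alpha, then the bbox as xmin, ymin, xmax, ymax, followed by 3D fields that 2D detection ignores); the function name and class list are mine, not from the notebook.

```python
# Minimal sanity check for KITTI-format label lines (illustrative helper).
# Each line should have 15 space-separated fields; fields 5-8 are the
# bbox as xmin, ymin, xmax, ymax in pixels.

def check_kitti_line(line, classes=("healthy", "damage")):
    fields = line.split()
    if len(fields) != 15:
        return f"expected 15 fields, got {len(fields)}"
    if fields[0].lower() not in classes:
        return f"unknown class {fields[0]!r}"
    xmin, ymin, xmax, ymax = map(float, fields[4:8])
    if not (xmin < xmax and ymin < ymax):
        return "degenerate bbox"
    return None  # no problem found

# Example: a full-frame 'healthy' box on a 500x500 image passes the check
print(check_kitti_line("healthy 0.0 0 0.0 0.0 0.0 500.0 500.0 0 0 0 0 0 0 0"))
```

Running this over every label file before training would rule out a silent formatting problem in one class.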
Also it is very easy to differentiate the classes by human eye.
The results of the evaluation are very bad:
An AP of ‘0’ for the ‘healthy’ class suggests that something is set up incorrectly, even though everything runs as expected up to this point. As a sanity check I re-ran all the cells from the start (except the ‘kitti object detection’ dataset and ResNet-50 backbone downloads) to ensure that the system had no files left over from earlier experiments.
Can you offer any advice please?
Can you share the training log? Is the healthy class getting AP 0 every time?
Also, what is the average resolution of the training images? If it is 1024x768, for example, you can set that in the config file. It is suggested to train a model whose input size is similar to that of the training images.
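In the DetectNet_v2 training spec the input size is set under the preprocessing block of augmentation_config; a fragment along these lines (using the 1024x768 example above, and the ‘enable_auto_resize’ option already mentioned in this thread — please verify the exact field names against your own spec file and the DetectNet_v2 documentation):

```
augmentation_config {
  preprocessing {
    output_image_width: 1024
    output_image_height: 768
    output_image_channel: 3
    enable_auto_resize: true
  }
}
```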
I have run training three times, and each time healthy is 0 (damage was 2.02928). I moved damage ahead of healthy in the configuration file to see whether it was an alphabetical-ordering issue; as a result, damage also changed to 0. I then read another post that mentioned ‘enable_auto_resize: True’; after trying that, the damage AP rose to 24.6958.
Straight out of the capture device the resolution is 3648 x 1417, but the datasets also contain smaller cropped images (in closely matching quantities across both classes), down to around 500 x 500. In the configuration file, under ‘augmentation_config’, I use 1248 x 384, which is close to an exact scale-down of the hi-res images and an approximate average, with both numbers divisible by 16.
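For what it's worth, the divisible-by-16 constraint can be checked mechanically; a small sketch (the helper function is mine; the numbers are the ones from this thread):

```python
def round_to_stride(x, stride=16):
    """Round a dimension down to the nearest multiple of `stride`."""
    return (x // stride) * stride

# Native capture resolution from this thread
w, h = 3648, 1417
print(round_to_stride(w), round_to_stride(h))   # 3648 1408

# The chosen training size is already stride-aligned
print(1248 % 16, 384 % 16)                      # 0 0
```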
Both classes have the same number of images.
I am wondering whether part of the problem is that an image containing damage will often show that damage against a wider background that is otherwise healthy.
I am hoping not to have to use semantic segmentation for what appears to be a fairly straightforward task.
This is the log that relates to the experiment. I think I misinterpreted the lines at:
2023-01-27 15:40:40,573 [INFO] root: saving trained model
2023-01-27 15:40:41,485 [INFO] root: Model saved
I then noted that 2 root errors had been reported, but I am unable to fully interpret the alerts. They appear to have been raised from the file “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”
On my machine I do not have a directory called “tensorflow_core” in the dist-packages directory.
I ran the evaluate cell in the hope that it would help me analyse the issue.
May I make 100% sure I understand you: do you want me to recursively remove the folder named “experiment_dir_unpruned” and then make an empty new directory named “experiment_dir_unpruned” for the training program to re-populate, and then rerun as far as “!tao detectnet_v2 train” and then check the training log?
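If it helps, the clean-out described above can be scripted with the standard library (a sketch; the directory name is the one from the notebook, and the path is assumed to be relative to the working directory):

```python
import shutil
from pathlib import Path

exp_dir = Path("experiment_dir_unpruned")  # directory name as used in the notebook

# Recursively remove any previous results, then recreate an empty directory
# for "!tao detectnet_v2 train" to re-populate.
if exp_dir.exists():
    shutil.rmtree(exp_dir)
exp_dir.mkdir(parents=True)
print(list(exp_dir.iterdir()))  # → [] (freshly emptied)
```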
As I mentioned in 3 above:
Straight out of the capture device the resolution is 3648 x 1417, but there are also smaller cropped images in the datasets (closely matching the quantities across both classes), down to around 500 x 500 or even less. In the configuration file under ‘augmentation_config’ I use 1248 x 384, which is close to an exact scale-down of the hi-res images, and an approximate average, with both numbers divisible by 16.
The size of a “healthy” bbox always matches the image size, so the average of height and width will vary from 1845.5 for the largest images, down to 350 or less for the smallest images.
A “damage” bbox will have an average in the range 800 down to around 200. Occasionally a “damage” image might fill a small frame.
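To illustrate what a full-frame “healthy” label should look like if the box spans the whole image (a sketch; the function is mine, with the KITTI bbox fields xmin, ymin, xmax, ymax and the unused 3D fields zeroed):

```python
def full_frame_label(cls, width, height):
    """Build a KITTI label line whose bbox covers the entire image."""
    bbox = f"0.00 0.00 {width:.2f} {height:.2f}"
    # class, truncation, occlusion, alpha, bbox, then the unused 3D fields
    return f"{cls} 0.00 0 0.00 {bbox} 0.00 0.00 0.00 0.00 0.00 0.00 0.00"

# Full-resolution image straight from the capture device
print(full_frame_label("healthy", 3648, 1417))
```

For such labels the average of bbox height and width tracks the image size exactly, which matches the ranges given above.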
I am struggling to understand how the average precision of the “healthy” class can be zero. In every healthy image the bbox should exactly match the image dimensions; to achieve an AP of zero the network would need to be predicting outside the image, which should not be possible. An average precision of 100% would seem much more likely.
Does the notebook allow the confidence threshold to be adjusted? Maybe it is a confidence issue?
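On the confidence question: if every “healthy” detection falls below the evaluation threshold, its AP drops to zero even when the boxes are in the right place. The mechanism can be sketched with a toy filter (purely illustrative; the detection list and scores are made up, and this is not the TAO evaluation code — check the DetectNet_v2 spec documentation for the actual threshold fields):

```python
def filter_by_confidence(detections, threshold):
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d["confidence"] >= threshold]

# Hypothetical raw detections: healthy boxes exist, but only at low confidence
dets = [
    {"class": "healthy", "confidence": 0.04},
    {"class": "damage",  "confidence": 0.61},
]
print(filter_by_confidence(dets, 0.1))   # only the 'damage' detection survives
print(filter_by_confidence(dets, 0.01))  # both survive
```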