Difficulties developing a database from real-world images - what to change, what to keep?

Hi there

I have very basic questions.

I am developing a binary classifier.

It is for use in a real-world industrial application to identify damage inside a large, inaccessible structure. The classes are ‘healthy’ and ‘damage’. I have data for both classes in the form of wide-angle still images and cropped details taken from them.

Obviously, damage only occurs within part of an otherwise healthy frame. The data was gathered using a system that produces highly consistent images (no variation in angle or focus and limited variation in exposure; proximity is the only variable apart from the subject matter itself). In visual terms, both classes consist of various material textures.

So far I have built 2 versions of a database and tested them using the jetson-inference tutorials in order to make basic decisions like choice of pretrained model and balance of full-frame vs. detail images across the two classes. I hope to take the material to the next stage using the TAO toolkit.

I have tried GoogLeNet, ResNet18, ResNet34 and ResNet50.

Despite more than 2 months' work I have not made any significant progress. I built the first database up to 1,000 images, and its confusion matrix stubbornly remained at (true positive x true negative) = (false positive x false negative), even though the accuracy of the ‘best model’ varied at the validation stage. It resulted in an 80% ‘damage’ inference, which never varied throughout the build.
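A minimal sketch of what that confusion-matrix pattern implies, using hypothetical counts rather than the real results. When (true positive x true negative) = (false positive x false negative), the true positive rate equals the false positive rate, so the model predicts ‘damage’ at the same rate whatever the true class is. The function name and the numbers below are assumptions chosen only to illustrate an 80% ‘damage’ output on a balanced set.

```python
def summarize_confusion(tp, fn, fp, tn):
    """Summarize a 2x2 confusion matrix where 'damage' is the positive class."""
    tpr = tp / (tp + fn)  # fraction of real damage images called 'damage'
    fpr = fp / (fp + tn)  # fraction of healthy images called 'damage'
    return {
        # Zero means predictions are statistically independent of the label.
        "determinant": tp * tn - fp * fn,
        "tpr": tpr,
        "fpr": fpr,
        "balanced_accuracy": (tpr + (1 - fpr)) / 2,
        "predicted_damage_rate": (tp + fp) / (tp + fn + fp + tn),
    }


# Hypothetical counts: 500 damage and 500 healthy validation images,
# with the model answering 'damage' for 80% of each class.
stats = summarize_confusion(tp=400, fn=100, fp=400, tn=100)
for name, value in stats.items():
    print(f"{name}: {value}")
```

If the real matrix behaves like this, the balanced accuracy sits at chance level regardless of the headline accuracy figure, which would point at the data or training setup rather than at the choice of backbone.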

Given the lack of progress, the size of the first database became an unwelcome barrier to further development. I stripped down the material and built a second database that uses only one type of damage in the ‘damage’ class.

The results using the new database are essentially the same as before, the only difference being that the 80% is now a ‘healthy’ inference.

I have kept records of the results of each 100-epoch training run.

My questions:
• Should I be looking at a different backbone architecture for my use case?
• Should I be using a more complicated detection/classification process?
• The database currently has more wide-frame images in the ‘healthy’ class and more detail images in the ‘damage’ class, with a mix of both in each class. The two classes are otherwise balanced. Is this good practice? Is there a better way?
• Or does anyone have any other advice, please?
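On the wide-frame vs. detail question, one way to quantify the imbalance is to count images per (class, framing) pair and ask how well a model could score by looking at the framing alone. The sketch below does that with the standard library only. The sample counts, the `wide`/`detail` tags and the function name are hypothetical, not the real database.

```python
from collections import Counter


def framing_shortcut_accuracy(samples):
    """Accuracy of a 'classifier' that sees only the framing of each image.

    samples: iterable of (label, framing) pairs, e.g. ("damage", "detail").
    For each framing type it predicts the majority label of that type.
    """
    counts = Counter(samples)  # (label, framing) -> number of images
    total = sum(counts.values())
    correct = 0
    for framing in {f for _, f in counts}:
        per_label = [n for (_, f), n in counts.items() if f == framing]
        correct += max(per_label)  # majority label is right this many times
    return counts, correct / total


# Hypothetical split: classes balanced overall, framing skewed per class.
samples = (
    [("healthy", "wide")] * 350
    + [("healthy", "detail")] * 150
    + [("damage", "wide")] * 150
    + [("damage", "detail")] * 350
)
counts, shortcut_acc = framing_shortcut_accuracy(samples)
for (label, framing), n in sorted(counts.items()):
    print(f"{label:8s} {framing:7s} {n}")
print(f"accuracy from framing alone: {shortcut_acc:.2f}")
```

The further this figure is above 0.5, the more a network can appear to learn by separating wide frames from details instead of healthy from damaged material. Matching the wide/detail ratio across the two classes would remove that shortcut.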

I am using the 4 GB Nano devkit with additional swap.

Thank you and apologies for the long question.