We’ve generated synthetic data using Isaac Sim and trained several DetectNetv2 object detection models, following the official tutorial here: https://docs.nvidia.com/isaac/isaac/packages/detect_net/doc/detect_net.html. Unfortunately, the models are not performing as well as we had hoped on real-world data, even though they perform fine on synthetic images.
We have spent some time trying to improve our synthetic data by setting up the Unity scene and randomization parameters to better match the real-world scene, but we still have not had much success. Since the model is currently trained purely on synthetic data, we suspect that training on a mix of real-world and synthetic data could improve its accuracy. However, we don’t believe we have enough real data to mix the datasets: our synthetic dataset is 20,000 images per object, while our real-world dataset is small, 200 frames or fewer per object we want to detect. The sheer size of the synthetic dataset would likely dilute the influence of the real-world data. Due to overfitting concerns, we don’t want to train on purely real-world data either.
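To make the dilution concern concrete, here is a rough sketch of the kind of oversampling we have in mind as one way to counter it: duplicating real frames until they make up a fixed fraction of each epoch. This is our own illustration, not a feature of TLT; the function name and the 30% ratio are arbitrary choices for the example.

```python
import random

def build_mixed_dataset(synthetic, real, real_ratio=0.3, seed=0):
    """Build a training list where real samples make up roughly
    `real_ratio` of each epoch by duplicating (oversampling) them.
    Each duplicated real frame contributes one extra gradient step,
    which is a crude alternative to re-weighting the loss."""
    rng = random.Random(seed)
    # Number of real samples needed so that
    # real / (real + synthetic) ~= real_ratio.
    n_real_needed = int(len(synthetic) * real_ratio / (1.0 - real_ratio))
    # Sample with replacement, since the real set is much smaller.
    oversampled_real = [rng.choice(real) for _ in range(n_real_needed)]
    mixed = list(synthetic) + oversampled_real
    rng.shuffle(mixed)
    return mixed

# Toy example at our dataset sizes: 20,000 synthetic vs. 200 real frames.
synthetic = [("synthetic", i) for i in range(20000)]
real = [("real", i) for i in range(200)]
mixed = build_mixed_dataset(synthetic, real, real_ratio=0.3)
n_real = sum(1 for tag, _ in mixed if tag == "real")
print(len(mixed), n_real)  # real frames are now ~30% of the epoch
```

The obvious downside is that each real frame is seen many times per epoch, which may just move the overfitting problem around; pairing this with heavier augmentation on the real frames seems prudent.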
For reference, here are some example values from the models we’ve trained:
- Model Type: DetectNetv2 - ResNet18
- Batch Size: 16
- Epochs: 120
- Dataset size: 20000
- Final Validation Cost: 0.000026
- mAP on Synthetic Data: 99.6%
- mAP on Real Data: 23.4%
- Training Tool Used: Nvidia Transfer Learning Toolkit
Do you have any advice on steps we can take to improve the accuracy of our object detection models? Are there changes we could make on the synthetic data generation side? On the training side, we have considered including the small real-world dataset in the training data while increasing its influence on the training process, perhaps by modifying the loss function to weight real-world samples more heavily than synthetic ones. Is that something we could feasibly do using Nvidia’s tools and models? Are there any other changes we should try on the training side?
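For clarity, this is the loss re-weighting idea we mean, in plain Python rather than any real framework. We are not aware of TLT exposing a hook like this, which is exactly what we are asking about; the function, the per-sample loss inputs, and the weight of 10 are all illustrative assumptions.

```python
def weighted_batch_loss(per_sample_losses, is_real, real_weight=10.0):
    """Combine per-sample losses into a batch loss, scaling samples
    from the real-world dataset by `real_weight`. Dividing by the
    total weight keeps the result on the same scale as an
    unweighted mean, so learning-rate settings stay comparable."""
    weights = [real_weight if r else 1.0 for r in is_real]
    weighted_sum = sum(w * l for w, l in zip(weights, per_sample_losses))
    return weighted_sum / sum(weights)

# Example batch: 3 synthetic samples with zero loss and 1 real sample
# with loss 1.0. An unweighted mean would give 0.25; the weighted
# version lets the single real sample dominate the batch loss.
losses = [0.0, 0.0, 0.0, 1.0]
is_real = [False, False, False, True]
print(weighted_batch_loss(losses, is_real))  # 10/13 ~= 0.769
```

If something like this is not reachable through TLT’s spec files, pointers to whatever is supported (e.g. class-level weighting, or exporting to a framework where we can write our own training loop) would also be very welcome.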