I am trying to fine-tune a pre-trained ActionRecognitionNet model on my custom dataset through TAO. However, the fine-tuning process is extremely slow (>30 minutes per epoch). I also observed that GPU utilization is not constant and frequently idles at 0%. I would greatly appreciate any insights, suggestions, or assistance that the community can provide regarding this matter. Thank you in advance for your time and support.
• Hardware - GPU: RTX 3090 (24 GB), CPU: Intel i9, RAM: 64 GB
• Network Type - ActionRecognitionNet
I am launching the training via the following command
The demo fine-tuning was significantly faster: it completed within minutes and there were no issues. Please let me know if there are any other details you would like me to share. Thank you in advance.
I have increased rgb_seq_length to 64, batch_size to 64, and workers to 32. However, there is still no improvement; training remains significantly slow. Do you think the data preprocessing/loaders within TAO are causing a bottleneck?
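One way to test the data-loader hypothesis directly is to time the batch iterator on its own, without the model in the loop. The sketch below is not TAO-specific — it is a generic, hypothetical harness assuming you can iterate the training loader from Python (e.g. a PyTorch DataLoader inside the container). If the average time per batch from the real loader is close to your full training-step time, the pipeline (video decode/augmentation), not the GPU, is the bottleneck.

```python
# Hypothetical diagnostic sketch: time an arbitrary batch iterator.
# Pass any iterable of batches, e.g. a PyTorch DataLoader. Pure stdlib.
import time

def time_batches(batch_iter, max_batches=50):
    """Iterate up to max_batches and return (batches_timed, avg_seconds_per_batch)."""
    times = []
    start = time.perf_counter()
    for i, _batch in enumerate(batch_iter):
        now = time.perf_counter()
        times.append(now - start)  # time spent producing this batch
        start = now
        if i + 1 >= max_batches:
            break
    if not times:
        return 0, 0.0
    return len(times), sum(times) / len(times)

# Dummy loader that sleeps to mimic slow video decoding, for illustration only:
def dummy_loader(n, delay):
    for _ in range(n):
        time.sleep(delay)
        yield object()

n_batches, avg_s = time_batches(dummy_loader(10, 0.01))
```

With the real loader, compare `avg_s` against the wall-clock time of one training step; a large avg_s with GPU utilization dipping to 0% is consistent with a decode/preprocessing bottleneck.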
Apologies for the delay. I want to update you that I have successfully upgraded the driver from version 470 to version 525. Despite this upgrade, the issue we were facing earlier remains unresolved, and unfortunately, the training process is still excessively slow.
I am currently working in a server-like setup, which unfortunately doesn't allow me to run training through a notebook; instead, I invoke the training container directly. I have executed the demo fine-tuning process once again and attached the log for your reference. The demo ran smoothly, without any problems and at a faster pace. Thank you for your support. logs_demo_tao.txt (95.7 KB)
From logs_demo_tao.txt, it is loading trained weights from /shared_volume/results/pretrained/actionrecognitionnet_vtrainable_v1.0/resnet18_3d_rgb_hmdb5_32.tlt. It is running the 3D model.
From trainlogs_debug_forums.txt, it is loading trained weights from /shared_volume/models/pretrained/resnet18_2d_rgb_hmdb5_32.tlt. It is running the 2D model.
Could you please run the 3D model against your own dataset as well? You can also refer to the training spec in the notebook; for example, set lr to 0.01.
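For reference, the relevant spec fields for the 3D run would look roughly like the fragment below. This is a sketch: the field names follow the sample train_rgb_3d_finetune.yaml spec shipped with the TAO notebook, so please verify them against your TAO version before use.

```yaml
# Sketch of the 3D-model spec fields under discussion (verify field
# names against the sample spec for your TAO version):
model_config:
  model_type: rgb
  input_type: 3d
  backbone: resnet18
  rgb_seq_length: 32
train_config:
  optim:
    lr: 0.01
```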
Thank you for your support. I wanted to share my findings with you. After reducing the rgb_sequence_length from 32 to 3 in my dataset, I noticed a significant improvement in training speed. However, I have concerns about whether lowering the rgb_sequence_length might lead to a loss of important temporal information during inference, particularly for the 2D action recognition task on the assembly101 coarse action dataset.
The assembly101 dataset contains assembly-related actions such as screwing and mounting, where capturing fine actions across time is crucial for accurate recognition. While the reduced rgb_sequence_length seems to avoid bottlenecks in the TAO training pipeline, I worry it may compromise the accuracy of the model on this specific dataset.
I would appreciate your guidance, suggestions, or insights on determining the ideal rgb_sequence_length that balances efficient training against accuracy, so I can fine-tune the 2D action recognition model effectively for the assembly101 dataset.
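One rough, non-TAO-specific way to reason about this trade-off: a clip of rgb_sequence_length frames sampled at a fixed stride spans seq_length × stride / fps seconds of video, so the smallest sequence length worth considering is the one whose coverage still spans a full fine-grained action (a screwing or mounting motion). The helper below is a back-of-the-envelope sketch that assumes uniform frame sampling with a fixed stride; TAO's actual sampling strategy may differ, so check your dataset_config.

```python
# Hypothetical helper: seconds of video spanned by one input clip, assuming
# uniform sampling with a fixed frame stride (actual TAO sampling may differ).
def temporal_coverage_s(seq_length: int, stride: int, fps: float) -> float:
    """Temporal span of a clip in seconds."""
    return seq_length * stride / fps

# At 30 fps with stride 1: 32 frames cover ~1.07 s, while 3 frames cover
# only 0.1 s, which is unlikely to span a full assembly action.
short = temporal_coverage_s(3, 1, 30.0)
long = temporal_coverage_s(32, 1, 30.0)
```

If timing your actions in the assembly101 clips shows they last around a second, this suggests trying intermediate values (e.g. 8 or 16 frames, possibly with a larger stride) rather than dropping all the way to 3.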
Usually, reducing the rgb_sequence_length from 32 to 3 will result in lower accuracy. But we still need to check the training speed. Could you please do an apples-to-apples comparison using the same model?
You can download train_rgb_2d.yaml and run it against the dataset mentioned in the notebook. For train_rgb_2d.yaml, please run
Thanks for the result. So, when you train on the HMDB51 dataset with the 2D or 3D model, the training speed is normal.
But when you run training against the assembly101 dataset with the same training parameters (such as rgb_sequence_length: 32), the training becomes slow, correct?
Thank you for your continued support. I would like to clarify that during the previous training on the HMDB51 dataset, rgb_sequence_length was kept at its original value of 3 in the configuration file. When I increased rgb_sequence_length to 32, I observed a considerable reduction in training speed. Now, as I move to fine-tune the same model on the assembly101 dataset, I am seeking the optimal rgb_sequence_length that strikes a balance between achieving high accuracy and not significantly slowing down training. Thank you once again in advance.