Object detection training duration

Hello,

I am currently using NVIDIA TLT to train a custom YOLOv4 object detection model (cspdarknet53 backbone) for 80 epochs on a GeForce GTX 1060 6GB; the estimated training duration is about six and a half days.

When using darknet (GitHub - AlexeyAB/darknet: YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)) to train the same model, on the same dataset, for the same number of epochs, I am able to train it in a day and a half.

Why does training take so much longer using TLT? Are there any settings I can tweak to reduce training time?

Thanks.

Could you please share the training log and spec file?

Sure (note that the log is incomplete, as training had to be resumed from epoch 30 and it hasn’t finished all 80 epochs yet).

output_log.txt (4.1 KB) yolo_v4_train_cspdarknet53_label1.txt (2.0 KB) yolov4_training_log_cspdarknet53.csv (1.1 KB)

Thanks for the info.
Have you ever tried a larger batch size? For example, bs=2 or 4.
If you have not tried it before, you can set up the experiment on another machine (since your GeForce GTX 1060 is occupied with the current training) or later, after your current training is done. You can also just use a smaller part of the training dataset.
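
For reference, in the TLT YOLOv4 training spec the batch size is usually controlled by batch_size_per_gpu under training_config. The snippet below is only a sketch based on the public TLT documentation, not taken from your spec file:

    training_config {
      batch_size_per_gpu: 2   # e.g. try 2 or 4 instead of 1
      num_epochs: 80
      # other fields (learning rate, regularizer, etc.) left as they are
    }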

Also, for your current training, please help capture the logs below:

  1. Open a first terminal and run the following command:
    $ nvidia-smi dmon

  2. Open a second terminal and run the following command:
    $ top
    Then press 1.

Please share the two logs with us.
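
If it is easier to share, the outputs can also be captured into files, for example (the file names are just suggestions):

    $ nvidia-smi dmon > nvidia-smi-dmon.txt     # stop with Ctrl-C after a minute or two
    $ top -b -n 1 > top_1.txt                   # -b takes a single non-interactive snapshot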

I’ve tried larger batch_size values, but training would error out. The only way I got it to start was by setting batch_size to 1.

nvidia-smi-dmon.txt (2.7 KB) top_1.txt (8.7 KB)

Was it an OOM error or a “Killed” message when you tried the larger batch size?

It was an OOM error for batch=4. It seems to be running fine now with batch=2. Is there anything else I could try besides increasing the batch size and reducing the size of the training dataset?

Please try MPS (Multi-Process Service).

Start MPS daemon process
nvidia-cuda-mps-control -d

Check MPS process
ps -ef | grep mps

Quit MPS daemon
echo quit | nvidia-cuda-mps-control
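
As a side note, NVIDIA’s MPS documentation generally recommends setting the GPU to exclusive-process compute mode before starting the daemon; this step is optional and only mentioned here as a hint:

nvidia-smi -i 0 -c EXCLUSIVE_PROCESS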

I am now training through a TLT docker image (nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3) using cnvrg.io on a server with one A100 GPU. Do I need to run MPS on the PC/server itself, or can I run it from within the docker image?

You can log in to the TLT docker image and run training directly inside the container. Meanwhile, start MPS.
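
If MPS runs on the host, the container also needs access to the MPS pipe directory and the host IPC namespace. A rough sketch of launching the container that way (the mount paths are illustrative, not taken from this thread):

    docker run --runtime=nvidia -it --rm \
        --ipc=host \
        -v /tmp/nvidia-mps:/tmp/nvidia-mps \
        -v /local/workspace:/workspace \
        nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3 /bin/bash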

Also, since you are now training on an A100 GPU, could you try a larger batch size?
To speed up training, normally you can:

  1. increase the batch size (but note that this may trade off mAP)
  2. use multiple GPUs (see the sketch below)

MPS is an option, but it will not increase the speed by much.
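
As a rough illustration of points 1 and 2, the training entrypoint inside the TLT 3.0 container accepts a --gpus flag. The paths and the key below are placeholders; only the spec file name is reused from the one shared earlier:

    # inside the tlt-streamanalytics container (paths and key are placeholders)
    yolo_v4 train -e /workspace/specs/yolo_v4_train_cspdarknet53_label1.txt \
                  -r /workspace/results \
                  -k <your_ngc_key> \
                  --gpus 2

With, for example, --gpus 2 and batch_size_per_gpu: 2, the effective batch size becomes 4, which is typically where most of the speed-up comes from.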