Training Become very slow Yolov4

user86169 · December 30, 2021, 7:57am

Training Spec file: spec.txt (2.2 KB)
Training Snapshot :

TLT Version: docker_tag: v3.21.08-py3
Network Type : Yolov_4

Hi,

I am trying to train yolov4 using custom dataset in which i have only one class.
but training process running very slow and got killed in the 2nd epoch. i am unable to understand why this issue is happening even i am using command with gpu.

I have also attached the screenshot and spec file for your reference.

user86169 · December 31, 2021, 6:19am

Hi, please help me to resolve this issue.

Morganh · January 1, 2022, 4:09am

Seems that it is killed due to OOM. You can try a lower batch-size during training.

user86169 · January 2, 2022, 4:05pm

Hi,

I tried but what about the training speed because it is very low for each epoch it takes several minutes to complete.

I tried training command with gpu but when the training part is running i checked that the gpu % was still 0%. as shown below

Epoch 1/50
2/160 […] - ETA: 2:38:12 - loss: 13.5692/usr/local/lib/python3.6/dist-packages/keras/callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (7.225892). Check your callbacks.
% delta_t_median)
160/160 [==============================] - 1144s 7s/step - loss: 13.3983
8ba2ee0d2707:60:96 [0] NCCL INFO Bootstrap : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.4<0>
8ba2ee0d2707:60:96 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
8ba2ee0d2707:60:96 [0] NCCL INFO NET/IB : No device found.
8ba2ee0d2707:60:96 [0] NCCL INFO NET/Socket : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.4<0>
8ba2ee0d2707:60:96 [0] NCCL INFO Using network Socket
NCCL version 2.7.8+cuda11.1
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 00/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 01/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 02/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 03/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 04/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 05/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 06/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 07/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 08/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 09/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 10/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 11/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 12/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 13/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 14/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 15/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 16/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 17/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 18/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 19/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 20/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 21/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 22/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 23/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 24/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 25/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 26/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 27/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 28/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 29/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 30/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 31/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [1] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [2] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [3] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [4] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [5] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [6] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [7] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [8] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [9] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [10] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [11] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [12] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [13] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [14] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [15] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [16] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [17] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [18] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [19] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [20] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [21] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [22] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [23] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [24] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [25] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [26] -1/-1/-1->0->-1|-1->0->-1/-
8ba2ee0d2707:60:96 [0] NCCL INFO Setting affinity for GPU 0 to ffff
8ba2ee0d2707:60:96 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
8ba2ee0d2707:60:96 [0] NCCL INFO comm 0x7fb50af951a0 rank 0 nranks 1 cudaDev 0 busId 17000 - Init COMPLETE
Epoch 2/50
142/160 [=========================>…] - ETA: 2:08 - loss: 13.0120

As I am using custom dataset in which I have only one class “phone”. But when I tried same training command with another dataset in which I have also one class for that new dataset gpu is running and training speed is also increasing.
Don’t know what’s the issue is can you please help me out from this training speed issue.

user86169 · January 3, 2022, 7:16am

Hi,

I am waiting for the response from your side.

Morganh · January 3, 2022, 2:24pm

From the nvidia-smi result, the memory(5118MiB / 16125MiB ) is consumed. You can run nvidia-smi with interval.

Any difference between the two datasets?

user86169 · January 6, 2022, 9:39am

No, only 1 difference is the total no of images. in the dataset i am facing training speed issue having 1600 images with there annotations in kitti format and in the latest dataset i have images around 300. Both dataset having the single class ‘phone’.

for my understanding that 1600 images dataset training is running on cpu instead of gpu. if I am correct then what exactly I need to do to resolve this training speed issue, also if I am wrong then what need to do?

user86169 · January 10, 2022, 4:39am

Hi, I am waiting for the response from your side.

Morganh · January 10, 2022, 6:47am

May I know how did you trigger the tao docker?

user86169 · January 10, 2022, 7:27am

Hi,

I used the below the below command to trigger tao docker.

tao yolo_v4 train --gpus 1 -e /workspace/tao-experiments/specs/spec.txt -r /workspace/tao-experiments/results -k (Here I am using my key)

Morganh · January 10, 2022, 7:30am

Can you check

“nvidia-smi” result before running above command
Run above command
“nvidia-smi” result after running above command

user86169 · January 10, 2022, 9:18am

Before running training command :-

using Nvidia-smi

±----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 1 N/A N/A 1335 G /usr/lib/xorg/Xorg 9MiB |
| 1 N/A N/A 1414 G /usr/bin/gnome-shell 6MiB |
±----------------------------------------------------------------------------+

After running training command :-
now using Nvidia-smi

±----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 11699 C /usr/bin/python3.6 115MiB |
| 0 N/A N/A 11753 C python3.6 1247MiB |
| 1 N/A N/A 1335 G /usr/lib/xorg/Xorg 9MiB |
| 1 N/A N/A 1414 G /usr/bin/gnome-shell 6MiB |
±----------------------------------------------------------------------------+

Morganh · January 10, 2022, 9:20am

So, from
1MiB / 16125MiB
to
1366MiB / 16125MiB

The GPU is up.

user86169 · January 10, 2022, 9:23am

but as u see gpu running % is still 0. so what does it mean. What to do?

user86169 · January 10, 2022, 9:42am

coz for this dataset training having 1600 images, gpu % is still 0 and training speed is also slow but for other datasets gpu % is increasing while training and training speed also good.

Morganh · January 10, 2022, 10:08am

Can you double check via monitoring
$ nvidia-smi dmon

user86169 · January 10, 2022, 10:29am

Before running training command :-

using Nvidia-smi dmon

gpu pwr gtemp mtemp sm mem enc dec mclk pclk

Idx W C C % % % % MHz MHz

0     8    28     -     0     0     0     0   405   300
1     3    29     -     0     0     0     0   405   300
0     8    28     -     0     0     0     0   405   300
1     3    29     -     0     0     0     0   405   300
0     8    28     -     0     0     0     0   405   300
1     3    29     -     0     0     0     0   405   300
0     8    28     -     0     0     0     0   405   300
1     3    29     -     0     0     0     0   405   300
0     8    28     -     0     0     0     0   405   300
1     3    29     -     0     0     0     0   405   300
0     8    28     -     0     0     0     0   405   300
1     3    29     -     0     0     0     0   405   300

After running training command :-
now using Nvidia-smi dmon

gpu pwr gtemp mtemp sm mem enc dec mclk pclk

Idx W C C % % % % MHz MHz

0    44    38     -     0     0     0     0  6500  1620
1     4    30     -     0     0     0     0   405   300
0    44    38     -     0     0     0     0  6500  1620
1     3    30     -     0     0     0     0   405   300
0    44    38     -     0     0     0     0  6500  1620
1     3    30     -     0     0     0     0   405   300
0    44    38     -     0     0     0     0  6500  1620
1     3    30     -     0     0     0     0   405   300
0    44    39     -     0     0     0     0  6500  1620
1     3    30     -     0     0     0     0   405   300
0    44    38     -     0     0     0     0  6500  1620
1     4    30     -     0     0     0     0   405   300
0    44    38     -     0     0     0     0  6500  1620
1     3    30     -     0     0     0     0   405   300
0     9    38     -     0     0     0     0   405   375
1     3    30     -     0     0     0     0   405   300
0     9    37     -     0     0     0     0   405   375
1     4    30     -     0     0     0     0   405   300
0     9    37     -     0     0     0     0   405   300
1     4    30     -     0     0     0     0   405   300
0    89    39     -    31    17     0     0  6500  1620
1     4    30     -     0     0     0     0   405   300
0    48    41     -    42    25     0     0  6500  1875
1     4    30     -     0     0     0     0   405   300
0    50    39     -     0     0     0     0  6500  1875
1     5    30     -     0     0     0     0   405   300
0    44    39     -     0     0     0     0  6500  1620
1     4    30     -     0     0     0     0   405   300
0    44    39     -     0     0     0     0  6500  1620
1     4    30     -     0     0     0     0   405   300
0    44    39     -     8     5     0     0  6500  1650
1     4    30     -     0     0     0     0   405   300

user86169 · January 10, 2022, 10:33am

not able to understand what’s the issue?

Morganh · January 10, 2022, 10:39am

Actually I am running a yolov4 training with COCO2017 dataset but cannot reproduce “gpu utils is always 0”.

Suggest you to monitor the “sm” for a long time.

More, please try to use KITTI dataset to train as well.

user86169 · January 10, 2022, 10:43am

Hi,

I am already using dataset in kitti format for training.
as u can see it’s been around 15 min and still the training is in the 2nd epoch.

Epoch 1/50
2/320 […] - ETA: 6:17:21 - loss: 13.3049 /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (10.254203). Check your callbacks.
% delta_t_median)
153/320 [=============>…] - ETA: 17:39 - loss: 13.6621

Topic		Replies	Views
Training got killed before start TAO Toolkit	18	1442	February 8, 2022
[TLT] YoloV4 training fails. training process asigned to CPU instead of GPU? TAO Toolkit	8	442	August 9, 2022
High ram usage with tlt ResNet TAO Toolkit	42	997	July 6, 2022
Yolov4 multi-gpu training with Darknet Arch encounters a problem TAO Toolkit	17	749	July 2, 2023
Multi GPU's and invalid loss TAO Toolkit	18	1176	July 19, 2022
Yolo_v4_tiny randomly stops docker container during second or third validation phase with no errors TAO Toolkit yolo	20	880	August 29, 2022
TAO action recogniton net trainning extremely slow TAO Toolkit tao	20	644	August 7, 2023
Tao pre-trained yolo4tiny - AssertionError: Must have more boxes than clusters TAO Toolkit	54	2280	January 21, 2022
Extremely slow train and evaluation of yolo_v4_tiny TAO Toolkit yolo , tao	12	1239	April 12, 2023
Mask2Former_inst model training crashed after 1 epoch TAO Toolkit inception	23	36	January 30, 2025

Training Become very slow Yolov4

gpu pwr gtemp mtemp sm mem enc dec mclk pclk

Idx W C C % % % % MHz MHz

gpu pwr gtemp mtemp sm mem enc dec mclk pclk

Idx W C C % % % % MHz MHz

Related topics