YOLOv4 training becomes very slow

Training spec file: spec.txt (2.2 KB)
Training snapshot: (screenshot attached)

TLT version: docker_tag: v3.21.08-py3
Network type: yolo_v4

Hi,

I am trying to train YOLOv4 on a custom dataset that has only one class, but the training process runs very slowly and the job was killed in the second epoch. I am unable to understand why this happens even though I am running the command with the GPU.

I have also attached the screenshot and spec file for your reference.

Please help me resolve this issue.

It seems the process was killed due to OOM (out of memory). You can try a lower batch size during training.
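For reference, in a typical TAO YOLOv4 spec the batch size is set in training_config; the values below are only an illustrative sketch under that assumption, not taken from the attached spec.txt:

training_config {
  batch_size_per_gpu: 4    # lower this first if training is killed by OOM
  num_epochs: 50
  ...
}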

Hi,

I tried that, but what about the training speed? It is still very low; each epoch takes several minutes to complete.

I ran the training command with the GPU, but while training was running I checked and the GPU utilization was still 0%, as shown below.

Epoch 1/50
2/160 […] - ETA: 2:38:12 - loss: 13.5692/usr/local/lib/python3.6/dist-packages/keras/callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (7.225892). Check your callbacks.
% delta_t_median)
160/160 [==============================] - 1144s 7s/step - loss: 13.3983
8ba2ee0d2707:60:96 [0] NCCL INFO Bootstrap : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.4<0>
8ba2ee0d2707:60:96 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
8ba2ee0d2707:60:96 [0] NCCL INFO NET/IB : No device found.
8ba2ee0d2707:60:96 [0] NCCL INFO NET/Socket : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.4<0>
8ba2ee0d2707:60:96 [0] NCCL INFO Using network Socket
NCCL version 2.7.8+cuda11.1
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 00/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 01/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 02/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 03/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 04/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 05/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 06/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 07/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 08/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 09/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 10/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 11/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 12/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 13/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 14/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 15/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 16/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 17/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 18/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 19/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 20/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 21/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 22/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 23/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 24/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 25/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 26/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 27/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 28/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 29/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 30/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 31/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [1] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [2] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [3] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [4] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [5] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [6] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [7] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [8] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [9] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [10] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [11] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [12] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [13] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [14] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [15] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [16] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [17] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [18] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [19] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [20] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [21] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [22] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [23] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [24] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [25] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [26] -1/-1/-1->0->-1|-1->0->-1/-
8ba2ee0d2707:60:96 [0] NCCL INFO Setting affinity for GPU 0 to ffff
8ba2ee0d2707:60:96 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
8ba2ee0d2707:60:96 [0] NCCL INFO comm 0x7fb50af951a0 rank 0 nranks 1 cudaDev 0 busId 17000 - Init COMPLETE
Epoch 2/50
142/160 [=========================>…] - ETA: 2:08 - loss: 13.0120

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:17:00.0 Off |                  Off |
| 33%   38C    P2    44W / 230W |   5118MiB / 16125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 5000     Off  | 00000000:B3:00.0 Off |                  Off |
| 33%   30C    P8     3W / 230W |     20MiB / 16122MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

I am using a custom dataset that has only one class, “phone”. But when I tried the same training command with another dataset that also has a single class, the GPU was utilized and the training was noticeably faster.
I don’t know what the issue is; can you please help me with this training-speed problem?

Hi,

I am still waiting for a response from your side.

From the nvidia-smi result, GPU memory (5118 MiB / 16125 MiB) is being consumed. You can run nvidia-smi at an interval to monitor it over time.
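For example (assuming nvidia-smi is run on the host; the 1-second interval is only a suggestion), either of the following refreshes the readings periodically:

$ nvidia-smi -l 1
$ nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 1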

Any difference between the two datasets?

No, the only difference is the total number of images. The dataset where I am facing the training-speed issue has 1600 images with their annotations in KITTI format, while the newer dataset has around 300 images. Both datasets have the single class “phone”.

My understanding is that training on the 1600-image dataset is running on the CPU instead of the GPU. If I am correct, what exactly do I need to do to resolve this training-speed issue? And if I am wrong, what should I do instead?

Hi, I am still waiting for a response from your side.

May I know how you triggered the TAO docker?

Hi,

I used the command below to trigger the TAO docker.

tao yolo_v4 train --gpus 1 -e /workspace/tao-experiments/specs/spec.txt -r /workspace/tao-experiments/results -k <key>   (here I am passing my actual key)

Can you check:

  • the “nvidia-smi” result before running the above command
  • then run the above command
  • the “nvidia-smi” result after running the above command

1. Before running the training command, using nvidia-smi:

Mon Jan 10 14:37:18 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:17:00.0 Off |                  Off |
| 33%   37C    P8     8W / 230W |      1MiB / 16125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 5000     Off  | 00000000:B3:00.0 Off |                  Off |
| 33%   35C    P8     4W / 230W |     20MiB / 16122MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    1   N/A  N/A      1335      G   /usr/lib/xorg/Xorg                  9MiB |
|    1   N/A  N/A      1414      G   /usr/bin/gnome-shell                6MiB |
+-----------------------------------------------------------------------------+

2. After running the training command, using nvidia-smi again:

Mon Jan 10 14:42:26 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:17:00.0 Off |                  Off |
| 33%   38C    P8    10W / 230W |   1366MiB / 16125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 5000     Off  | 00000000:B3:00.0 Off |                  Off |
| 33%   32C    P8     3W / 230W |     20MiB / 16122MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     11699      C   /usr/bin/python3.6                115MiB |
|    0   N/A  N/A     11753      C   python3.6                        1247MiB |
|    1   N/A  N/A      1335      G   /usr/lib/xorg/Xorg                  9MiB |
|    1   N/A  N/A      1414      G   /usr/bin/gnome-shell                6MiB |
+-----------------------------------------------------------------------------+

So, from
1MiB / 16125MiB
to
1366MiB / 16125MiB

The GPU is up.

But as you can see, the GPU utilization is still 0%. So what does that mean, and what should I do?

Because for training on this 1600-image dataset the GPU utilization stays at 0% and training is slow, while for the other dataset the GPU utilization increases during training and the training speed is good.

Can you double-check by monitoring with
$ nvidia-smi dmon
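If it helps, dmon can also be limited to the utilization metrics and sampled at a chosen interval; -s u (utilization group) and -d (delay in seconds) are standard nvidia-smi dmon options, but adjust for your driver version:

$ nvidia-smi dmon -s u -d 1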

1. Before running the training command, using nvidia-smi dmon:

gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
Idx     W     C     C     %     %     %     %   MHz   MHz

0     8    28     -     0     0     0     0   405   300
1     3    29     -     0     0     0     0   405   300
0     8    28     -     0     0     0     0   405   300
1     3    29     -     0     0     0     0   405   300
0     8    28     -     0     0     0     0   405   300
1     3    29     -     0     0     0     0   405   300
0     8    28     -     0     0     0     0   405   300
1     3    29     -     0     0     0     0   405   300
0     8    28     -     0     0     0     0   405   300
1     3    29     -     0     0     0     0   405   300
0     8    28     -     0     0     0     0   405   300
1     3    29     -     0     0     0     0   405   300
2. After running the training command, using nvidia-smi dmon again:

gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
Idx     W     C     C     %     %     %     %   MHz   MHz

0    44    38     -     0     0     0     0  6500  1620
1     4    30     -     0     0     0     0   405   300
0    44    38     -     0     0     0     0  6500  1620
1     3    30     -     0     0     0     0   405   300
0    44    38     -     0     0     0     0  6500  1620
1     3    30     -     0     0     0     0   405   300
0    44    38     -     0     0     0     0  6500  1620
1     3    30     -     0     0     0     0   405   300
0    44    39     -     0     0     0     0  6500  1620
1     3    30     -     0     0     0     0   405   300
0    44    38     -     0     0     0     0  6500  1620
1     4    30     -     0     0     0     0   405   300
0    44    38     -     0     0     0     0  6500  1620
1     3    30     -     0     0     0     0   405   300
0     9    38     -     0     0     0     0   405   375
1     3    30     -     0     0     0     0   405   300
0     9    37     -     0     0     0     0   405   375
1     4    30     -     0     0     0     0   405   300
0     9    37     -     0     0     0     0   405   300
1     4    30     -     0     0     0     0   405   300
0    89    39     -    31    17     0     0  6500  1620
1     4    30     -     0     0     0     0   405   300
0    48    41     -    42    25     0     0  6500  1875
1     4    30     -     0     0     0     0   405   300
0    50    39     -     0     0     0     0  6500  1875
1     5    30     -     0     0     0     0   405   300
0    44    39     -     0     0     0     0  6500  1620
1     4    30     -     0     0     0     0   405   300
0    44    39     -     0     0     0     0  6500  1620
1     4    30     -     0     0     0     0   405   300
0    44    39     -     8     5     0     0  6500  1650
1     4    30     -     0     0     0     0   405   300

I am not able to understand what the issue is.

Actually, I am running a YOLOv4 training with the COCO2017 dataset but cannot reproduce the “GPU utilization is always 0” behavior.

I suggest you monitor the “sm” column over a longer period.

Additionally, please try training with the KITTI dataset as well.
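One way to watch “sm” over a longer run (the file name and 5-second interval below are only illustrative assumptions) is to log dmon output with date/time stamps to a file and inspect it after an epoch or two:

$ nvidia-smi dmon -s u -d 5 -o TD -f dmon_training.log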

Hi,

I am already using a dataset in KITTI format for training.
As you can see, it has been around 15 minutes and the training is still in the second epoch:

Epoch 1/50
2/320 […] - ETA: 6:17:21 - loss: 13.3049 /usr/local/lib/python3.6/dist-packages/keras/callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (10.254203). Check your callbacks.
% delta_t_median)
153/320 [=============>…] - ETA: 17:39 - loss: 13.6621