TLT Version: docker_tag: v3.21.08-py3
Network Type: Yolo_v4
Hi,
I am trying to train YOLOv4 on a custom dataset that contains only one class, but training runs very slowly and gets killed in the 2nd epoch. I am unable to understand why this happens even though I run the command with the GPU.
I have also attached the screenshot and spec file for your reference.
I tried that, but what about the training speed? It is still very low; each epoch takes several minutes to complete.
I ran the training command with the GPU, but while training was running I checked nvidia-smi and GPU utilization was still 0%, as shown below:
Epoch 1/50
2/160 [..............................] - ETA: 2:38:12 - loss: 13.5692
/usr/local/lib/python3.6/dist-packages/keras/callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (7.225892). Check your callbacks.
  % delta_t_median)
160/160 [==============================] - 1144s 7s/step - loss: 13.3983
8ba2ee0d2707:60:96 [0] NCCL INFO Bootstrap : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.4<0>
8ba2ee0d2707:60:96 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
8ba2ee0d2707:60:96 [0] NCCL INFO NET/IB : No device found.
8ba2ee0d2707:60:96 [0] NCCL INFO NET/Socket : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.4<0>
8ba2ee0d2707:60:96 [0] NCCL INFO Using network Socket
NCCL version 2.7.8+cuda11.1
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 00/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 01/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 02/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 03/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 04/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 05/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 06/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 07/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 08/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 09/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 10/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 11/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 12/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 13/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 14/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 15/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 16/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 17/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 18/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 19/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 20/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 21/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 22/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 23/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 24/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 25/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 26/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 27/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 28/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 29/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 30/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Channel 31/32 : 0
8ba2ee0d2707:60:96 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [1] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [2] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [3] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [4] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [5] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [6] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [7] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [8] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [9] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [10] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [11] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [12] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [13] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [14] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [15] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [16] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [17] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [18] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [19] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [20] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [21] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [22] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [23] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [24] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [25] -1/-1/-1->0->-1|-1->0->-1/-1/-1 [26] -1/-1/-1->0->-1|-1->0->-1/-
8ba2ee0d2707:60:96 [0] NCCL INFO Setting affinity for GPU 0 to ffff
8ba2ee0d2707:60:96 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
8ba2ee0d2707:60:96 [0] NCCL INFO comm 0x7fb50af951a0 rank 0 nranks 1 cudaDev 0 busId 17000 - Init COMPLETE
Epoch 2/50
142/160 [=========================>...] - ETA: 2:08 - loss: 13.0120
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:17:00.0 Off |                  Off |
| 33%   38C    P2    44W / 230W |   5118MiB / 16125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 5000     Off  | 00000000:B3:00.0 Off |                  Off |
| 33%   30C    P8     3W / 230W |     20MiB / 16122MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
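For completeness, this is how I capture the utilization numbers above while training runs in another terminal (standard nvidia-smi options only, nothing TLT-specific):

```shell
# Refresh the full nvidia-smi view every second
watch -n 1 nvidia-smi

# Or log just utilization and memory to CSV, one sample per second
nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 1
```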
I am using a custom dataset with only one class, "phone". But when I tried the same training command with a different dataset, which also has a single class, the GPU is utilized and the training speed is also good.
I don't know what the issue is; can you please help me out with this training speed problem?
No, the only difference is the total number of images. The dataset where I face the training speed issue has 1600 images with their annotations in KITTI format, while the latest dataset has around 300 images. Both datasets have the single class "phone".
To my understanding, training on the 1600-image dataset is running on the CPU instead of the GPU. If I am correct, what exactly do I need to do to resolve this training speed issue? And if I am wrong, what should I do instead?
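For reference, my launch command looks roughly like this ($KEY and the paths are placeholders, not my real values; the -e/-r/-k/--gpus flags are as I understand them from the TLT docs):

```shell
# Pin the job to GPU 0 so it cannot silently fall back elsewhere
export CUDA_VISIBLE_DEVICES=0

# Roughly my training command; spec path, results dir and key are placeholders
tlt yolo_v4 train -e /workspace/specs/yolo_v4_train.txt \
                  -r /workspace/results \
                  -k $KEY \
                  --gpus 1
```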
Mon Jan 10 14:37:18 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:17:00.0 Off |                  Off |
| 33%   37C    P8     8W / 230W |      1MiB / 16125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 5000     Off  | 00000000:B3:00.0 Off |                  Off |
| 33%   35C    P8     4W / 230W |     20MiB / 16122MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    1   N/A  N/A      1335      G   /usr/lib/xorg/Xorg                  9MiB |
|    1   N/A  N/A      1414      G   /usr/bin/gnome-shell                6MiB |
+-----------------------------------------------------------------------------+
After running the training command, nvidia-smi now shows:
Mon Jan 10 14:42:26 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:17:00.0 Off |                  Off |
| 33%   38C    P8    10W / 230W |   1366MiB / 16125MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 5000     Off  | 00000000:B3:00.0 Off |                  Off |
| 33%   32C    P8     3W / 230W |     20MiB / 16122MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     11699      C   /usr/bin/python3.6                115MiB |
|    0   N/A  N/A     11753      C   python3.6                        1247MiB |
|    1   N/A  N/A      1335      G   /usr/lib/xorg/Xorg                  9MiB |
|    1   N/A  N/A      1414      G   /usr/bin/gnome-shell                6MiB |
+-----------------------------------------------------------------------------+
Because for this dataset with 1600 images, GPU utilization is still 0% and training is slow, whereas for other datasets GPU utilization rises during training and the speed is also good.