Training becomes very slow and gets killed when using a single class

Config file: spec.txt (2.2 KB)

TLT version: docker_tag: v3.21.08-py3
Training snapshot:

Hi,

I am trying to train YOLOv4 on a custom dataset with a single class, but training becomes very slow from the first epoch and the process gets killed in the middle of the second epoch. Could you please help me with this issue? I am using a single target_class_mapping { key: "phone" value: "phone" } in the spec file.
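If a spec file is edited in a word processor or copied from a rendered forum post, the quotes around class names can turn into typographic quotes (‘phone’), which protobuf text format rejects. The minimal sketch below, assuming the target_class_mapping { key: ... value: ... } layout quoted in the post, renders the single-class mapping with plain ASCII double quotes. render_class_mapping is a hypothetical helper for illustration, not part of TLT.

```python
from __future__ import annotations


def render_class_mapping(mapping: dict[str, str], indent: str = "  ") -> str:
    """Render one target_class_mapping block per (key, value) pair.

    Hypothetical helper, not part of TLT: it only emits protobuf text
    with ASCII double quotes (typographic quotes are not valid there).
    """
    blocks = []
    for key, value in mapping.items():
        # Assumption: class names contain no quotes or backslashes, so no escaping is done.
        if any(c in s for s in (key, value) for c in '"\\'):
            raise ValueError(f"unsupported character in class name: {key!r} -> {value!r}")
        blocks.append(
            f"{indent}target_class_mapping {{\n"
            f'{indent}  key: "{key}"\n'
            f'{indent}  value: "{value}"\n'
            f"{indent}}}"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Single-class case from the post: the dataset label "phone" maps to the training class "phone".
    print(render_class_mapping({"phone": "phone"}))
```

With several dataset labels mapped to one class, each label gets its own block with the same value.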

I have also attached the spec file and a terminal snapshot for your reference.

Looking forward to a solution from your side.

Hi,
Please share the ONNX model and the script, if not already shared, so that we can assist you better.
In the meantime, you can try a few things:

  1. Validate your model with the snippet below:

check_model.py

import sys
import onnx

# Path to your ONNX model, passed on the command line: python check_model.py model.onnx
filename = sys.argv[1]
model = onnx.load(filename)
# Raises onnx.checker.ValidationError if the model is invalid
onnx.checker.check_model(model)
print("The model is valid.")

  2. Try running your model with the trtexec command:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
If you are still facing the issue, please share the trtexec --verbose log for further debugging.
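The verbose log requested above can be captured with a small wrapper. This is a minimal sketch, assuming trtexec is installed and on PATH; build_trtexec_cmd, run_trtexec, and the log file name are hypothetical, and only the --onnx and --verbose flags of trtexec are used.

```python
from __future__ import annotations

import shlex
import subprocess
import sys


def build_trtexec_cmd(onnx_path: str, verbose: bool = True) -> list[str]:
    """Return the trtexec argument list for parsing and building an ONNX model."""
    cmd = ["trtexec", f"--onnx={onnx_path}"]
    if verbose:
        # --verbose makes trtexec print detailed parser and builder logs.
        cmd.append("--verbose")
    return cmd


def run_trtexec(onnx_path: str, log_path: str = "trtexec_verbose.log") -> int:
    """Run trtexec, write stdout and stderr to log_path, and return the exit code."""
    cmd = build_trtexec_cmd(onnx_path)
    print("Running:", shlex.join(cmd))
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=False)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python run_trtexec.py model.onnx  (the model path is a placeholder)
    sys.exit(run_trtexec(sys.argv[1]))
```

The resulting log file can be attached to the post. If trtexec is not on PATH, subprocess.run raises FileNotFoundError instead of writing a log.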
Thanks!

Hi,

Why is a model needed? I am only trying to train YOLOv4 on a custom dataset with a single class. The problem is that training becomes very slow from the first epoch and gets killed in the middle of the second epoch, as shown in the image above.

Hi,

I am waiting for your response.

Hi,

This doesn't look TensorRT-related. If you are facing this issue with TLT (TAO), we recommend moving this post to the TAO forum by editing its category in the post heading.

Thank you.

OK, thanks.