Object Detection using TAO DetectNet_v2: TAO training run stopped

Please provide the following information when requesting support.

• Hardware (GeForce 3080Ti)
• Network Type (Detectnet_v2)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(detectnet_v2_train_resnet18_kitti.txt)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

When I use detectnet_v2.ipynb to train on my data, I get an error. Please help me.

detectnet_v2_train_resnet18_kitti.txt (5.4 KB)

log.txt (52.3 KB)

Please double check.
Also, monitor “$ nvidia-smi” to check the GPU memory consumption.

When the Docker container stops, the nvidia-smi output is:

“Please double check”?
Could you tell me which file to check?

I mean you can run it again to confirm.
Before running again, please run the following in a terminal:
$ nvidia-smi --query-gpu=memory.used,memory.total --format=csv -i 0 -l 1
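If the monitoring output is long, a small script can summarize it. This is a minimal sketch, assuming the command's output was saved to a text file (for example by appending "> gpu_mem.csv" to the command) and that values are reported in MiB, as in the default --format=csv output. The function name peak_memory_usage and the file name gpu_mem.csv are hypothetical. It reports the highest memory.used value seen, which helps tell whether training stopped because GPU memory was exhausted.

```python
import re
import sys
from typing import Iterable, Optional, Tuple

# Matches one data row of the CSV output, e.g. "10240 MiB, 12288 MiB".
# Assumption: memory values are reported in MiB (nvidia-smi's default units).
ROW = re.compile(r"^\s*(\d+)\s*MiB\s*,\s*(\d+)\s*MiB\s*$")


def peak_memory_usage(lines: Iterable[str]) -> Optional[Tuple[int, int, float]]:
    """Return (peak_used_mib, total_mib, peak_fraction), or None if no row parses.

    Header lines such as "memory.used [MiB], memory.total [MiB]" and any
    other non-matching text are skipped.
    """
    peak_used = None
    total = None
    for line in lines:
        match = ROW.match(line)
        if match is None:
            continue  # header or unrelated output
        used, total = int(match.group(1)), int(match.group(2))
        if peak_used is None or used > peak_used:
            peak_used = used
    if peak_used is None or not total:
        return None
    return peak_used, total, peak_used / total


if __name__ == "__main__":
    # Hypothetical usage: python peak_mem.py gpu_mem.csv
    if len(sys.argv) != 2:
        print("usage: python peak_mem.py gpu_mem.csv")
    else:
        with open(sys.argv[1]) as handle:
            result = peak_memory_usage(handle)
        if result is None:
            print("no memory rows found")
        else:
            used, total, frac = result
            print(f"peak {used} MiB of {total} MiB ({frac:.0%})")
```

A peak close to the total right before the container stops would point toward an out-of-memory condition; a low peak would suggest looking elsewhere in the log.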

I have tested it many times, but I still get this error. The output of ‘nvidia-smi --query-gpu=memory.used,memory.total --format=csv -i 0 -l 1’ is:

When I run TAO training, it shows this error.

You can ignore it. It is not an error.

Can you share more info?

  • What is the value of $NUM_GPUS?
  • Can you share the log when you run again with a new folder (" -r $USER_EXPERIMENT_DIR/new_folder ")?

$NUM_GPUS=1
log.txt (52.7 KB)

Please open a terminal to debug, as below.
$ tao detectnet_v2 run /bin/bash

Then, inside that shell, run:
# detectnet_v2 train xxx

The log is:
log.txt (45.9 KB)

Please use a new folder and share the log. Thanks a lot.

 -r tao-experiments/detectnet_v2/experiment_dir_unpruned_new
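Creating a fresh results folder for every attempt can be scripted so that a rerun never writes into a folder left over from an earlier run (which may still hold that run's checkpoints and logs). This is a minimal sketch, not part of TAO; the function name fresh_results_dir and the timestamp naming scheme are hypothetical choices, and the base path passed in must be one the TAO container can see through its mounts.

```python
from datetime import datetime
from pathlib import Path


def fresh_results_dir(base: str, prefix: str = "experiment_dir_unpruned") -> Path:
    """Create and return a new, empty results directory under `base`.

    The timestamp suffix makes each run's folder unique, so a rerun does not
    write into a folder left behind by an earlier run.
    """
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    candidate = Path(base) / f"{prefix}_{stamp}"
    counter = 1
    while candidate.exists():
        # Two runs started within the same second: add a counter instead of
        # reusing the existing folder.
        candidate = Path(base) / f"{prefix}_{stamp}_{counter}"
        counter += 1
    candidate.mkdir(parents=True)
    return candidate


if __name__ == "__main__":
    # Hypothetical usage: print a new folder to pass to "-r".
    print(fresh_results_dir("tao-experiments/detectnet_v2"))
```

The printed path would then go after "-r" in the training command in place of a reused folder name.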

OK, here is the log. Thank you very much.
log.txt (49.6 KB)

But I have a question: a few days ago this TAO training succeeded, but when I tried again today, it failed with this error.

OK, thanks for the info. Are you aware of the change and workaround below?

This solved my error, thank you very much.
But if I run into a similar problem again, where can I find the answer?

You can create a topic in this forum. Thanks.

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks
