Object Detection using TAO DetectNet_v2. Run TAO training stopped

huihui308 · June 21, 2022, 8:01am

Please provide the following information when requesting support.

• Hardware (GeForce 3080Ti)
• Network Type (Detectnet_v2)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
• Training spec file(detectnet_v2_train_resnet18_kitti.txt)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

When I use detectnet_v2.ipynb to train on my data, I get an error. Please help me.

detectnet_v2_train_resnet18_kitti.txt (5.4 KB)

log.txt (52.3 KB)

Morganh · June 21, 2022, 8:17am

Please double check.
Also monitor “$ nvidia-smi” to check the GPU memory consumption.

huihui308 · June 21, 2022, 8:37am

When the docker container stops, the nvidia-smi output is:

You said “Please double check.”
Could you tell me which file to check?

Morganh · June 21, 2022, 8:41am

I mean you can run it again to confirm.
Before running again, please run the command below in a terminal.
$ nvidia-smi --query-gpu=memory.used,memory.total --format=csv -i 0 -l 1
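
The command above prints one CSV row per second, such as "1234 MiB, 12288 MiB". If that output is redirected to a file while training runs, the peak usage can be summarized afterwards. The following is only a minimal sketch under that assumption: the function name peak_gpu_mem and the file name gpu_mem.csv are hypothetical and are not part of TAO or nvidia-smi.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize peak GPU memory from nvidia-smi CSV output.
# Assumes the log was captured with the command from this thread, e.g.:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv -i 0 -l 1 > gpu_mem.csv
peak_gpu_mem() {
  # Reads CSV rows on stdin; data rows look like "1234 MiB, 12288 MiB".
  # The header row does not match the pattern and is skipped.
  awk -F', ' '
    $1 ~ /^[0-9]+ MiB$/ {
      used = $1 + 0      # numeric prefix of "1234 MiB"
      total = $2 + 0
      if (used > peak) { peak = used; cap = total }
    }
    END {
      if (cap > 0)
        printf "peak %d MiB of %d MiB (%.1f%%)\n", peak, cap, 100 * peak / cap
      else
        print "no samples"
    }
  '
}
```

Usage, once training has stopped: peak_gpu_mem < gpu_mem.csv. A peak close to the total would suggest the GPU ran out of memory; a low peak would point to a different cause.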

huihui308 · June 21, 2022, 8:47am

I tested it many times but still get this error. The result of ‘nvidia-smi --query-gpu=memory.used,memory.total --format=csv -i 0 -l 1’ is:

huihui308 · June 21, 2022, 9:18am

When I run TAO training, it shows this error.

Morganh · June 21, 2022, 9:26am

You can ignore it. It is not an error.

Can you share more info?

What is $NUM_GPUS?
Can you share the log when you run again with a new folder, " -r $USER_EXPERIMENT_DIR/new_folder "?

huihui308 · June 21, 2022, 9:34am

$NUM_GPUS=1
log.txt (52.7 KB)

Morganh · June 21, 2022, 9:39am

Please open a terminal and debug inside the container as below.
$ tao detectnet_v2 run /bin/bash

then
# detectnet_v2 train xxx

huihui308 · June 21, 2022, 9:53am

The log is:
log.txt (45.9 KB)

Morganh · June 21, 2022, 10:14am

Please use a new folder and share the log. Thanks a lot.

 -r tao-experiments/detectnet_v2/experiment_dir_unpruned_new

huihui308 · June 21, 2022, 10:22am

OK, this is the log. Thank you very much.
log.txt (49.6 KB)

But I have a question: a few days ago this TAO training succeeded, but when I tried again today, it gave this error.

Morganh · June 21, 2022, 10:37am

OK, thanks for the info. May I ask whether you are aware of the change and workaround below?

huihui308 · June 22, 2022, 7:25am

This solved my error, thank you very much.
But when I run into a similar problem, where can I find the answer?

Morganh · June 22, 2022, 10:25am

You can create a topic in this forum. Thanks.

yingliu · July 6, 2022, 6:43am

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks

system · July 20, 2022, 6:44am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
