Error training detectnet_V2 with TAO

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)
format_version: 2.0
toolkit_version: 3.22.05
• Training spec file(If have, please share here)
• How to reproduce the issue? (This is for errors. Please share the command line and the detailed log here.)
When I train DetectNet_v2 with TAO, the train cell hangs: after a long time there is still no progress, and no error message is printed.

Any ideas? Thanks.

Could you check the following?

  • When the issue happens, can you check CPU memory and GPU memory usage?
  • Is it reproducible 100% of the time?
  • Could you share the output of “nvidia-smi”?
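A minimal sketch for collecting the memory readings requested above, assuming a Linux host with the NVIDIA driver installed (so nvidia-smi is on PATH). It uses nvidia-smi's --query-gpu CSV mode for GPU memory and utilization and reads /proc/meminfo for host RAM. The function names are hypothetical helpers, not part of TAO.

```python
import subprocess

# Fields passed to `nvidia-smi --query-gpu`; with --format=csv,noheader,nounits
# memory is reported in MiB and utilization in percent.
GPU_FIELDS = ["index", "memory.used", "memory.total", "utilization.gpu"]


def _num(value):
    """Convert a CSV field to int, or None for values such as "[N/A]"."""
    value = value.strip()
    return int(value) if value.isdigit() else None


def parse_gpu_csv(text):
    """Parse one row per GPU from nvidia-smi --query-gpu CSV output."""
    rows = []
    for line in text.strip().splitlines():
        index, used, total, util = (_num(v) for v in line.split(","))
        rows.append({"index": index, "mem_used_mib": used,
                     "mem_total_mib": total, "util_pct": util})
    return rows


def parse_meminfo(text):
    """Return (MemTotal, MemAvailable) in KiB from /proc/meminfo contents."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key] = int(parts[0])
    return fields["MemTotal"], fields["MemAvailable"]


def snapshot():
    """Print one GPU and host-memory reading (hypothetical helper)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(GPU_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    for gpu in parse_gpu_csv(out):
        print(f"GPU {gpu['index']}: {gpu['mem_used_mib']}/"
              f"{gpu['mem_total_mib']} MiB, util {gpu['util_pct']}%")
    with open("/proc/meminfo") as f:
        total_kib, avail_kib = parse_meminfo(f.read())
    print(f"Host RAM: {(total_kib - avail_kib) // 1024}/"
          f"{total_kib // 1024} MiB used")
```

Calling snapshot() every few seconds (for example in a loop with time.sleep) while the train cell is running shows whether the GPU is busy at all. Near-zero GPU utilization may suggest a stall or an input-pipeline bottleneck rather than slow computation.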

In fact, TAO is running, but very slowly.

I suggest updating the NVIDIA driver to version 510.
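A small sketch for checking the installed driver against the suggested version, assuming nvidia-smi is available. It queries --query-gpu=driver_version and compares the major version number; treating 510 as a minimum (rather than an exact version) is an assumption, and the helper names are hypothetical.

```python
import subprocess

RECOMMENDED_MAJOR = 510  # driver branch suggested in this thread (assumed minimum)


def parse_driver_versions(text):
    """Parse nvidia-smi driver_version CSV output into int tuples.

    One line per GPU, e.g. "470.129.06" -> (470, 129, 6).
    """
    return [tuple(int(part) for part in line.strip().split("."))
            for line in text.strip().splitlines() if line.strip()]


def meets_recommendation(version, major=RECOMMENDED_MAJOR):
    """True if the driver's major version is at least `major`."""
    return version[0] >= major


def check_driver():
    """Print each GPU's driver version and whether it meets the suggestion."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    for version in parse_driver_versions(out):
        status = "OK" if meets_recommendation(version) else "consider upgrading"
        print(".".join(map(str, version)), status)
```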
