Training detectnet_v2 with TAO hangs

Training detectnet_v2 with TAO gets stuck while running the train cell. After a long time there is still no progress, and no error message is shown.

Any ideas? Thanks.

Could you check the following?

  • When the issue happens, can you check CPU memory and GPU memory usage?
  • Is it 100% reproducible?
  • Could you share the output of nvidia-smi?
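
If it helps, the checks above can be gathered in one go with a small script run in a second terminal while the train cell appears stuck. This is only a rough sketch, not part of TAO: it assumes a Linux host (it reads /proc/meminfo) and that nvidia-smi is on the PATH. The helper names (parse_gpu_csv, query_gpus, read_host_memory_kib) are made up for illustration.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi; all are standard --query-gpu properties.
QUERY_FIELDS = "driver_version,memory.used,memory.total,utilization.gpu"


def _to_int(value):
    """Convert a numeric field to int; return None for values like '[N/A]'."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output, one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 4:
            continue  # skip error messages or unexpected lines
        driver, used, total, util = parts
        rows.append({
            "driver": driver,
            "mem_used_mib": _to_int(used),
            "mem_total_mib": _to_int(total),
            "util_pct": _to_int(util),
        })
    return rows


def query_gpus():
    """Run nvidia-smi if it is available; return [] when it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=10, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    return parse_gpu_csv(result.stdout)


def read_host_memory_kib(path="/proc/meminfo"):
    """Return MemTotal and MemAvailable in KiB (Linux only); {} if unreadable."""
    wanted = {"MemTotal", "MemAvailable"}
    info = {}
    try:
        with open(path) as fh:
            for line in fh:
                key, _, rest = line.partition(":")
                if key in wanted:
                    info[key] = int(rest.split()[0])
    except OSError:
        pass
    return info


if __name__ == "__main__":
    print("Host memory (KiB):", read_host_memory_kib())
    for index, gpu in enumerate(query_gpus()):
        print(f"GPU {index}:", gpu)
```

A GPU utilization that stays near 0% while host memory is nearly exhausted would point toward a bottleneck outside the GPU, and the reported driver version can simply be pasted into the reply.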

In fact, TAO is running, but very slowly.

I suggest updating the NVIDIA driver to version 510.

