Error training detectNet v2 with TAO

Daphna · March 14, 2023, 12:38pm

Hello,
I am trying to train DetectNet_v2 with your Jupyter notebook. I get the same error with both my custom data and the sample data I downloaded from the links inside the notebook.
I ran docker login nvcr.io and logged in successfully, and then ran the command

jupyter notebook --ip 0.0.0.0 --allow-root --port 8888

without running docker pull or docker run first.

The error is printed below.

• Hardware: GeForce RTX 2080 Ti
• Network Type: Detectnet_v2
• TLT Version:
format_version: 2.0
toolkit_version: 4.0.1
published_date: 03/06/2023
4.0.0-tf1.15.5: docker_registry: nvcr.io

• Training spec file
nanovel_detectnet_v2_train_resnet18_kitti.txt (3.7 KB)

• How to reproduce the issue?

!tao detectnet_v2 train -e $SPECS_DIR/nanovel_detectnet_v2_train_resnet18_kitti.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                        -k $KEY \
                        -n resnet18_detector \
                        --gpus $NUM_GPUS

This is the error I got:

tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: the provided PTX was compiled with an unsupported toolchain.
Telemetry data couldn't be sent, but the command ran successfully.
[WARNING]: <urlopen error [Errno -2] Name or service not known>

File "<frozen iva.detectnet_v2.training.utilities>", line 143, in get_singular_monitored_session
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1104, in __init__

Thanks for your help

Morganh · March 15, 2023, 3:28am

Please update the NVIDIA driver.

For example, uninstall the current driver:
sudo apt purge nvidia-driver-510
sudo apt autoremove
sudo apt autoclean

Then install the new driver:
sudo apt install nvidia-driver-520
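
Before reinstalling, it may help to confirm which driver is actually loaded. The message "the provided PTX was compiled with an unsupported toolchain" generally means the installed driver is older than the CUDA toolkit used to build the code inside the container. The following is only a rough sketch: the helper names are made up for illustration, and the threshold of 520 is simply taken from the package suggested above, not from an official TAO requirement. It assumes nvidia-smi is on the PATH.

```python
import subprocess

# Assumption: 520 comes from the driver package suggested in this thread,
# not from an official TAO Toolkit requirement.
MIN_DRIVER_MAJOR = 520


def driver_major(version: str) -> int:
    """Return the major component of a driver version such as '510.47.03'."""
    return int(version.strip().split(".")[0])


def needs_update(version: str, minimum: int = MIN_DRIVER_MAJOR) -> bool:
    """True when the installed driver branch is older than the assumed minimum."""
    return driver_major(version) < minimum


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version of the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()


if __name__ == "__main__":
    try:
        version = installed_driver_version()
    except (FileNotFoundError, subprocess.CalledProcessError, IndexError) as exc:
        # nvidia-smi is missing, failed, or printed nothing.
        print(f"Could not query the driver: {exc}")
    else:
        status = "older than" if needs_update(version) else "at least"
        print(f"Driver {version} is {status} branch {MIN_DRIVER_MAJOR}")
```

If the script reports a branch older than 520, the purge and install steps above are the next thing to try. Run it on the host, since the driver is installed there and not inside the TAO container.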

Daphna · March 15, 2023, 8:21am

OK, thanks, I'll try. I'll just point out that YOLOv4 training worked for me with the same driver and the same method.

yingliu · April 3, 2023, 9:29am

Hi daphna,
Do you still need support here? Or shall we close this topic?
Thank you.

Daphna · April 3, 2023, 9:47am

Hi @yingliu, you can close the topic
Thanks!

system · April 17, 2023, 9:48am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
