BodyPoseNet Training error

hitesarang1 · January 20, 2023, 5:47am

• Hardware : RTX3080
• Network Type : BodyPoseNet
• TLT Version : TAO 4.0

cuda : 11.2
cudnn : 8.1.0
tensorflow : 2.6.0
nvidia-docker : 2.11.0

Error Code

!tao bpnet train -e $SPECS_DIR/bpnet_train_m1_coco.yaml \
                 -r $USER_EXPERIMENT_DIR/models/exp_m1_unpruned \
                 -k $KEY \
                 --gpus $NUM_GPUS

I tried to train model.tlt on the sample COCO data by following the BodyPoseNet quick start, but the following error occurred.

It reports a toolchain error. I tried other versions of the NVIDIA driver, CUDA, and cuDNN, but that did not help, so I am asking here. How can I solve this error?
Evaluation and inference on other images work without problems.
(My first environment: driver version 515.86, CUDA 11.7)

yingliu · January 20, 2023, 6:21am

It indicates a mismatch between the driver and the compilation toolchain. It seems your driver (460.x) is too old. Please refer to the TAO 4.0 requirements page; the nvidia-driver version should be > 520:
https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html#software-requirements

Morganh · January 20, 2023, 9:23am

The error log is similar to the one in the topic "CLI update" (post #11 by dbrazey).
Please update the driver and retry.

Uninstall the current driver:
sudo apt purge nvidia-driver-515
sudo apt autoremove
sudo apt autoclean

Install the new driver:
sudo apt install nvidia-driver-520
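
After a reboot, it may help to confirm which driver is actually loaded before retrying the training. The following is only a rough sketch: the threshold of 520 is taken from the replies in this thread, and the helper names driver_major and driver_ok are made up for illustration. It uses nvidia-smi's --query-gpu=driver_version option and skips the check if nvidia-smi is not available.

```shell
#!/bin/sh
# Minimum driver major version (assumption: 520, as suggested in this thread).
MIN_MAJOR=520

# Hypothetical helper: print the major part of a version such as "525.60.11".
driver_major() {
    printf '%s\n' "${1%%.*}"
}

# Hypothetical helper: succeed when the given driver version meets the minimum.
# Errors from an empty or malformed version are silenced and count as failure.
driver_ok() {
    [ "$(driver_major "$1")" -ge "$MIN_MAJOR" ] 2>/dev/null
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Ask the driver for its version; use the first GPU's line only.
    version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if driver_ok "$version"; then
        echo "Driver $version meets the minimum of $MIN_MAJOR"
    else
        echo "Driver '$version' is older than $MIN_MAJOR or could not be read"
    fi
else
    echo "nvidia-smi not found; the NVIDIA driver may not be installed"
fi
```

If the reported version is still the old one, the purge or install step probably did not take effect, or the machine has not been rebooted yet.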

hitesarang1 · January 25, 2023, 1:27am

I appreciate your advice very much. I solved the problem with nvidia-driver-525 (520 gave me a black screen).

