BodyPoseNet Training error

• Hardware : RTX3080
• Network Type : BodyPoseNet
• TLT Version : TAO 4.0

cuda : 11.2
cudnn : 8.1.0
tensorflow : 2.6.0
nvidia-docker : 2.11.0


Error Code

!tao bpnet train -e $SPECS_DIR/bpnet_train_m1_coco.yaml \
                 -r $USER_EXPERIMENT_DIR/models/exp_m1_unpruned \
                 -k $KEY \
                 --gpus $NUM_GPUS

I tried to train model.tlt on the sample COCO data by following the BodyPoseNet quick start, but the following error occurred.

It reports a toolchain error. I tried other versions of the NVIDIA driver, CUDA, and cuDNN, but none of them worked, so I am posting this question. How do I solve this error?
Evaluation and inference on other images work without problems.
(My first environment: driver version 515.86, CUDA 11.7)

This error indicates a mismatch between the driver and the compilation toolchain. It seems your driver (460.x) is too old. Please refer to the TAO 4.0 requirements page; the NVIDIA driver version should be 520 or later:
https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html#software-requirements

This is a similar error log to the one in CLI update - #11 by dbrazey.
Please update the driver and retry.
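To confirm which driver is actually loaded before and after the update, a minimal shell sketch like the following could be used. It assumes `nvidia-smi` is on the PATH and treats 520 as the minimum major version, matching the nvidia-driver-520 package suggested in this reply; `driver_major` and `driver_ok` are hypothetical helper names, not part of TAO.

```shell
#!/usr/bin/env bash
# Minimal sketch: check whether the loaded NVIDIA driver meets an assumed
# minimum major version before running TAO 4.0 training.

MIN_DRIVER_MAJOR=520   # assumption based on this reply; confirm against the TAO docs

# driver_major VERSION: print the major component, e.g. "515.86.01" -> "515"
driver_major() {
  printf '%s\n' "${1%%.*}"
}

# driver_ok VERSION MIN: exit status 0 if the major version is >= MIN
driver_ok() {
  local major
  major="$(driver_major "$1")"
  [ "$major" -ge "$2" ]
}

# Only query the GPU when nvidia-smi is available
if command -v nvidia-smi >/dev/null 2>&1; then
  # nvidia-smi prints one line per GPU; all GPUs share one driver, so take the first
  version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if driver_ok "$version" "$MIN_DRIVER_MAJOR"; then
    echo "Driver $version meets the assumed minimum ($MIN_DRIVER_MAJOR)"
  else
    echo "Driver $version is older than $MIN_DRIVER_MAJOR; please upgrade" >&2
  fi
fi
```

On the 515.86 setup described in the question, this check would report that the driver needs upgrading.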

Uninstall the current driver:
sudo apt purge nvidia-driver-515
sudo apt autoremove
sudo apt autoclean

Install the new driver:
sudo apt install nvidia-driver-520

Thank you very much for your advice. I solved the problem with nvidia-driver-525 (520 gave me a black screen).
