Hello,
I was following the TAO Toolkit Quick Start Guide to train a yolo-v4-tiny network.
I opened the Jupyter notebook and finished the re-training procedure of the network without any problems. However, when I ran the "Evaluate retrained model" section:
!tao yolo_v4_tiny evaluate -e $SPECS_DIR/yolo_v4_tiny_retrain_kitti.txt \
                           -m $USER_EXPERIMENT_DIR/experiment_dir_retrain/weights/yolov4_cspdarknet_tiny_epoch_$EPOCH.tlt \
                           -k $KEY
I received the following output:
2023-03-07 10:05:42,107 [INFO] root: Registry: ['nvcr.io']
2023-03-07 10:05:42,218 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
2023-03-07 10:05:42,317 [WARNING] tlt.components.docker_handler.docker_handler: Docker will run the commands as root. If you would like to retain your local host permissions, please add the "user": "UID:GID" in the DockerOptions portion of the "/home/user/.tao_mounts.json" file. You can obtain your users UID and GID by using the "id -u" and "id -g" commands on the terminal.
Docker instantiation failed with error: 500 Server Error: Internal Server Error ("failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown")
Thank you for your help!