• Hardware: GeForce RTX 2060
• Network Type : yolo_v4
• TLT Version:
tao info
Configuration of the TAO Toolkit Instance
task_group: ['model', 'dataset', 'deploy']
format_version: 3.0
toolkit_version: 5.5.0
published_date: 08/26/2024
• Training spec file: No spec file yet, just running tao model yolo_v4 --help
• How to reproduce the issue ?
I started with a fresh installation of Ubuntu 22.04 LTS (jammy).
Installed the nvidia driver using `sudo apt install nvidia-driver-580` (after checking with `nvidia-detector`; later, I also tried with 550).
Rebooted the system. (system reboot was done after every major step)
`nvidia-smi` was working properly.
I followed the steps in tao 5.5.0 archive documentation here carefully.
Note: Whenever I run `setup/quick_launcher.sh`, it always installs tao 6.x, even if I git checkout a tao 5.x release. As tao 6.x doesn’t support yolo models for training, I need tao 5.x or lower. I also tried the `wget` command that downloads the tao 5.x zip file and ran the `.sh` file from there, but it still installed tao 6.x.
Even though I got tao 5.x using the pip install mentioned later in the post, running the `setup/quick_launcher.sh` script (which installed tao 6.x) shows an `nvidia-docker` not found error, as shown below:
But I had `nvidia-docker2`, `nvidia-container-toolkit`, and `nvidia-container-runtime` installed using apt.
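For anyone trying to reproduce this, the installation state described above can be checked with a few commands. This is only a minimal sketch: `report_cmd` is a hypothetical helper name, and the list of binaries is my assumption about what is relevant (`nvidia-container-cli` and `nvidia-ctk` ship with the NVIDIA Container Toolkit packages).

```shell
# Sketch: report which GPU/container tools are visible on the host PATH.
# `report_cmd` is a hypothetical helper, not part of any NVIDIA package.
report_cmd() {
    # Prints "<name>: found" or "<name>: missing" based on a PATH lookup.
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

for c in docker nvidia-smi nvidia-docker nvidia-container-cli nvidia-ctk; do
    report_cmd "$c"
done

# Packages installed through apt (Debian/Ubuntu only).
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | grep -E 'nvidia-(docker2|container)' \
        || echo "no nvidia container packages listed"
fi

# Not run automatically, since they need a working Docker daemon:
#   docker info | grep -i runtimes                 # "nvidia" should be listed
#   docker run --rm --gpus all ubuntu nvidia-smi   # sample GPU workload
```

If the apt packages are listed but `nvidia-docker` is reported as missing, that would at least narrow the problem down to a PATH or packaging issue rather than a missing installation.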
To get tao 5.x, after creating a new virtual environment, I ran,
pip install nvidia-pyindex
pip install nvidia-tao-5.5.1
Now, when I ran `tao info`, the information above was printed:
Configuration of the TAO Toolkit Instance
task_group: ['model', 'dataset', 'deploy']
format_version: 3.0
toolkit_version: 5.5.0
published_date: 08/26/2024
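Since the launcher version matters here (tao 6.x vs 5.x), the check can be scripted. The sketch below pulls the major version out of the `tao info` text; `tao_major_version` is a hypothetical helper name, and the sample string is copied from the output above so the snippet runs without the launcher installed.

```shell
# Sketch: extract the major version from `tao info` output.
# `tao_major_version` is a hypothetical helper, not part of the TAO launcher.
tao_major_version() {
    # Reads `tao info` text on stdin and prints the major version (e.g. 5).
    awk -F': *' '/^toolkit_version:/ { split($2, v, "."); print v[1]; exit }'
}

# Sample copied from the output above; with the launcher installed,
# use instead:  tao info | tao_major_version
sample='Configuration of the TAO Toolkit Instance
format_version: 3.0
toolkit_version: 5.5.0
published_date: 08/26/2024'

major=$(printf '%s\n' "$sample" | tao_major_version)
if [ "$major" -lt 6 ]; then
    echo "tao $major.x launcher detected"
else
    echo "tao $major.x launcher detected; yolo_v4 training may be unavailable"
fi
```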
But when I run `tao model yolo_v4 --help`, it says the following:
I have seen some other forum posts about similar problems. In the solutions you provided there, I came across `dpkg -l | grep cuda` and also `nvidia-fabric-manager`.
Why is `cuda` not mentioned in the documentation of tao toolkit? Just asking.
I installed cuda using `sudo apt install nvidia-cuda-toolkit` and rebooted the system, but the issue still persisted.
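In case it helps with diagnosis, the host-side driver and CUDA state can be collected as follows. This is a rough sketch, not an official procedure: `driver_branch` is a hypothetical helper, and which facts are actually relevant to the cuInit error is my assumption.

```shell
# Sketch: collect host-side driver and CUDA facts to attach to a report.
# `driver_branch` is a hypothetical helper name.
driver_branch() {
    # Keeps everything before the first dot: "550.54.14" -> "550"
    printf '%s\n' "${1%%.*}"
}

if command -v nvidia-smi >/dev/null 2>&1; then
    ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    echo "driver version: $ver (branch $(driver_branch "$ver"))"
else
    echo "nvidia-smi not found on PATH"
fi

# Kernel modules currently loaded for the GPU.
lsmod 2>/dev/null | grep -i '^nvidia' || echo "no nvidia kernel modules listed"

# CUDA-related apt packages, as suggested in the other forum threads.
dpkg -l 2>/dev/null | grep -i cuda || echo "no cuda packages listed"
```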
Also, in the cuInit error message above, why is Python 3.8 mentioned? It is inside the docker container, right? I ask because I am on Ubuntu 22.04 with Python 3.10.
Can you please help with the cuInit error?