TAO 5.x gives a cuInit error

• Hardware: GeForce RTX 2060
• Network Type : yolo_v4
• TLT Version:

tao info
Configuration of the TAO Toolkit Instance
task_group: ['model', 'dataset', 'deploy']
format_version: 3.0
toolkit_version: 5.5.0
published_date: 08/26/2024

• Training spec file: No spec file yet, just running tao model yolo_v4 --help
• How to reproduce the issue ?

I started with a fresh installation of Ubuntu 22.04 LTS (jammy).

I installed the NVIDIA driver using sudo apt install nvidia-driver-580 (after checking with nvidia-detector; later, I also tried 550).

I rebooted the system. (A reboot was done after every major step.)

nvidia-smi was working properly.

I carefully followed the steps in the TAO 5.5.0 archive documentation here.

Note: Whenever I run setup/quick_launcher.sh, it always installs TAO 6.x, even if I git checkout the TAO 5.x release. Since TAO 6.x doesn't support training YOLO models, I need TAO 5.x or lower. I also tried the wget command that downloads the TAO 5.x zip file and ran the .sh file from there, but it still installed TAO 6.x.

Even though I eventually got TAO 5.x using the pip install described later in this post, running the setup/quick_launcher.sh script (which installed TAO 6.x) reported an nvidia-docker not found error.

But I had nvidia-docker2, nvidia-container-toolkit, and nvidia-container-runtime installed using apt.
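For anyone hitting the same "nvidia-docker not found" message, a quick check of which commands are actually on PATH can help. This is only a minimal sketch: report_tools is a hypothetical helper name, and the command names in the usage comment are my assumption about what the packages above install.

```shell
# Report which of the given commands are on PATH.
# Returns non-zero if at least one of them is missing.
report_tools() {
    rt_missing=0
    for rt_tool in "$@"; do
        if command -v "$rt_tool" >/dev/null 2>&1; then
            echo "found: $rt_tool"
        else
            echo "missing: $rt_tool"
            rt_missing=1
        fi
    done
    return "$rt_missing"
}

# Example call on the host (command names are assumptions, adjust as needed):
#   report_tools docker nvidia-smi nvidia-docker nvidia-container-cli
# Manual end-to-end check that a container can see the GPU:
#   docker run --rm --gpus all ubuntu nvidia-smi
```

If the last docker run command cannot show the GPU, the problem is likely in the container runtime setup rather than in TAO itself.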


To get tao 5.x, after creating a new virtual environment, I ran,

pip install nvidia-pyindex

pip install nvidia-tao-5.5.1

Now, when I run tao info, the same information as above is printed:

Configuration of the TAO Toolkit Instance
task_group: ['model', 'dataset', 'deploy']
format_version: 3.0
toolkit_version: 5.5.0
published_date: 08/26/2024
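Since the launcher script kept installing 6.x, it may be worth checking the major version automatically before running any task. The following is a rough sketch that assumes tao info prints a toolkit_version: line in the format shown above; toolkit_major and require_tao5 are hypothetical helper names.

```shell
# Read `tao info` output on stdin and print the major toolkit version.
toolkit_major() {
    awk -F': *' '
        { sub(/^[ \t]+/, "") }   # tolerate leading indentation
        $1 == "toolkit_version" { split($2, v, "."); print v[1]; exit }
    '
}

# Succeed only when the launcher on stdin reports a 5.x toolkit.
require_tao5() {
    tm_major=$(toolkit_major)
    if [ "$tm_major" = "5" ]; then
        echo "TAO 5.x launcher detected"
    else
        echo "unexpected toolkit major version: ${tm_major:-unknown}" >&2
        return 1
    fi
}

# Usage on the host:
#   tao info | require_tao5
```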

But, when I run,

tao model yolo_v4 --help, it reports a cuInit error.

I have seen other forum posts about similar problems. In the solutions you provided there, I came across dpkg -l | grep cuda and also nvidia-fabric-manager.
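A slightly narrower variant of that dpkg check lists only the packages that are actually installed (status "ii"), with their versions. This is just a sketch; list_gpu_packages is a hypothetical helper name, and matching on "cuda" or "nvidia" in the package name is my own assumption about what is relevant.

```shell
# Read `dpkg -l` output on stdin and print name and version of installed
# ("ii") packages whose name mentions cuda or nvidia.
list_gpu_packages() {
    awk '$1 == "ii" && $2 ~ /cuda|nvidia/ { print $2, $3 }'
}

# Usage on the host:
#   dpkg -l | list_gpu_packages
```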

Why is CUDA not mentioned in the TAO Toolkit documentation? Just asking.

I installed CUDA using sudo apt install nvidia-cuda-toolkit and rebooted the system, but the issue still persisted.

Also, in the cuInit error message above, why is Python 3.8 mentioned? That is inside the Docker container, right? I ask because I am on Ubuntu 22.04 with Python 3.10.

Can you please help with the cuInit error?

Could you debug with the hint mentioned in "No CUDA-capable device is detected - #8 by Morganh"? Thanks.

Thank you very much Morganh :)

The post linked in your answer above solved the issue for me.
