TAO Toolkit on x86 Ubuntu 22.04 host machine without NVIDIA discrete GPU

Hi All,

• Hardware: Jetson AGX Orin devkit 64GB
• Network Type: Visual ChangeNet
• Jetson L4T 35.6.0
• JetPack 5.1.4

We are trying out Visual ChangeNet segmentation for defect detection on a Jetson AGX Orin. To build the model, we are following the NVIDIA Technical Blog post "Transforming Industrial Defect Detection with NVIDIA TAO and Vision AI Models" and the notebook tao_tutorials/notebooks/tao_launcher_starter_kit/visual_changenet/visual_changenet_segmentation_MVTec.ipynb (main branch of NVIDIA/tao_tutorials on GitHub).

We first tried it directly on the Jetson AGX Orin but ran into issues with the Docker container, which is not supported on the aarch64 platform and needs to be run on an x86 platform.

We also gathered this from the forum topic "Can I try tao toolkit in AGX Orin?".

Finally, we tried training the TAO model on an x86 host machine running Ubuntu 22.04 without an NVIDIA GPU (section 5.1, "Train Visual ChangeNet-Segmentation model", of the notebook mentioned above). The tao model visual_changenet train command fails with the error below:

tao model visual_changenet train \
    -e $SPECS_DIR/experiment.yaml \
    train.num_epochs=$NUM_EPOCHS \
    dataset.segment.root_dir=$DATA_DIR \
    model.backbone.pretrained_backbone_path=$BACKBONE_PATH
2025-01-09 15:05:14,335 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-01-09 15:05:14,446 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt
2025-01-09 15:05:14,470 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 288:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/test/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2025-01-09 15:05:14,470 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
Docker instantiation failed with error: 500 Server Error: Internal Server Error ("failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown")

So my question is: can we run TAO training on an x86 Ubuntu 22.04 host machine without an NVIDIA discrete GPU and then deploy the model on the Jetson AGX Orin?
If not, how can we proceed without an NVIDIA GPU, or can we train directly on the Jetson AGX Orin?

TAO training is expected to run on a machine with an NVIDIA discrete GPU. You can then deploy the trained model on a discrete GPU or a Jetson device.

Training with TAO on Jetson devices is not supported yet.

BTW, you can also use a cloud machine for training. Please refer to the TAO user guide.
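
For anyone hitting the same failure: the "nvml error: driver not loaded" line in the log above indicates that the NVIDIA container runtime could not find a loaded NVIDIA driver on the host. Below is a minimal sketch of a pre-flight check that could be run in the notebook before calling the TAO launcher. The function name nvidia_driver_available is hypothetical and not part of TAO; the sketch only assumes that a working driver install puts nvidia-smi on the PATH and that "nvidia-smi -L" lists at least one GPU.

```python
import shutil
import subprocess


def nvidia_driver_available(smi_name: str = "nvidia-smi") -> bool:
    """Return True if nvidia-smi is on PATH and reports at least one GPU.

    Hypothetical helper for illustration; it is not part of the TAO launcher.
    """
    smi_path = shutil.which(smi_name)
    if smi_path is None:
        # No nvidia-smi binary found: the driver is most likely not installed.
        return False
    try:
        # "nvidia-smi -L" prints one line per detected GPU.
        result = subprocess.run(
            [smi_path, "-L"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A non-zero exit code typically means nvidia-smi could not reach the driver.
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    if nvidia_driver_available():
        print("NVIDIA driver detected; the TAO container should be able to start.")
    else:
        print("No usable NVIDIA driver found; TAO training is expected to fail on this host.")
```

If the check returns False on the training host, the launcher is likely to fail in the same way as in the log, so training would need to move to a machine (local or cloud) with an NVIDIA discrete GPU.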

Hi Morganh,

Thank you for the reply.

Can I use Google Colab to train the TAO visual_changenet segmentation model and then deploy it on the Jetson AGX Orin?
How would I deploy a model trained in Google Colab? Is there any relevant document I can refer to?

Unfortunately, the visual_changenet network is not supported when running with Google Colab.

You can run training on the other cloud machines mentioned in "Running TAO in the Cloud" in the NVIDIA Docs.

After training, you can deploy on dGPU or Jetson devices.

I specifically want to run the visual_changenet model. Which cloud machine should I go for so that I can later deploy the model on a Jetson device?

You can refer to "Running TAO in the Cloud" in the NVIDIA Docs. For example, you can use an AWS instance.

I am giving this a try on AWS.
The note in the docs says:
The Amazon EC2 P3 and G4 instances are optimized for the NVIDIA Volta/Turing GPUs.

But I cannot find either of these instances. Instead, G4dn and G4ad are available. Which GPU instance is most suitable for visual_changenet?