Hi All,
• Hardware: Jetson AGX Orin devkit 64GB
• Network Type: Visual ChangeNet
• Jetson L4T 35.6.0
• Jetpack 5.1.4
We are trying out Visual ChangeNet-Segmentation for defect detection on Jetson AGX Orin. To build the model, we are following Transforming Industrial Defect Detection with NVIDIA TAO and Vision AI Models | NVIDIA Technical Blog and tao_tutorials/notebooks/tao_launcher_starter_kit/visual_changenet/visual_changenet_segmentation_MVTec.ipynb at main · NVIDIA/tao_tutorials · GitHub.
We first tried running the TAO launcher directly on the Jetson AGX Orin, but ran into issues because the TAO Toolkit Docker containers are not built for the aarch64 platform and have to be run on an x86 host.
This is also confirmed in this thread: Can I try tao toolkit in AGX Orin?
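(As a side note, one way to confirm which CPU architectures the TAO container is published for is to inspect its manifest. A minimal sketch, assuming a Docker CLI with manifest support and an NGC login to nvcr.io, so the exact behaviour is not guaranteed:)

# Log in to NGC; the username is the literal string $oauthtoken, the password is an NGC API key
docker login nvcr.io

# Inspect the TAO PyTorch container manifest and look at the platform/architecture fields;
# if only linux/amd64 is listed, the image cannot run natively on the Jetson's aarch64 CPU
docker manifest inspect nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt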
We then tried training the TAO model on an x86 host machine running Ubuntu 22.04 without an NVIDIA GPU (section 5.1 "Train Visual ChangeNet-Segmentation model" of tao_tutorials/notebooks/tao_launcher_starter_kit/visual_changenet/visual_changenet_segmentation_MVTec.ipynb at main · NVIDIA/tao_tutorials · GitHub), and tao model visual_changenet train fails with the error below:
tao model visual_changenet train \
    -e $SPECS_DIR/experiment.yaml \
    train.num_epochs=$NUM_EPOCHS \
    dataset.segment.root_dir=$DATA_DIR \
    model.backbone.pretrained_backbone_path=$BACKBONE_PATH
2025-01-09 15:05:14,335 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-01-09 15:05:14,446 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt
2025-01-09 15:05:14,470 [TAO Toolkit] [WARNING] nvidia_tao_cli.components.docker_handler.docker_handler 288:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/test/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2025-01-09 15:05:14,470 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
Docker instantiation failed with error: 500 Server Error: Internal Server Error ("failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running prestart hook #0: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown")
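The "nvml error: driver not loaded" at the end is consistent with the host having no NVIDIA GPU or driver installed. For reference, this is the kind of basic check that would fail on such a host before the TAO launcher is even involved (a minimal sketch, assuming Docker and the NVIDIA Container Toolkit are installed; output will differ on other setups):

# Fails when no NVIDIA driver is loaded (e.g. no NVIDIA GPU in the machine)
nvidia-smi

# The equivalent check from inside a container; the nvidia-container runtime hook
# performs a similar initialization before starting the TAO container
docker run --rm --gpus all ubuntu:22.04 nvidia-smi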
So my question is: can we run the TAO training on an x86 Ubuntu 22.04 host machine without a discrete NVIDIA GPU and then deploy the trained model on the Jetson AGX Orin?
Or, if that is not possible, how should we proceed without an NVIDIA GPU, or can we train directly on the Jetson AGX Orin?