Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) 4.0.1-tf2.9.1
• Training spec file (if available, please share it here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
I want to run AutoML experiments, so I am installing the TAO REST API.
Current system: SSH into the 10.103.48.174 server, running Ubuntu 20.04.
I have the following questions:
1. Is the way I specified the IP address and other details in the hosts file correct?
2. Why is the driver installation step being skipped?
3. How do I fix the final error in TASK [Validate GPU Operator Version for Cloud Native Core 6.1]?
Just to clarify the IP part: I am already on 10.103.48.174 (inside a conda environment), and that same machine is the one I want as the master, so I entered its IP details there. Is that a valid way to do this?
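For reference, this is roughly what my hosts file looks like (credentials replaced with placeholders; I followed the Ansible-inventory format from the quick start as I understood it):

```
[master]
10.103.48.174 ansible_ssh_user='<user>' ansible_ssh_pass='<password>'

[nodes]
```

Since this is a single-node setup, I left the [nodes] section empty.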
I also followed this, used v4.0.2 for everything, and I get stuck at the same place as that person.
TASK [Waiting for the Cluster to become available] takes forever to complete; I waited 15 minutes and tried twice. Rebooting the system did get me past that point, but the NVIDIA driver installation is still being skipped and two validation tasks fail fatally.
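On the skipped driver step, my own guess (an assumption on my part, not something I found in the setup docs) is that the playbook skips installation when it detects a driver already present on the host. A quick way to check:

```shell
# Check whether an NVIDIA driver is already installed on this host.
# (Assumption: a pre-existing driver is why the install step is skipped.)
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "driver present: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader)"
else
    echo "no driver detected on this host"
fi
```

If a driver is present, I suppose the operator state could also be inspected with `kubectl get pods -n nvidia-gpu-operator` (the namespace may differ by version), but I am not sure how that interacts with the failing validation task.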