Please provide the following information when requesting support.
• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Classification TF2
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) 5.0.0
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)
I ran bash setup.sh install and am getting the following timeout error at TASK [Installing the GPU Operator on NVIDIA Cloud Native Core 6.1]:
fatal: [127.0.1.1]: FAILED! => {"changed": true, "cmd": "helm install --version 23.3.2 --values /tmp/values.yaml --create-namespace --namespace nvidia-gpu-operator --devel nvidia/gpu-operator --set driver.enabled=False --set driver.version='535.54.03' --wait --generate-name", "delta": "0:05:04.165468", "end": "2023-09-18 15:23:33.661303", "msg": "non-zero return code", "rc": 1, "start": "2023-09-18 15:18:29.495835", "stderr": "Error: INSTALLATION FAILED: timed out waiting for the condition", "stderr_lines": ["Error: INSTALLATION FAILED: timed out waiting for the condition"], "stdout": "", "stdout_lines": []}
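As a quick sanity check on the timing, here is a minimal sketch that recomputes the elapsed time from the "start" and "end" fields of the failed task. It assumes Helm's default --timeout of 5m0s is in effect, since the command in the log passes --wait but no --timeout. The variable names are only illustrative.

```python
from datetime import datetime, timedelta

# Timestamps copied from the "start" and "end" fields of the failed task.
FMT = "%Y-%m-%d %H:%M:%S.%f"
start = datetime.strptime("2023-09-18 15:18:29.495835", FMT)
end = datetime.strptime("2023-09-18 15:23:33.661303", FMT)

# Wall-clock time the helm command ran before Ansible reported the failure.
elapsed = end - start

# Assumption: no --timeout flag is passed, so Helm's default of 5m0s applies.
helm_default_timeout = timedelta(minutes=5)

print("elapsed:", elapsed)
print("ran past the default timeout:", elapsed >= helm_default_timeout)
```

If that assumption holds, the command ran for the whole wait window and then gave up, so the failure seems to mean that the GPU Operator resources did not become ready in time rather than that helm itself crashed.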
In the past, I have been able to install and uninstall the Toolkit API easily. But when I tried again this time, after about a month, I got this error.
NOTE: I don't want to uninstall the existing NVIDIA driver, as that would cause problems for the other users on my system. To make sure it stays in place, I enter N when the following task appears:
TASK [capture user intent to override driver] ******************************************************************************************************************************************************************************************************************************************************************************************************************************
[capture user intent to override driver]
One or more hosts has NVIDIA driver installed. Do you want to override it (y/n)?: n