Bare-metal install error with TAO 5.0

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc) Classification TF2
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here) 5.0.0
• Training spec file(If have, please share here)
• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

I ran bash setup.sh install and got the following timeout error at TASK [Installing the GPU Operator on NVIDIA Cloud Native Core 6.1]:
fatal: [127.0.1.1]: FAILED! => {"changed": true, "cmd": "helm install --version 23.3.2 --values /tmp/values.yaml --create-namespace --namespace nvidia-gpu-operator --devel nvidia/gpu-operator --set driver.enabled=False --set driver.version='535.54.03' --wait --generate-name", "delta": "0:05:04.165468", "end": "2023-09-18 15:23:33.661303", "msg": "non-zero return code", "rc": 1, "start": "2023-09-18 15:18:29.495835", "stderr": "Error: INSTALLATION FAILED: timed out waiting for the condition", "stderr_lines": ["Error: INSTALLATION FAILED: timed out waiting for the condition"], "stdout": "", "stdout_lines": []}

In the past, I was able to install and uninstall the Toolkit API without trouble. When I tried again this time, about a month later, I got this error.

NOTE: I don't want to uninstall the existing NVIDIA driver, as that would cause problems for the other users on my system. To make sure it stays in place, I enter N when the following task appears.
TASK [capture user intent to override driver] ******************************************************************************************************************************************************************************************************************************************************************************************************************************

[capture user intent to override driver]

One or more hosts has NVIDIA driver installed. Do you want to override it (y/n)?: n

How did you set gpu-operator-values.yml?

I suggest setting it as below, with the driver version set to 525.85.12 and install_driver: false.

enable_mig: no
mig_profile: all-disabled
mig_strategy: single
nvidia_driver_version: "525.85.12"
install_driver: false #true

Also enter N when you reach the prompt mentioned above: One or more hosts has NVIDIA driver installed. Do you want to override it (y/n)?:
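
If it helps, here is a rough sketch for checking that the version in the values file matches the driver actually installed on the host. The two function names are made up for illustration, and the sed pattern assumes the key sits at the start of a line, as in the snippet above. nvidia-smi's --query-gpu=driver_version option prints the host driver version.

```shell
# Print the value of nvidia_driver_version from a values file.
# $1 = path to the values file (for example gpu-operator-values.yml)
configured_driver_version() {
    sed -n 's/^nvidia_driver_version:[[:space:]]*"\{0,1\}\([^"#[:space:]]*\).*/\1/p' "$1"
}

# Print the driver version reported by the host, one line per GPU.
installed_driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader
}
```

Running configured_driver_version gpu-operator-values.yml and then installed_driver_version should print the same version if the file matches the host.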

That’s exactly how it was set up before. Then I realized the driver had been updated from 525.85.12 to 535.54.03, but I get the same error with both versions.

Can you share the full log from running $ bash setup.sh install, along with the output of $ nvidia-smi?
Thanks.
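
One possible way to collect both outputs is a small helper like the one below. This is only a sketch: the function name capture and the log file names are placeholders, and I have not verified how the installer's interactive driver prompt behaves when its output is piped through tee.

```shell
# Run a command, show its output on screen, and save a copy to a log file.
# $1 = log file to write, remaining arguments = the command to run
capture() {
    log_file="$1"
    shift
    # Merge stderr into stdout so error messages end up in the log too.
    "$@" 2>&1 | tee "$log_file"
}
```

For example, capture setup_install.log bash setup.sh install followed by capture nvidia_smi.log nvidia-smi would leave two files that can be attached to a reply.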