• Hardware: RTX 3080
• TAO Toolkit Version: 4.0.0
I am trying to use AutoML and am following the API service bare-metal setup on an Ubuntu 20.04 notebook.
Command:
bash setup.sh install
Error:
TASK [Installing the GPU Operator on NVIDIA Cloud Native Core 6.1] ******************************************************************************************************************
fatal: [192.168.2.89]: FAILED! => {"changed": true, "cmd": "helm install --version 1.10.1 --values /tmp/values.yaml --create-namespace --namespace nvidia-gpu-operator --devel nvidia/gpu-operator --set driver.version='510.47.03' --wait --generate-name", "delta": "0:05:04.570808", "end": "2023-02-23 14:54:00.709022", "msg": "non-zero return code", "rc": 1, "start": "2023-02-23 14:48:56.138214", "stderr": "Error: INSTALLATION FAILED: timed out waiting for the condition", "stderr_lines": ["Error: INSTALLATION FAILED: timed out waiting for the condition"], "stdout": "", "stdout_lines": []}
I found this task in quickstart_api_bare_metal\cnc\cnc-x86-install.yaml:
- name: Installing the GPU Operator on NVIDIA Cloud Native Core 6.1
  when: "enable_mig == false and enable_vgpu == false and enable_rdma == false and enable_gds == false and enable_secure_boot == false and gpu_operator.rc == 1 and network_operator_valid.rc == 1 and 'running' in k8sup.stdout and cnc_version == 6.1"
  shell: helm install --version 1.10.1 --values /tmp/values.yaml --create-namespace --namespace nvidia-gpu-operator --devel nvidia/gpu-operator --set driver.version='{{ gpu_driver_version }}' --wait --generate-name
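
One thing that may be worth noting: the task runs helm with --wait but without --timeout, and Helm's default timeout is 5m0s. The small check below is only a sketch, not part of the TAO scripts. It compares the "start" and "end" timestamps from the Ansible error with that default. The variable names are my own, and it assumes GNU date as shipped with Ubuntu.

```shell
#!/usr/bin/env bash
# Timestamps copied from the "start" and "end" fields of the Ansible error
# (fractional seconds dropped for simplicity).
start="2023-02-23 14:48:56"
end="2023-02-23 14:54:00"

# Helm's default --timeout is 5m0s; the task's helm command does not override it.
helm_default_timeout=300

# GNU date converts each timestamp to epoch seconds; -u avoids time zone effects.
elapsed=$(( $(date -u -d "$end" +%s) - $(date -u -d "$start" +%s) ))

echo "helm install ran for ${elapsed}s (default timeout: ${helm_default_timeout}s)"
if [ "$elapsed" -ge "$helm_default_timeout" ]; then
  echo "Consistent with --wait giving up at Helm's default timeout."
fi
```

If the elapsed time is just over the default, the operator pods most likely did not become ready within five minutes. Running kubectl get pods -n nvidia-gpu-operator should then show which pod is stuck, although the actual cause would still need to be confirmed from the pod events and logs.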