GPU Operator Helm chart deployment issues

Hi,

I am trying to deploy the GPU Operator, but the deployment fails because the driver pod is unable to pull its image. I also tried a specific chart version instead of latest and hit the same issue.

kubectl get pods -n gpu-operator

NAME                                                              READY   STATUS             RESTARTS        AGE
gpu-feature-discovery-wdv4c                                       0/1     Init:0/1           0               2m31s
gpu-operator-1761617510-node-feature-discovery-gc-595bfd845zwph   1/1     Running            1 (3m21s ago)   3d13h
gpu-operator-1761617510-node-feature-discovery-master-59d8shxxv   1/1     Running            1 (3m21s ago)   3d13h
gpu-operator-1761617510-node-feature-discovery-worker-rsh7v       1/1     Running            1 (3m21s ago)   3d13h
gpu-operator-5c5f48f846-h9w6c                                     1/1     Running            1 (3m21s ago)   3d13h
nvidia-container-toolkit-daemonset-2rsjb                          0/1     Init:0/1           0               2m31s
nvidia-dcgm-exporter-n2m99                                        0/1     Init:0/1           0               2m31s
nvidia-device-plugin-daemonset-dxjdm                              0/1     Init:0/1           0               2m31s
nvidia-driver-daemonset-zmst7                                     0/1     ImagePullBackOff   0               3d13h
nvidia-operator-validator-sk5lm                                   0/1     Init:0/4           0               2m31s

kubectl describe pod nvidia-driver-daemonset-zmst7 -n gpu-operator

  Normal   Pulling         3m36s (x4 over 4m56s)   kubelet            Pulling image "nvcr.io/nvidia/driver:580.95.05-rocky8.10"
  Warning  Failed          3m36s (x4 over 4m56s)   kubelet            Failed to pull image "nvcr.io/nvidia/driver:580.95.05-rocky8.10": rpc error: code = NotFound desc = failed to pull and unpack image "nvcr.io/nvidia/driver:580.95.05-rocky8.10": failed to resolve reference "nvcr.io/nvidia/driver:580.95.05-rocky8.10": nvcr.io/nvidia/driver:580.95.05-rocky8.10: not found
  Warning  Failed          3m36s (x4 over 4m56s)   kubelet            Error: ErrImagePull
  Warning  Failed          3m25s (x4 over 4m32s)   kubelet            Error: ImagePullBackOff
  Normal   BackOff         24s (x17 over 4m32s)    kubelet            Back-off pulling image "nvcr.io/nvidia/driver:580.95.05-rocky8.10"
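
The tag in the events above looks like it is assembled from the driver version plus the node's OS ID and version (580.95.05 + rocky + 8.10). If that is right, it would explain the "not found" error on an OS with no published driver image. Here is a minimal sketch to reproduce the tag on a node, assuming the pattern is <driver version>-<ID><VERSION_ID> taken from /etc/os-release. driver_image_tag is a hypothetical helper name, not part of the operator or any NVIDIA tool.

```shell
#!/bin/sh
# Sketch only: rebuild the driver image tag that the kubelet tried to pull.
# ASSUMPTION: the tag follows <driver version>-<os ID><os VERSION_ID>,
# inferred from "580.95.05-rocky8.10" in the pod events.
# driver_image_tag is a hypothetical helper, not an NVIDIA command.
driver_image_tag() {
    driver_version="$1"
    # Second argument is optional; default to the standard os-release path.
    os_release_file="${2:-/etc/os-release}"
    # Source the file in a subshell so ID and VERSION_ID do not leak
    # into the calling shell.
    (
        . "$os_release_file"
        printf '%s-%s%s\n' "$driver_version" "$ID" "$VERSION_ID"
    )
}

# Usage on the node itself (reads /etc/os-release):
#   driver_image_tag 580.95.05
```

Comparing the printed tag with the tags actually published for nvcr.io/nvidia/driver should show whether a prebuilt driver container exists for the node's OS.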

Here are the commands that I used:

helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--version=v25.10.0 \
--set driver.enabled=true \
--set toolkit.enabled=true \
--set devicePlugin.enabled=true

helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--version=latest \
--set driver.enabled=true \
--set toolkit.enabled=true \
--set devicePlugin.enabled=true

kubectl -n rag create secret generic ngc-credentials \
--from-literal=NGC_API_KEY='<license key redacted>'

I realize that only RHEL, Red Hat CoreOS, and Ubuntu are supported. Are there any plans to support Rocky Linux? If not, what alternatives are suggested?

For anyone else in the same boat who needs to install gpu-operator and NCR: I ended up installing the drivers and NCR directly on the OS using the RHEL 8 repo. With the driver already on the host, I was able to deploy the gpu-operator with the driver container disabled:

helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set driver.enabled=false