While installing Tao-Toolkit-API, I ran the command kubectl delete crd clusterpolicies.nvidia.com during the stage TASK [Waiting for the Cluster to become available].
However, I found that most of nvidia-gpu-operator was missing after I ran this command.
Before running kubectl delete crd clusterpolicies.nvidia.com
After running kubectl delete crd clusterpolicies.nvidia.com
In addition, I found errors about ClusterPolicy in the logs below:
1.6793996848505962e+09 INFO controller-runtime.metrics Metrics server is starting to listen {"addr": ":8080"}
1.6793996848525527e+09 INFO setup starting manager
1.6793996848537874e+09 INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
1.6793996848538554e+09 INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
I0321 11:54:44.853941 1 leaderelection.go:248] attempting to acquire leader lease nvidia-gpu-operator/53822513.nvidia.com...
I0321 11:55:02.134623 1 leaderelection.go:258] successfully acquired lease nvidia-gpu-operator/53822513.nvidia.com
1.679399702134646e+09 DEBUG events Normal {"object": {"kind":"ConfigMap","namespace":"nvidia-gpu-operator","name":"53822513.nvidia.com","uid":"03372ca9-1fd1-44bc-99ea-8a98e1cf415c","apiVersion":"v1","resourceVersion":"1922"}, "reason": "LeaderElection", "message": "gpu-operator-7bfc5f55-wcmrf_8eec5cee-5770-491d-bfbc-29640144bd7e became leader"}
1.679399702134731e+09 DEBUG events Normal {"object": {"kind":"Lease","namespace":"nvidia-gpu-operator","name":"53822513.nvidia.com","uid":"095e5442-8470-445e-8c7f-b750964ac866","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1923"}, "reason": "LeaderElection", "message": "gpu-operator-7bfc5f55-wcmrf_8eec5cee-5770-491d-bfbc-29640144bd7e became leader"}
1.679399702134777e+09 INFO controller.clusterpolicy-controller Starting EventSource {"source": "kind source: *v1.ClusterPolicy"}
1.6793997021348252e+09 INFO controller.clusterpolicy-controller Starting EventSource {"source": "kind source: *v1.Node"}
1.6793997021348305e+09 INFO controller.clusterpolicy-controller Starting EventSource {"source": "kind source: *v1.DaemonSet"}
1.6793997021348343e+09 INFO controller.clusterpolicy-controller Starting Controller
1.679399702235648e+09 INFO controllers.ClusterPolicy Reconciliate ClusterPolicies after node label update {"nb": 1}
1.679399702235739e+09 INFO controller.clusterpolicy-controller Starting workers {"worker count": 1}
1.6793997022375412e+09 INFO controllers.ClusterPolicy Operator metrics initialized.
1.6793997022376037e+09 INFO controllers.ClusterPolicy Getting assets from: {"path:": "/opt/gpu-operator/pre-requisites"}
1.6793997022377877e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "RuntimeClass", "in path:": "/opt/gpu-operator/pre-requisites"}
1.6793997022379386e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "PodSecurityPolicy", "in path:": "/opt/gpu-operator/pre-requisites"}
1.6793997022382555e+09 INFO controllers.ClusterPolicy Getting assets from: {"path:": "/opt/gpu-operator/state-operator-metrics"}
1.6793997022384405e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "Service", "in path:": "/opt/gpu-operator/state-operator-metrics"}
1.6793997022386987e+09 INFO controllers.ClusterPolicy Getting assets from: {"path:": "/opt/gpu-operator/state-driver"}
1.679399702238845e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ServiceAccount", "in path:": "/opt/gpu-operator/state-driver"}
1.6793997022389421e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "Role", "in path:": "/opt/gpu-operator/state-driver"}
1.6793997022391186e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ClusterRole", "in path:": "/opt/gpu-operator/state-driver"}
1.6793997022392702e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "RoleBinding", "in path:": "/opt/gpu-operator/state-driver"}
1.6793997022394092e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ClusterRoleBinding", "in path:": "/opt/gpu-operator/state-driver"}
1.6793997022395024e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "DaemonSet", "in path:": "/opt/gpu-operator/state-driver"}
1.6793997022415798e+09 INFO controllers.ClusterPolicy Getting assets from: {"path:": "/opt/gpu-operator/state-container-toolkit"}
1.6793997022417176e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ServiceAccount", "in path:": "/opt/gpu-operator/state-container-toolkit"}
1.6793997022417707e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "Role", "in path:": "/opt/gpu-operator/state-container-toolkit"}
1.6793997022418787e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "RoleBinding", "in path:": "/opt/gpu-operator/state-container-toolkit"}
1.6793997022419577e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "DaemonSet", "in path:": "/opt/gpu-operator/state-container-toolkit"}
1.6793997022423468e+09 INFO controllers.ClusterPolicy Getting assets from: {"path:": "/opt/gpu-operator/state-operator-validation"}
1.679399702242489e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ServiceAccount", "in path:": "/opt/gpu-operator/state-operator-validation"}
1.6793997022425394e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "Role", "in path:": "/opt/gpu-operator/state-operator-validation"}
1.6793997022426913e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ClusterRole", "in path:": "/opt/gpu-operator/state-operator-validation"}
1.679399702242786e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "RoleBinding", "in path:": "/opt/gpu-operator/state-operator-validation"}
1.6793997022428567e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ClusterRoleBinding", "in path:": "/opt/gpu-operator/state-operator-validation"}
1.6793997022429276e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "DaemonSet", "in path:": "/opt/gpu-operator/state-operator-validation"}
1.6793997022446988e+09 INFO controllers.ClusterPolicy Getting assets from: {"path:": "/opt/gpu-operator/state-device-plugin"}
1.6793997022448952e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ServiceAccount", "in path:": "/opt/gpu-operator/state-device-plugin"}
1.6793997022449656e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "Role", "in path:": "/opt/gpu-operator/state-device-plugin"}
1.6793997022450728e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "RoleBinding", "in path:": "/opt/gpu-operator/state-device-plugin"}
1.679399702245151e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "DaemonSet", "in path:": "/opt/gpu-operator/state-device-plugin"}
1.679399702245509e+09 INFO controllers.ClusterPolicy Getting assets from: {"path:": "/opt/gpu-operator/state-dcgm-exporter"}
1.6793997022456825e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ServiceAccount", "in path:": "/opt/gpu-operator/state-dcgm-exporter"}
1.6793997022457643e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "Role", "in path:": "/opt/gpu-operator/state-dcgm-exporter"}
1.6793997022458937e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "RoleBinding", "in path:": "/opt/gpu-operator/state-dcgm-exporter"}
1.6793997022459788e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "Service", "in path:": "/opt/gpu-operator/state-dcgm-exporter"}
1.6793997022460837e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "DaemonSet", "in path:": "/opt/gpu-operator/state-dcgm-exporter"}
1.6793997022463932e+09 INFO controllers.ClusterPolicy Getting assets from: {"path:": "/opt/gpu-operator/gpu-feature-discovery"}
1.679399702246506e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ServiceAccount", "in path:": "/opt/gpu-operator/gpu-feature-discovery"}
1.6793997022465627e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "Role", "in path:": "/opt/gpu-operator/gpu-feature-discovery"}
1.6793997022466557e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "RoleBinding", "in path:": "/opt/gpu-operator/gpu-feature-discovery"}
1.679399702246727e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "DaemonSet", "in path:": "/opt/gpu-operator/gpu-feature-discovery"}
1.679399702247027e+09 INFO controllers.ClusterPolicy Getting assets from: {"path:": "/opt/gpu-operator/state-mig-manager"}
1.6793997022472117e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ServiceAccount", "in path:": "/opt/gpu-operator/state-mig-manager"}
1.6793997022472625e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "Role", "in path:": "/opt/gpu-operator/state-mig-manager"}
1.679399702247414e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ClusterRole", "in path:": "/opt/gpu-operator/state-mig-manager"}
1.6793997022474875e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "RoleBinding", "in path:": "/opt/gpu-operator/state-mig-manager"}
1.6793997022475688e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ClusterRoleBinding", "in path:": "/opt/gpu-operator/state-mig-manager"}
1.6793997022476468e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ConfigMap", "in path:": "/opt/gpu-operator/state-mig-manager"}
1.6793997022478027e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "ConfigMap", "in path:": "/opt/gpu-operator/state-mig-manager"}
1.6793997022478821e+09 INFO controllers.ClusterPolicy DEBUG: Looking for {"Kind": "DaemonSet", "in path:": "/opt/gpu-operator/state-mig-manager"}
1.6793997022484095e+09 INFO controllers.ClusterPolicy Checking GPU state labels on the node {"NodeName": "admin-ops01"}
1.6793997022484212e+09 INFO controllers.ClusterPolicy - {"Label=": "nvidia.com/gpu.deploy.container-toolkit", " value=": "true"}
1.6793997022484248e+09 INFO controllers.ClusterPolicy - {"Label=": "nvidia.com/gpu.deploy.device-plugin", " value=": "true"}
1.6793997022484279e+09 INFO controllers.ClusterPolicy - {"Label=": "nvidia.com/gpu.deploy.dcgm", " value=": "true"}
1.6793997022484303e+09 INFO controllers.ClusterPolicy - {"Label=": "nvidia.com/gpu.deploy.dcgm-exporter", " value=": "true"}
1.679399702248433e+09 INFO controllers.ClusterPolicy - {"Label=": "nvidia.com/gpu.deploy.node-status-exporter", " value=": "true"}
1.6793997022484362e+09 INFO controllers.ClusterPolicy - {"Label=": "nvidia.com/gpu.deploy.operator-validator", " value=": "true"}
1.6793997022484434e+09 INFO controllers.ClusterPolicy - {"Label=": "nvidia.com/gpu.deploy.driver", " value=": "true"}
1.6793997022484462e+09 INFO controllers.ClusterPolicy - {"Label=": "nvidia.com/gpu.deploy.gpu-feature-discovery", " value=": "true"}
1.6793997022484522e+09 INFO controllers.ClusterPolicy Number of nodes with GPU label {"NodeCount": 1}
1.6793997022484884e+09 INFO controllers.ClusterPolicy Using container runtime: containerd
1.6793997022496712e+09 INFO KubeAPIWarningLogger node.k8s.io/v1beta1 RuntimeClass is deprecated in v1.22+, unavailable in v1.25+
1.6793997023495245e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"RuntimeClass": "nvidia"}
1.6793997023519964e+09 INFO controllers.ClusterPolicy INFO: ClusterPolicy step completed {"state:": "pre-requisites", "status": "ready"}
1.6793997024530985e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"Service": "gpu-operator", "Namespace": "nvidia-gpu-operator"}
1.6793997024557748e+09 INFO controllers.ClusterPolicy INFO: ClusterPolicy step completed {"state:": "state-operator-metrics", "status": "ready"}
1.6793997024580107e+09 INFO controllers.ClusterPolicy Found Resource, skipping update {"ServiceAccount": "nvidia-driver", "Namespace": "nvidia-gpu-operator"}
1.6793997024605198e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"Role": "nvidia-driver", "Namespace": "nvidia-gpu-operator"}
1.67939970246576e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"ClusterRole": "nvidia-driver", "Namespace": "nvidia-gpu-operator"}
1.6793997024696126e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"RoleBinding": "nvidia-driver", "Namespace": "nvidia-gpu-operator"}
1.6793997024730349e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"ClusterRoleBinding": "nvidia-driver", "Namespace": "nvidia-gpu-operator"}
1.6793997025756822e+09 INFO controllers.ClusterPolicy 5.4.0-77-generic {"Request.Namespace": "default", "Request.Name": "Node"}
1.6793997025763001e+09 INFO controllers.ClusterPolicy DaemonSet identical, skipping update {"DaemonSet": "nvidia-driver-daemonset", "Namespace": "nvidia-gpu-operator", "name": "nvidia-driver-daemonset"}
1.679399702576313e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"LabelSelector": "app=nvidia-driver-daemonset"}
1.6793997025763402e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberOfDaemonSets": 1}
1.6793997025763452e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberUnavailable": 1}
1.6793997025763497e+09 INFO controllers.ClusterPolicy INFO: ClusterPolicy step completed {"state:": "state-driver", "status": "notReady"}
1.679399702579127e+09 INFO controllers.ClusterPolicy Found Resource, skipping update {"ServiceAccount": "nvidia-container-toolkit", "Namespace": "nvidia-gpu-operator"}
1.6793997025814555e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"Role": "nvidia-container-toolkit", "Namespace": "nvidia-gpu-operator"}
1.6793997025849078e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"RoleBinding": "nvidia-container-toolkit", "Namespace": "nvidia-gpu-operator"}
1.6793997025871568e+09 INFO controllers.ClusterPolicy DaemonSet identical, skipping update {"DaemonSet": "nvidia-container-toolkit-daemonset", "Namespace": "nvidia-gpu-operator", "name": "nvidia-container-toolkit-daemonset"}
1.6793997025871735e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"LabelSelector": "app=nvidia-container-toolkit-daemonset"}
1.6793997025872092e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberOfDaemonSets": 1}
1.6793997025872154e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberUnavailable": 1}
1.679399702587219e+09 INFO controllers.ClusterPolicy INFO: ClusterPolicy step completed {"state:": "state-container-toolkit", "status": "notReady"}
1.679399702589382e+09 INFO controllers.ClusterPolicy Found Resource, skipping update {"ServiceAccount": "nvidia-operator-validator", "Namespace": "nvidia-gpu-operator"}
1.6793997025912428e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"Role": "nvidia-operator-validator", "Namespace": "nvidia-gpu-operator"}
1.6793997025946085e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"ClusterRole": "nvidia-operator-validator", "Namespace": "nvidia-gpu-operator"}
1.6793997026150193e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"RoleBinding": "nvidia-operator-validator", "Namespace": "nvidia-gpu-operator"}
1.6793997026281643e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"ClusterRoleBinding": "nvidia-operator-validator", "Namespace": "nvidia-gpu-operator"}
1.6793997026304219e+09 INFO controllers.ClusterPolicy DaemonSet identical, skipping update {"DaemonSet": "nvidia-operator-validator", "Namespace": "nvidia-gpu-operator", "name": "nvidia-operator-validator"}
1.6793997026304383e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"LabelSelector": "app=nvidia-operator-validator"}
1.6793997026304753e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberOfDaemonSets": 1}
1.6793997026304796e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberUnavailable": 1}
1.6793997026304848e+09 INFO controllers.ClusterPolicy INFO: ClusterPolicy step completed {"state:": "state-operator-validation", "status": "notReady"}
1.6793997026374981e+09 INFO controllers.ClusterPolicy Found Resource, skipping update {"ServiceAccount": "nvidia-device-plugin", "Namespace": "nvidia-gpu-operator"}
1.6793997026450212e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"Role": "nvidia-device-plugin", "Namespace": "nvidia-gpu-operator"}
1.6793997026487129e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"RoleBinding": "nvidia-device-plugin", "Namespace": "nvidia-gpu-operator"}
1.6793997026506011e+09 INFO controllers.ClusterPolicy DaemonSet identical, skipping update {"DaemonSet": "nvidia-device-plugin-daemonset", "Namespace": "nvidia-gpu-operator", "name": "nvidia-device-plugin-daemonset"}
1.6793997026506176e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"LabelSelector": "app=nvidia-device-plugin-daemonset"}
1.6793997026506524e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberOfDaemonSets": 1}
1.6793997026506586e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberUnavailable": 1}
1.6793997026506624e+09 INFO controllers.ClusterPolicy INFO: ClusterPolicy step completed {"state:": "state-device-plugin", "status": "notReady"}
1.6793997026528385e+09 INFO controllers.ClusterPolicy Found Resource, skipping update {"ServiceAccount": "nvidia-dcgm-exporter", "Namespace": "nvidia-gpu-operator"}
1.6793997026547954e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"Role": "nvidia-dcgm-exporter", "Namespace": "nvidia-gpu-operator"}
1.6793997026581383e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"RoleBinding": "nvidia-dcgm-exporter", "Namespace": "nvidia-gpu-operator"}
1.679399702659721e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"Service": "nvidia-dcgm-exporter", "Namespace": "nvidia-gpu-operator"}
1.679399702661916e+09 INFO controllers.ClusterPolicy DaemonSet identical, skipping update {"DaemonSet": "nvidia-dcgm-exporter", "Namespace": "nvidia-gpu-operator", "name": "nvidia-dcgm-exporter"}
1.679399702661933e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"LabelSelector": "app=nvidia-dcgm-exporter"}
1.6793997026619601e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberOfDaemonSets": 1}
1.6793997026619678e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberUnavailable": 1}
1.6793997026619718e+09 INFO controllers.ClusterPolicy INFO: ClusterPolicy step completed {"state:": "state-dcgm-exporter", "status": "notReady"}
1.6793997026638792e+09 INFO controllers.ClusterPolicy Found Resource, skipping update {"ServiceAccount": "nvidia-gpu-feature-discovery", "Namespace": "nvidia-gpu-operator"}
1.6793997026656861e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"Role": "nvidia-gpu-feature-discovery", "Namespace": "nvidia-gpu-operator"}
1.6793997026689951e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"RoleBinding": "nvidia-gpu-feature-discovery", "Namespace": "nvidia-gpu-operator"}
1.6793997026706855e+09 INFO controllers.ClusterPolicy DaemonSet identical, skipping update {"DaemonSet": "gpu-feature-discovery", "Namespace": "nvidia-gpu-operator", "name": "gpu-feature-discovery"}
1.6793997026707032e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"LabelSelector": "app=gpu-feature-discovery"}
1.6793997026707256e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberOfDaemonSets": 1}
1.6793997026707299e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberUnavailable": 1}
1.6793997026707332e+09 INFO controllers.ClusterPolicy INFO: ClusterPolicy step completed {"state:": "gpu-feature-discovery", "status": "notReady"}
1.6793997026725569e+09 INFO controllers.ClusterPolicy Found Resource, skipping update {"ServiceAccount": "nvidia-mig-manager", "Namespace": "nvidia-gpu-operator"}
1.6793997026743934e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"Role": "nvidia-mig-manager", "Namespace": "nvidia-gpu-operator"}
1.6793997026778245e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"ClusterRole": "nvidia-mig-manager", "Namespace": "nvidia-gpu-operator"}
1.6793997026810489e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"RoleBinding": "nvidia-mig-manager", "Namespace": "nvidia-gpu-operator"}
1.6793997026845417e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"ClusterRoleBinding": "nvidia-mig-manager", "Namespace": "nvidia-gpu-operator"}
1.6793997026880655e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"ConfigMap": "default-mig-parted-config", "Namespace": "nvidia-gpu-operator"}
1.6793997026913178e+09 INFO controllers.ClusterPolicy Found Resource, updating... {"ConfigMap": "default-gpu-clients", "Namespace": "nvidia-gpu-operator"}
1.6793997026933584e+09 INFO controllers.ClusterPolicy DaemonSet identical, skipping update {"DaemonSet": "nvidia-mig-manager", "Namespace": "nvidia-gpu-operator", "name": "nvidia-mig-manager"}
1.6793997026933737e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"LabelSelector": "app=nvidia-mig-manager"}
1.6793997026934001e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberOfDaemonSets": 1}
1.6793997026934044e+09 INFO controllers.ClusterPolicy DEBUG: DaemonSet {"NumberUnavailable": 0}
1.679399702693413e+09 INFO controllers.ClusterPolicy INFO: ClusterPolicy step completed {"state:": "state-mig-manager", "status": "ready"}
1.6793997026934242e+09 INFO controllers.ClusterPolicy ClusterPolicy isn't ready {"states not ready": ["state-driver", "state-container-toolkit", "state-operator-validation", "state-device-plugin", "state-dcgm-exporter", "gpu-feature-discovery"]}
E0321 11:55:06.433653 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
W0321 11:55:07.274819 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
E0321 11:55:07.274843 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.ClusterPolicy: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
W0321 11:55:10.026237 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
E0321 11:55:10.026261 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.ClusterPolicy: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
W0321 11:55:15.394434 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
E0321 11:55:15.394467 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.ClusterPolicy: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
W0321 11:55:24.354989 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
E0321 11:55:24.355010 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.ClusterPolicy: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
W0321 11:55:43.379049 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
E0321 11:55:43.379074 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.ClusterPolicy: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
W0321 11:56:10.261695 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
E0321 11:56:10.261721 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.ClusterPolicy: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
W0321 11:56:44.050761 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
E0321 11:56:44.050784 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.ClusterPolicy: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
W0321 11:57:39.880717 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
E0321 11:57:39.880741 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.ClusterPolicy: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
W0321 11:58:34.520072 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
E0321 11:58:34.520094 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.ClusterPolicy: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
W0321 11:59:05.309809 1 reflector.go:324] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
E0321 11:59:05.309831 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.ClusterPolicy: failed to list *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
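Since the log is long, a small script can pull out the two parts that seem to matter: which states the operator reports as not ready, and which resource the API server can no longer find. The following is only a rough sketch in Python. The function name summarize_operator_log and the LOG_EXCERPT variable are made up for illustration, and the parsing assumes the log format shown above (a JSON payload at the end of the "ClusterPolicy isn't ready" line and the "(get ...)" suffix on the reflector lines).

```python
import json
import re

# Pattern for the reflector lines that name the resource the API server cannot find.
MISSING_RESOURCE = re.compile(r"could not find the requested resource \(get ([^)]+)\)")


def summarize_operator_log(lines):
    """Return (not_ready_states, missing_resources) found in the given log lines."""
    not_ready = []
    missing = set()
    for line in lines:
        # The "isn't ready" line ends with a JSON object listing the states.
        if "ClusterPolicy isn't ready" in line and "{" in line:
            payload = json.loads(line[line.index("{"):])
            not_ready = payload.get("states not ready", [])
        # Collect every distinct resource reported as missing.
        match = MISSING_RESOURCE.search(line)
        if match:
            missing.add(match.group(1))
    return not_ready, sorted(missing)


# Two lines copied from the log above, used as sample input.
LOG_EXCERPT = """\
1.6793997026934242e+09 INFO controllers.ClusterPolicy ClusterPolicy isn't ready {"states not ready": ["state-driver", "state-container-toolkit", "state-operator-validation", "state-device-plugin", "state-dcgm-exporter", "gpu-feature-discovery"]}
E0321 11:55:06.433653 1 reflector.go:138] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:250: Failed to watch *v1.ClusterPolicy: the server could not find the requested resource (get clusterpolicies.nvidia.com)
"""

if __name__ == "__main__":
    states, resources = summarize_operator_log(LOG_EXCERPT.splitlines())
    print("States not ready:", states)
    print("Resources the API server could not find:", resources)
```

On the excerpt, this reports six states as not ready and clusterpolicies.nvidia.com as the only missing resource, which is the CRD I deleted.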
How can I resolve this problem?