Fatal Exception: The type initializer for 'Nvidia.Clara.Platform.Metrics.GpuMetrics' threw an exception

Hello!

After starting Clara Platform, only the clara-node-monitor pod has the status “CrashLoopBackOff”. All the others are “Running”.

So I checked the log of the clara-node-monitor pod and got the error below.

Error: Failed to initialize NVML
Fatal Exception: The type initializer for 'Nvidia.Clara.Platform.Metrics.GpuMetrics' threw an exception.

Could you please tell me how to solve this?

Thank you.

Hey yasu18, appreciate your interest in the Deploy platform, and welcome to the community!

The error you encountered looks like an issue with the container runtime accessing the GPU. A couple of questions:

  • Which version of Clara Deploy are you using? and,
  • How did you install the dependencies (bootstrap.sh vs. ansible)?

The current (and recommended) Clara Deploy release is r8.1, which uses Ansible to configure the host system, including the driver, Docker, the container runtime, and the K8s device plugin. Things can get complicated if you’re installing over preexisting versions of any of these, but I'm happy to help debug.
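
To narrow down which layer is failing, a rough check like the one below may help. It is only a sketch: it assumes `nvidia-smi` and `kubectl` are on the PATH, and the helper name `check_tool` is made up for illustration. Since `nvidia-smi` itself relies on NVML, a failure on the host suggests a driver problem, while success on the host points more toward the container runtime or device plugin.

```shell
#!/bin/sh
# Sketch: check the GPU stack layer by layer (host driver first, then cluster).
# Assumption: nvidia-smi and kubectl may or may not be installed on this host.

# Hypothetical helper: print "present" or "missing" for a command name.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "present"
    else
        echo "missing"
    fi
}

# Host level: nvidia-smi is built on NVML, so a failure here matches the
# "Failed to initialize NVML" message and points at the driver itself.
if [ "$(check_tool nvidia-smi)" = "present" ]; then
    nvidia-smi || echo "nvidia-smi failed: driver/NVML problem on the host"
else
    echo "nvidia-smi not found: the NVIDIA driver may not be installed"
fi

# Cluster level: list pods so the crashing one can be inspected further.
if [ "$(check_tool kubectl)" = "present" ]; then
    kubectl get pods --all-namespaces --request-timeout=5s \
        || echo "kubectl could not reach the cluster"
else
    echo "kubectl not found"
fi
```

If the host check passes but the pod still crashes, the pod's own log (`kubectl logs` with the pod name and namespace) is the next place to look.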

Thanks,
Kris

Hi Kris, thank you for your reply.

  • Which version of Clara Deploy are you using?

→ A. I’m using ver0.8.1-dc43866

  • How did you install the dependencies (bootstrap.sh vs. ansible)?

→ A. I didn't use either for the installation. I installed the dependencies via “apt” or “snap”.
So I tried reinstalling via Ansible, and I got an error at the “NVIDIA Driver Install” step (ansible-playbook -K driver.yml).
It might be caused by my system.

This is the error I got, just in case.

fatal: [localhost]: FAILED! => {"cache_update_time": 1633671124, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\" install 'nvidia-driver-460'' failed: E: Unable to correct problems, you have held broken packages.\n", "rc": 100, "stderr": "E: Unable to correct problems, you have held broken packages.\n", "stderr_lines": ["E: Unable to correct problems, you have held broken packages."], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nSome packages could not be installed. This may mean that you have\nrequested an impossible situation or if you are using the unstable\ndistribution that some required packages have not yet been created\nor been moved out of Incoming.\nThe following information may help to resolve the situation:\n\nThe following packages have unmet dependencies:\n nvidia-driver-460 : Depends: libnvidia-gl-460 (= 460.91.03-0ubuntu1) but it is not going to be installed\n Depends: nvidia-dkms-460 (= 460.91.03-0ubuntu1)\n Depends: nvidia-kernel-source-460 (= 460.91.03-0ubuntu1) but it is not going to be installed\n Depends: libnvidia-extra-460 (= 460.91.03-0ubuntu1) but it is not going to be installed\n Depends: nvidia-compute-utils-460 (= 460.91.03-0ubuntu1) but it is not going to be installed\n Depends: libnvidia-decode-460 (= 460.91.03-0ubuntu1) but it is not going to be installed\n Depends: libnvidia-encode-460 (= 460.91.03-0ubuntu1) but it is not going to be installed\n Depends: xserver-xorg-video-nvidia-460 (= 460.91.03-0ubuntu1) but it is not going to be installed\n Depends: libnvidia-cfg1-460 (= 460.91.03-0ubuntu1) but it is not going to be installed\n Depends: libnvidia-ifr1-460 (= 460.91.03-0ubuntu1) but it is not going to be installed\n Recommends: libnvidia-compute-460:i386 (= 460.91.03-0ubuntu1)\n Recommends: libnvidia-decode-460:i386 (= 460.91.03-0ubuntu1)\n Recommends: libnvidia-encode-460:i386 (= 460.91.03-0ubuntu1)\n Recommends: libnvidia-ifr1-460:i386 (= 460.91.03-0ubuntu1)\n Recommends: libnvidia-fbc1-460:i386 (= 460.91.03-0ubuntu1)\n Recommends: libnvidia-gl-460:i386 (= 460.91.03-0ubuntu1)\n", "stdout_lines": ["Reading package lists...", "Building dependency tree...", "Reading state information...", "Some packages could not be installed. This may mean that you have", "requested an impossible situation or if you are using the unstable", "distribution that some required packages have not yet been created", "or been moved out of Incoming.", "The following information may help to resolve the situation:", "", "The following packages have unmet dependencies:", " nvidia-driver-460 : Depends: libnvidia-gl-460 (= 460.91.03-0ubuntu1) but it is not going to be installed", " Depends: nvidia-dkms-460 (= 460.91.03-0ubuntu1)", " Depends: nvidia-kernel-source-460 (= 460.91.03-0ubuntu1) but it is not going to be installed", " Depends: libnvidia-extra-460 (= 460.91.03-0ubuntu1) but it is not going to be installed", " Depends: nvidia-compute-utils-460 (= 460.91.03-0ubuntu1) but it is not going to be installed", " Depends: libnvidia-decode-460 (= 460.91.03-0ubuntu1) but it is not going to be installed", " Depends: libnvidia-encode-460 (= 460.91.03-0ubuntu1) but it is not going to be installed", " Depends: xserver-xorg-video-nvidia-460 (= 460.91.03-0ubuntu1) but it is not going to be installed", " Depends: libnvidia-cfg1-460 (= 460.91.03-0ubuntu1) but it is not going to be installed", " Depends: libnvidia-ifr1-460 (= 460.91.03-0ubuntu1) but it is not going to be installed", " Recommends: libnvidia-compute-460:i386 (= 460.91.03-0ubuntu1)", " Recommends: libnvidia-decode-460:i386 (= 460.91.03-0ubuntu1)", " Recommends: libnvidia-encode-460:i386 (= 460.91.03-0ubuntu1)", " Recommends: libnvidia-ifr1-460:i386 (= 460.91.03-0ubuntu1)", " Recommends: libnvidia-fbc1-460:i386 (= 460.91.03-0ubuntu1)", " Recommends: libnvidia-gl-460:i386 (= 460.91.03-0ubuntu1)"]}
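
For anyone reading a similar wall of apt output, the short sketch below pulls out just the package names that apt refused to install. It is an illustration only: the function name `unmet_packages` is made up, and the sample input is two lines copied from the error above rather than the full log. It does not fix anything; it only makes the list easier to read.

```shell
#!/bin/sh
# Sketch: list the packages apt reported as "not going to be installed".

# Hypothetical helper: read apt output on stdin and print each blocked
# dependency name once, sorted.
unmet_packages() {
    grep 'but it is not going to be installed' \
        | sed -e 's/^.*Depends: *//' -e 's/ .*$//' \
        | sort -u
}

# Sample input: two lines taken from the apt error in this thread.
sample=' nvidia-driver-460 : Depends: libnvidia-gl-460 (= 460.91.03-0ubuntu1) but it is not going to be installed
 Depends: nvidia-kernel-source-460 (= 460.91.03-0ubuntu1) but it is not going to be installed'

printf '%s\n' "$sample" | unmet_packages

# On a real host, these standard commands show held or half-installed packages:
#   apt-mark showhold
#   dpkg --audit
```

With the sample input this prints libnvidia-gl-460 and nvidia-kernel-source-460, one per line. Since the message mentions held packages, checking `apt-mark showhold` for leftover NVIDIA packages from an earlier manual install is a reasonable next step, though the actual cause on a given system may differ.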

Hi Kris,

I followed your recommendation in full and completed the installation without errors.
Now all of the pods are in the “Running” status.

I really appreciate your help.

Thank you,
Yasu