How to enable ECC on RTX A4000

Hi generix,

Thank you so much for your continued help, and sorry for the late reply. First, I was able to turn on ECC. Below are the steps I took, listed only to help debug (and avoid) this issue.

  1. As suggested, I disabled persistence mode with “nvidia-smi -pm 0”. This changed the “Persistence Mode” rows in the “nvidia-smi -q” output to “Disabled”.
  2. Without rebooting, I ran “nvidia-smi -g 1 -e 1”, and the pending ECC state changed to “Enabled”.
  3. After a reboot, “Persistence Mode” had reverted to “Enabled”, and both the current and pending ECC states were “Disabled”.
  4. Since I could not tell when driver version 530 had been installed (I had installed 525 manually), I removed driver 530 and reinstalled 525 (the “-server” variant; today I could not find a 525 package without the “-server” or “-open” suffix).
  5. Right after installing driver 525, persistence mode was “Disabled”. I enabled ECC with “nvidia-smi -g 1 -e 1” (and the same for GPU 2).
  6. After a reboot, I confirmed that the current and pending ECC states were both “Enabled”.
  7. I then installed CUDA 11.8, which upgraded the driver to version 530.
  8. After a reboot, persistence mode was on again, but the ECC states remained “Enabled”. I confirmed this in both nvidia-smi and the nvidia-settings GUI.
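The steps above mostly come down to comparing the current and pending ECC modes (and persistence mode) before and after each reboot. A small helper can make that check repeatable. This is only a sketch, assuming Python 3.7+ and nvidia-smi on the PATH; the helper names (`parse_ecc_csv`, `reboot_pending`, `query_ecc`) are hypothetical, it selects a GPU with nvidia-smi's `-i` option, and the query field names are taken from `nvidia-smi --help-query-gpu`, so please check them against your driver version.

```python
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu` (assumed available
# in your driver version).
FIELDS = ["index", "persistence_mode", "ecc.mode.current", "ecc.mode.pending"]


def parse_ecc_csv(text):
    """Parse `--format=csv,noheader` output into one dict per GPU line."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(FIELDS, values)))
    return rows


def reboot_pending(row):
    """True when the pending ECC mode (set with -e) differs from the current one."""
    return row["ecc.mode.current"] != row["ecc.mode.pending"]


def query_ecc(gpu_index=None):
    """Run nvidia-smi and return the parsed persistence/ECC state per GPU."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader",
    ]
    if gpu_index is not None:
        cmd += ["-i", str(gpu_index)]  # limit the query to one GPU
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_ecc_csv(result.stdout)
```

For example, running `for row in query_ecc(1): print(row, reboot_pending(row))` right after an `-e 1` call and again after the reboot shows whether the requested mode actually took effect, which is the distinction between steps 3 and 6 above.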

Thank you again for your time and support.
Kai

On Thu, Mar 9, 2023 at 19:29, generix via NVIDIA Developer Forums <notifications@nvidia.discoursemail.com> wrote: