Enable ECC on RTX 4090 on Ubuntu 22.04 LTS

I am trying to enable ECC on an RTX 4090 running Ubuntu 22.04.

  • Cannot turn ECC on using the nvidia-settings GUI.
  • Toggling ECC via the CLI with sudo nvidia-smi -e=1 reports that a reboot is required. After a reboot, nvidia-settings no longer shows the card, although the display still works. A shutdown and restart (cold boot) makes the card visible in the nvidia-settings GUI again, but ECC is still not enabled.
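For anyone following the CLI route above, here is a minimal sketch for checking whether an ECC change has been requested but not yet applied. The helper name ecc_change_pending is hypothetical and not part of any NVIDIA tool. It assumes the two-column CSV produced by nvidia-smi's --query-gpu option with the ecc.mode.current and ecc.mode.pending fields.

```shell
#!/bin/sh
# Hypothetical helper: compare the current and pending ECC modes.
# Input: one CSV line such as "Disabled, Enabled" (current, pending).
# Returns 0 (true) when the two values differ, 1 otherwise.
ecc_change_pending() {
    # Strip all whitespace, then split on the comma.
    line=$(printf '%s' "$1" | tr -d '[:space:]')
    current=${line%%,*}
    pending=${line#*,}
    [ "$current" != "$pending" ]
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    modes=$(nvidia-smi --query-gpu=ecc.mode.current,ecc.mode.pending \
                       --format=csv,noheader | head -n 1)
    echo "ECC current, pending: $modes"
    if ecc_change_pending "$modes"; then
        echo "Pending mode differs from current mode; a reboot is still required."
    else
        echo "No ECC change is pending."
    fi
fi
```

If the toggle had been accepted, the pending value would be expected to read Enabled before the reboot. In the log below both values read Disabled, which matches ECC not being enabled.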

$ nvidia-smi -q

==============NVSMI LOG==============

Timestamp : Thu Aug 10 15:36:21 2023
Driver Version : 535.86.10
CUDA Version : 12.2

Attached GPUs : 1
GPU 00000000:01:00.0
Product Name : NVIDIA GeForce RTX 4090
Product Brand : GeForce
Product Architecture : Ada Lovelace
Display Mode : Enabled
Display Active : Enabled
Persistence Mode : Enabled
Addressing Mode : None
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-9872cbec-3e30-d2d9-5e19-25fcda8b67f9
Minor Number : 0
VBIOS Version : 95.02.3C.00.4E
MultiGPU Board : No
Board ID : 0x100
Board Part Number : N/A
GPU Part Number : 2684-300-A1
FRU Part Number : N/A
Module ID : 1
Inforom Version
Image Version : G002.0000.00.03
OEM Object : 2.0
ECC Object : 6.16
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x01
Device : 0x00
Domain : 0x0000
Device Id : 0x268410DE
Bus Id : 00000000:01:00.0
Sub System Id : 0x13B3196E
GPU Link Info
PCIe Generation
Max : 4
Current : 1
Device Current : 1
Device Max : 4
Host Max : 4
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 1000 KB/s
Rx Throughput : 0 KB/s
Atomic Caps Inbound : N/A
Atomic Caps Outbound : N/A
Fan Speed : 0 %
Performance State : P8
Clocks Event Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 24564 MiB
Reserved : 355 MiB
Used : 747 MiB
Free : 23460 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 17 MiB
Free : 239 MiB
Conf Compute Protected Memory Usage
Total : 0 MiB
Used : 0 MiB
Free : 0 MiB
Compute Mode : Default
Utilization
Gpu : 7 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
JPEG : 0 %
OFA : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
ECC Mode
Current : Disabled
Pending : Disabled
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows
Correctable Error : 0
Uncorrectable Error : 0
Pending : No
Remapping Failure Occurred : No
Bank Remap Availability Histogram
Max : 192 bank(s)
High : 0 bank(s)
Partial : 0 bank(s)
Low : 0 bank(s)
None : 0 bank(s)
Temperature
GPU Current Temp : 47 C
GPU T.Limit Temp : 36 C
GPU Shutdown T.Limit Temp : -7 C
GPU Slowdown T.Limit Temp : -2 C
GPU Max Operating T.Limit Temp : 0 C
GPU Target Temperature : 84 C
Memory Current Temp : N/A
Memory Max Operating T.Limit Temp : N/A
GPU Power Readings
Power Draw : 13.89 W
Current Power Limit : 450.00 W
Requested Power Limit : 450.00 W
Default Power Limit : 450.00 W
Min Power Limit : 150.00 W
Max Power Limit : 450.00 W
Module Power Readings
Power Draw : N/A
Current Power Limit : N/A
Requested Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 210 MHz
SM : 210 MHz
Memory : 405 MHz
Video : 1185 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Deferred Clocks
Memory : N/A
Max Clocks
Graphics : 3105 MHz
SM : 3105 MHz
Memory : 10501 MHz
Video : 2415 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : 875.000 mV
Fabric
State : N/A
Status : N/A
Processes
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 2680
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 94 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 2847
Type : C+G
Name : /usr/libexec/gnome-remote-desktop-daemon
Used GPU Memory : 390 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 2884
Type : G
Name : /usr/bin/gnome-shell
Used GPU Memory : 69 MiB
GPU instance ID : N/A
Compute instance ID : N/A
Process ID : 4488
Type : G
Name : /snap/firefox/2987/usr/lib/firefox/firefox
Used GPU Memory : 164 MiB
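Since the full dump above is long, a small filter can pull out just the ECC Mode fields. This is a sketch only: the function name ecc_mode_from_log is made up for illustration, and it assumes the layout shown above, with an "ECC Mode" line followed by "Current" and "Pending" lines.

```shell
#!/bin/sh
# Hypothetical helper: read `nvidia-smi -q` text on stdin and print the
# two fields under "ECC Mode" as "current=<value> pending=<value>".
ecc_mode_from_log() {
    awk -F':' '
        # Remove leading and trailing spaces or tabs.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        # Start collecting once the "ECC Mode" heading is seen.
        trim($0) == "ECC Mode"          { in_ecc = 1; next }
        in_ecc && trim($1) == "Current" { cur = trim($2) }
        # "Pending" is the last field of the section, so stop after it.
        in_ecc && trim($1) == "Pending" { pend = trim($2); in_ecc = 0 }
        END {
            if (cur != "" || pend != "")
                printf "current=%s pending=%s\n", cur, pend
        }
    '
}
```

Usage would be nvidia-smi -q | ecc_mode_from_log. Running nvidia-smi -q -d ECC also limits the report to the ECC section without any extra parsing.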

You can use nvidia-settings, it works.

Hi @andorjkiss, are you able to disable ECC? I am facing the same problem.

Yes, I was able to enable and disable ECC on the GPU after the next driver update. I guess there was a bug in the initial driver that required sudo permissions for that option. I am currently using the latest NVIDIA Linux driver (545.x).
