Hello @generix and @toamna2012. Some updates: I’ve been running my setup without problems or interruptions for the past three days. The only thing I changed was my DRAM operating frequency: from 3600 MHz (maximum) to 3000 MHz. Once I saw that these settings were stable, I also ran “sudo nvidia-smi -lgc 300,2115” to restore my old GPU clock configuration, and it’s still OK. Do you have any insight into why this is happening? Is there any way I can check for DRAM problems before requesting an RMA for a motherboard replacement?
According to Intel
https://ark.intel.com/content/www/de/de/ark/products/190887/intel-core-i9-9900kf-processor-16m-cache-up-to-5-00-ghz.html
your CPU only supports memory clocks up to 2666 MHz, so you have been heavily overclocking the memory.
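To put those numbers in perspective, here is a small sketch of the arithmetic. It assumes the 2666 MHz figure from the ARK page is the rated maximum and compares it with the two DRAM settings mentioned earlier in the thread (3600 MHz and 3000 MHz). The function name is made up for illustration.

```python
def percent_over_spec(configured_mhz: float, rated_mhz: float) -> float:
    """Return how far a configured clock sits above the rated clock, in percent."""
    return (configured_mhz - rated_mhz) / rated_mhz * 100.0


RATED_MHZ = 2666  # rated memory speed taken from the Intel ARK page linked above

# The two DRAM settings discussed in the thread.
for configured in (3600, 3000):
    over = percent_over_spec(configured, RATED_MHZ)
    print(f"{configured} MHz is {over:.1f}% above the rated {RATED_MHZ} MHz")
```

With these inputs it reports about 35.0% for 3600 MHz and 12.5% for 3000 MHz, so the 3000 MHz setting is still above the rated speed, just by a much smaller margin.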
Why did you use nvidia-smi -lgc 300,2115?
"NVIDIA has paired 8 GB GDDR6 memory with the GeForce RTX 2080 SUPER, which are connected using a 256-bit memory interface. The GPU is operating at a frequency of 1650 MHz, which can be boosted up to 1815 MHz, memory is running at 1937 MHz. "
Shouldn’t it be nvidia-smi -lgc 300,1815?
OMG, that’s crazy. I’m using “nvidia-smi -lgc 300,2115” because it’s what “PowerMizer” originally showed (figure attached). According to GeForce RTX 2080 SUPER’s specifications, the maximum I should expect is really something around 1815 MHz.
I have a second computer here using a ZOTAC GTX 1080 AMP!. According to the specifications, this GPU can be boosted up to 1822 MHz. You can see below a screenshot of what “PowerMizer” is showing there. When I bought this computer I just installed NVIDIA’s driver and let it be. Is it overclocked by default? Why is it reaching such high frequencies when it shouldn’t?
This has been noticed before, especially with Turing cards: the NVIDIA Linux driver doesn’t use the stock clocks but the vendor-defined OC clocks. Depending on GPU temperature, the card shouldn’t actually reach those clocks, though.
That’s interesting. I noticed yesterday that I was reaching something near 2050 MHz. My GPU never reached temperatures higher than 45 °C. On the second computer (ZOTAC GTX 1080 AMP!) the current clock is 1949 MHz and the temperature is 51 °C, pretty low.
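One way to compare the actual graphics clock against temperature, instead of reading it off PowerMizer, is to poll nvidia-smi. This is a minimal sketch, assuming nvidia-smi is on the PATH and that the installed driver supports the query fields clocks.current.graphics, clocks.max.graphics and temperature.gpu. The helper names are hypothetical.

```python
import shutil
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
QUERY_FIELDS = "clocks.current.graphics,clocks.max.graphics,temperature.gpu"


def parse_sample(line: str) -> dict:
    """Parse one 'csv,noheader,nounits' line: current MHz, max MHz, temperature in C."""
    current, maximum, temp = (int(part.strip()) for part in line.split(","))
    return {"clock_mhz": current, "max_clock_mhz": maximum, "temp_c": temp}


def read_gpu_samples() -> list:
    """Query every GPU once via nvidia-smi and return one dict per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_sample(line) for line in result.stdout.splitlines() if line.strip()]


# Only query when the tool is installed; unsupported fields fail to parse as integers.
if shutil.which("nvidia-smi"):
    try:
        for index, sample in enumerate(read_gpu_samples()):
            print(f"GPU {index}: {sample['clock_mhz']} MHz "
                  f"(max {sample['max_clock_mhz']} MHz) at {sample['temp_c']} C")
    except (subprocess.CalledProcessError, OSError, ValueError) as exc:
        print(f"Could not read GPU clocks: {exc}")
```

Running this periodically under load would show whether the card really sits near the vendor OC clock at the temperatures reported above.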
Workaround:
Set in BIOS:
Suspend to RAM → DISABLED
Global C states Control → DISABLED
ACPI_CST C1 Declaration → DISABLED
PCIE Reset Control → DISABLED
Then run nvidia-smi -pm 1 and nvidia-smi -lgc 1600,1605
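The nvidia-smi part of this workaround can be scripted. This is a sketch only, assuming the intended flags are -pm (persistence mode) and -lgc (lock GPU clocks), that 1600,1605 is the min,max pair in MHz from the post, and that the commands are run as root. The helper names are made up, and -rgc (reset GPU clocks) is included as a way to undo the lock.

```python
import subprocess


def build_workaround_commands(min_mhz: int = 1600, max_mhz: int = 1605) -> list:
    """Return the nvidia-smi invocations for the workaround as argument lists."""
    if min_mhz > max_mhz:
        raise ValueError("min_mhz must not exceed max_mhz")
    return [
        ["nvidia-smi", "-pm", "1"],                       # enable persistence mode
        ["nvidia-smi", "-lgc", f"{min_mhz},{max_mhz}"],   # lock graphics clocks to the range
    ]


# Undoes the clock lock set by -lgc.
RESET_COMMAND = ["nvidia-smi", "-rgc"]


def apply_commands(commands: list, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


# Dry run by default: this only prints the commands it would execute.
apply_commands(build_workaround_commands())
```

Calling apply_commands(build_workaround_commands(), dry_run=False) as root would actually apply the settings, and apply_commands([RESET_COMMAND], dry_run=False) would release the clock lock again.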

