GeForce RTX 2080 Ti freezes during deep learning training

The system freezes during deep learning training, even for basic MNIST models.

Config is as follows:

Mainboard: Gigabyte X470 Aorus Ultra Gaming AMD X470 So.AM4
CPU: AMD Ryzen 7 2700X 8×3,7 GHz boxed
GPU: 11GB Gainward GeForce RTX 2080 Ti Phoenix PCIe 3.0 x16
PSU: 750 Watt
RAM: 32GB G.Skill Aegis DDR4-3000 DIMM CL16 Dual Kit
SSD: 1000GB Crucial MX500 2.5″ (6.4cm) SATA 6Gb/s 3D-NAND
Case: Inter-Tech M-908

Any help would really be appreciated!

“Freezes” how exactly? What are the exact symptoms? How long into the training does the freeze occur?

Based on the system specifications, the PSU wattage seems to be sufficient. What is the vendor and model of the PSU? Check the GPU auxiliary power cables: there should be two 8-pin power connectors, and the cables connecting them to the PSU should not use converters, Y-splitters, or daisy-chaining.

Use nvidia-smi to monitor GPU power draw and cooling (temperature, fan speed). Remove overclocking from any overclocked components if possible. For example, the DRAM and GPU appear to be vendor-overclocked parts. Running the memory at DDR4-2666 speeds with CL=19 looks worth a try.
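
As a rough sketch of the nvidia-smi monitoring suggested above: the Python script below polls nvidia-smi once per second and appends the readings to a log file, forcing each write to disk so that the last lines survive a hard freeze. The query fields are standard nvidia-smi ones (`nvidia-smi --help-query-gpu` lists what your driver supports). The file name `gpu_log.csv`, the one-second interval, and the helper names are my own assumptions, so adjust as needed.

```python
import os
import subprocess
import time

# Query fields passed to `nvidia-smi --query-gpu`.
FIELDS = ["timestamp", "power.draw", "temperature.gpu",
          "fan.speed", "clocks.sm", "utilization.gpu"]


def build_command(fields=FIELDS):
    """Build the nvidia-smi command line for one CSV sample per GPU."""
    return ["nvidia-smi",
            "--query-gpu=" + ",".join(fields),
            "--format=csv,noheader,nounits"]


def parse_line(line, fields=FIELDS):
    """Turn one CSV output line into a dict of field name -> string value."""
    values = [v.strip() for v in line.split(",")]
    return dict(zip(fields, values))


def log_gpu_stats(path="gpu_log.csv", interval_s=1.0):
    """Append one sample per interval to `path` until interrupted (Ctrl+C).

    Each sample is flushed and fsync'ed so the final readings before a
    freeze are actually on disk. The path and interval are illustrative.
    """
    with open(path, "a") as log:
        while True:
            result = subprocess.run(build_command(), capture_output=True,
                                    text=True, check=True)
            for line in result.stdout.strip().splitlines():
                log.write(line + "\n")
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval_s)
```

Call `log_gpu_stats()` in a second terminal before starting the training run. After the next freeze and reboot, look at the last few lines of the log: power draw near the board limit or temperatures climbing steadily right before the freeze would point towards power delivery or cooling, respectively.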

The system components suggest a self-assembled “tweaked” gaming rig trying to get the highest performance from low-budget parts. Not always the best basis for a rock-solid compute platform.