I’m training models on a Dell Precision 5860 tower with an Nvidia RTX A5000 card. I’ve occasionally had Windows updates affect my driver so that TensorFlow stops seeing my GPU; a driver or cuDNN update fixes it, but it has been happening more frequently. Currently, TensorFlow sees my GPU, loads images, and builds models using shared memory, then my CPU ramps to 100% and the training is killed about a minute into the first epoch. The shared-memory usage looks consistent with what I saw when the system was working. I had a brief GPU temperature spike about a month ago, but it hasn’t recurred. Windows Reliability Monitor has logged KillKernel errors 193 and 141. I’m required to use Windows, so TensorFlow, CUDA, and cuDNN are installed through WSL2 while the driver is installed natively on Windows.
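For reference, this is roughly the kind of minimal check I mean when I say TensorFlow sees my GPU (standard TF API calls, nothing custom); it should also show whether ops quietly end up on the CPU once training starts:

```python
import tensorflow as tf

# Sanity check: the RTX A5000 should show up here
print("GPUs visible:", tf.config.list_physical_devices("GPU"))

# Log which device each op is placed on; if everything prints /CPU:0
# during training, TF has silently fallen back to the CPU
tf.debugging.set_log_device_placement(True)

# Allocate GPU memory on demand instead of reserving it all up front
# (must be set before the GPU is first used)
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```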
System Params:
Windows 11
WSL2
Python 3.12.2
Troubleshooting:
- Updated drivers and reinstalled the current TF version.
- Tried TensorFlow 2.19, 2.18, 2.17, 2.16, and 2.14, each paired with the CUDA and cuDNN versions listed in the tested-configurations table at Build from source | TensorFlow (a quick version check is sketched after this list). nvidia-smi confirms the right driver is installed and nvcc -V confirms the CUDA version is correct. TF is installed with pip install tensorflow[and-cuda]==v.v.v
- Tried installing CUDA and cuDNN both via sudo apt-get (in WSL) and through a custom Windows-native install, deselecting the driver bundled with CUDA so the latest driver is actually the one in use.
- Tried different drivers with each combination of TF, CUDA, and cuDNN: 573.42, 573.48, 576.52, and 576.02.
- Removed my GPU, cleaned it, and reinstalled it.
- Reimaged the desktop in case the Windows KillKernel errors referred to hardware or drivers other than my Nvidia card.
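For completeness, here is a minimal sketch of how I check from inside Python which CUDA and cuDNN versions the installed wheel was built against (via TF's tf.sysconfig.get_build_info()), to compare with what nvidia-smi and nvcc -V report:

```python
import tensorflow as tf

# Report the CUDA / cuDNN versions the installed TF wheel was built against,
# to compare with the driver from nvidia-smi and the toolkit from nvcc -V
info = tf.sysconfig.get_build_info()
print("TF version:         ", tf.__version__)
print("CUDA build version: ", info.get("cuda_version"))
print("cuDNN build version:", info.get("cudnn_version"))
```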
Other than using the warranty on my desktop, I have no idea what else to try. Thanks in advance for any help.