Activations and Gradients blow up on one of RTX2080Ti

stepan.ulyanin · February 7, 2019, 3:53pm

Hi, we have a problem with one of the two RTX 2080 Ti cards we use for deep learning. One of them works fine, but on the other one the activations of the convolutional layers explode to inf, which results in NaNs in the losses and gradients.
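To pin down where the values first become inf, one option is to attach forward hooks to every convolution and record which layers produce non-finite outputs. The sketch below is a minimal example, assuming a standard `torch.nn` model. The helper name `attach_nonfinite_hooks` and the toy model are hypothetical and only illustrate the idea; swap in the network that actually blows up.

```python
import torch
import torch.nn as nn


def attach_nonfinite_hooks(model):
    """Hypothetical helper: record every Conv2d whose forward output is not finite.

    Returns (bad_layers, handles). bad_layers is filled in call order during the
    forward pass; call h.remove() on each handle when you are done.
    """
    bad_layers = []
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            # name=name binds the current layer name into the closure
            def hook(mod, inputs, output, name=name):
                # torch.isfinite is False for inf, -inf and NaN
                if not torch.isfinite(output).all():
                    bad_layers.append(name)
            handles.append(module.register_forward_hook(hook))
    return bad_layers, handles


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Toy model for illustration only
    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3)).to(device)
    bad, handles = attach_nonfinite_hooks(model)
    with torch.no_grad():
        model(torch.randn(2, 3, 32, 32, device=device))
    print("non-finite conv outputs:", bad or "none")
    for h in handles:
        h.remove()
```

Running the same input through the model on each card and comparing the reported layer names can show whether the failure always starts in the same convolution.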

Here is the PyTorch Forums thread where we are trying to track down the problem (other deep learning practitioners report the same issue): Different Losses on 2 different machines - autograd - PyTorch Forums

cuDNN: v7.4.2
CUDA: 10.0.130
PyTorch: 1.0.1
GPU: RTX 2080Ti
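Because one card works and the other does not with the same software stack, a quick check is to run one fixed convolution on the CPU and on each visible GPU and compare the results. The sketch below assumes PyTorch with both cards visible. The function name `compare_conv_across_devices` and the tensor shapes are arbitrary choices for illustration, not part of any existing tool.

```python
import torch
import torch.nn.functional as F


def compare_conv_across_devices(seed=0):
    """Hypothetical check: run one fixed conv2d on the CPU and on every visible GPU.

    Returns {label: max_abs_diff_vs_cpu}. A healthy card should give a small
    difference; inf, NaN or a large value points at the faulty card.
    """
    torch.manual_seed(seed)
    x = torch.randn(8, 64, 56, 56)
    w = torch.randn(64, 64, 3, 3) * 0.05
    reference = F.conv2d(x, w, padding=1)  # CPU reference result
    results = {}
    for i in range(torch.cuda.device_count()):
        dev = torch.device("cuda", i)
        out = F.conv2d(x.to(dev), w.to(dev), padding=1).cpu()
        label = "cuda:{} {}".format(i, torch.cuda.get_device_name(i))
        results[label] = (out - reference).abs().max().item()
    return results


if __name__ == "__main__":
    # Print the same stack information as listed above
    print("PyTorch", torch.__version__,
          "| CUDA", torch.version.cuda,
          "| cuDNN", torch.backends.cudnn.version())
    for label, diff in compare_conv_across_devices().items():
        print(label, "max |GPU - CPU| =", diff)
```

Small nonzero differences are expected from floating-point reordering in cuDNN kernels; the signal is one card returning inf/NaN or a much larger difference than the other.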

Any help would be appreciated. Thank you!
