Tensorflow GPU - GPU detected but never used and computer crash on Windows 10 - RTX 2070

numael · July 11, 2019, 2:11pm

Hello,

I have an issue on my computer GL704G W - Win10 Pro - RTX 2070

Win10 Pro 64 bits
cuda_10.0.130_411.31
cudnn-10.0
python-3.6.8-amd64
vc_redist.x64
pip install tensorflow-gpu==1.10.0

I tested TensorFlow GPU with:

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

returns:

WARNING: Logging before flag parsing goes to stderr.
W0711 16:04:51.333560 12692 deprecation_wrapper.py:119] From C:\Users\Manuel\PycharmProjects\testCUDA\cuda.py:2: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

W0711 16:04:51.334558 12692 deprecation_wrapper.py:119] From C:\Users\Manuel\PycharmProjects\testCUDA\cuda.py:2: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2019-07-11 16:04:51.368338: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-07-11 16:04:51.376758: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
2019-07-11 16:04:52.693641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.44
pciBusID: 0000:01:00.0
2019-07-11 16:04:52.702086: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-07-11 16:04:52.708494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-11 16:04:53.357195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-11 16:04:53.363858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-07-11 16:04:53.367232: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-07-11 16:04:53.371312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6315 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5
2019-07-11 16:04:53.390774: I tensorflow/core/common_runtime/direct_session.cc:296] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5

Another test:

import tensorflow as tf
tf.test.is_built_with_cuda()

returns:

True

import tensorflow as tf
tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)

returns:

True

But when I run any TensorFlow GPU code, only the CPU is used, and after a long time my computer crashes and reboots.

Examples from: GitHub - tensorflow/models: Models and examples built with TensorFlow

What’s the problem? What did I install incorrectly?

Best,
Manuel

numael · July 16, 2019, 3:22pm

OK. I completely uninstalled Python, CUDA 10 and the libraries.

Then I followed this guide: How to Install TensorFlow with GPU Support on Windows 10 (Without Installing CUDA) UPDATED!

on my GL704G W - Win10 Pro - RTX 2070

Now I can run keras/examples/deep_dream.py, but I get this error:

(tf-gpu) PS C:\Users\Manuel\demo\examples-keras> python .\deep_dream.py .\chien.png drm
Using TensorFlow backend.
WARNING:tensorflow:From C:\Users\Manuel\Anaconda3\envs\tf-gpu\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
2019-07-16 17:15:30.733448: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2019-07-16 17:15:32.158666: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.44
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.59GiB
2019-07-16 17:15:32.167946: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-07-16 17:15:32.683032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-16 17:15:32.687889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-07-16 17:15:32.690815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-07-16 17:15:32.694737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6319 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
Model loaded.
WARNING:tensorflow:Variable += will be deprecated. Use variable.assign_add if you want assignment to the variable value or 'x = x + y' if you want a new python Tensor object.
Processing image shape (255, 255)
..Loss value at 0 : 1.0290155
..Loss value at 1 : 0.9662107
..Loss value at 2 : 0.92942023
..Loss value at 3 : 0.92720234
..Loss value at 4 : 0.93417346
..Loss value at 5 : 0.93499184
..Loss value at 6 : 0.93060154
..Loss value at 7 : 0.93919677
..Loss value at 8 : 0.9740615
..Loss value at 9 : 0.9510974
..Loss value at 10 : 0.95132095
..Loss value at 11 : 0.924647
..Loss value at 12 : 0.9184734
..Loss value at 13 : 0.9356839
..Loss value at 14 : 0.94341934
..Loss value at 15 : 0.9621455
..Loss value at 16 : 0.93943894
..Loss value at 17 : 0.93967533
..Loss value at 18 : 0.9238926
..Loss value at 19 : 0.95155436
Processing image shape (357, 357)
2019-07-16 17:15:47.761198: E tensorflow/stream_executor/cuda/cuda_driver.cc:981] failed to synchronize the stop event: CUDA_ERROR_ILLEGAL_INSTRUCTION: an illegal instruction was encountered
2019-07-16 17:15:47.768278: E tensorflow/stream_executor/cuda/cuda_timer.cc:55] Internal: error destroying CUDA event in context 000001D46CD2E8C0: CUDA_ERROR_ILLEGAL_INSTRUCTION: an illegal instruction was encountered
2019-07-16 17:15:47.778170: E tensorflow/stream_executor/cuda/cuda_timer.cc:60] Internal: error destroying CUDA event in context 000001D46CD2E8C0: CUDA_ERROR_ILLEGAL_INSTRUCTION: an illegal instruction was encountered
2019-07-16 17:15:47.785700: F tensorflow/stream_executor/cuda/cuda_dnn.cc:194] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)Failed to set cuDNN stream.

I found nothing about “Failed to set cuDNN stream.”

Can someone help me?

Robert_Crovella · July 16, 2019, 3:29pm

The problem is that some code that TF ran on the GPU is doing something illegal. It’s impossible to say what that is exactly based on the posting here. The cudnn failure is just a side-effect of that: once the illegal operation happens on the GPU, any further attempts to use the GPU will fail.
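Since the first failure is the one that matters, it can help to pull it out of a long log automatically. The following is a minimal sketch, not part of TensorFlow: the `first_failure` helper and the regular expression are illustrative assumptions based on the log format shown in this thread, and the sample lines are abridged from the log above.

```python
import re

# Matches the log prefix seen in this thread:
# "<date> <time>: <severity> <file>:<line>] <message>"
LOG_LINE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<severity>[IWEF]) (?P<source>\S+?):(?P<line>\d+)\] (?P<message>.*)$"
)


def first_failure(log_text):
    """Return (severity, source, message) of the first E or F line, or None."""
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match and match.group("severity") in ("E", "F"):
            return (
                match.group("severity"),
                match.group("source"),
                match.group("message"),
            )
    return None


# Abridged lines from the log posted above.
sample = """\
2019-07-16 17:15:32.694737: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device
2019-07-16 17:15:47.761198: E tensorflow/stream_executor/cuda/cuda_driver.cc:981] failed to synchronize the stop event: CUDA_ERROR_ILLEGAL_INSTRUCTION: an illegal instruction was encountered
2019-07-16 17:15:47.785700: F tensorflow/stream_executor/cuda/cuda_dnn.cc:194] Check failed: status == CUDNN_STATUS_SUCCESS (7 vs. 0)Failed to set cuDNN stream.
"""

print(first_failure(sample))
```

On the sample, the helper reports the `CUDA_ERROR_ILLEGAL_INSTRUCTION` line from `cuda_driver.cc` rather than the later fatal cuDNN check.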

On Windows, it’s possible that you are hitting a WDDM TDR timeout on some TF kernel.
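To see which TDR settings are in effect, the values can be read from the registry. This is a rough sketch that assumes the standard `GraphicsDrivers` registry key and the `TdrDelay` / `TdrLevel` value names documented by Microsoft; the `read_tdr_value` helper is a hypothetical name. If a value is not set, Windows uses its built-in default (documented as 2 seconds for `TdrDelay`), and the helper returns `None`.

```python
import sys

# Registry location of the TDR settings (under HKEY_LOCAL_MACHINE).
TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def read_tdr_value(name):
    """Return the named value under the GraphicsDrivers key.

    Returns None when the value is not set or when not running on Windows.
    """
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, name)
            return value
    except FileNotFoundError:
        # Key or value absent: Windows falls back to its default.
        return None


for name in ("TdrDelay", "TdrLevel"):
    print(name, "=", read_tdr_value(name))
```

This only reads the settings. Changing them requires administrator rights and a reboot, and a longer timeout would hide a slow kernel, not fix an illegal instruction.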

You might want to ask questions about TF on a TF support forum.

numael · July 22, 2019, 11:36am

Hello,

Thanks for your help. I installed TensorFlow GPU 1.14 with CUDA 10.

Sometimes it works, and sometimes the computer crashes.

numael · July 25, 2019, 8:09am

Hello, I just found something troubling. When the laptop is not plugged into the charger, it runs more slowly but it is stable. This suggests that it works when the GPU is power-limited.

If I plug in the charger, the TF detection model produces a lot of bogus results and then freezes the computer until the WDDM TDR timeout.

I do not understand why the available power would change the behavior of TensorFlow.

bhabanimohapatra2 · September 13, 2019, 4:10am

@numael Did your issue get resolved? I am having the same issue.

numael · September 13, 2019, 12:13pm

Hello. I am not certain yet, but it seems to be a hardware problem. The GPU seems defective. Maybe a cooling problem …
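One way to test the cooling or power theory is to log temperature, power draw and SM clock while the model runs, on battery and on the charger, and compare. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` CSV output; the helper names and the sample values are hypothetical, and on some laptops `power.draw` may be reported as not available.

```python
import subprocess

# Fields requested from nvidia-smi, in output order.
QUERY = "temperature.gpu,power.draw,clocks.sm"


def parse_sample(csv_line):
    """Parse one 'csv,noheader,nounits' line; unsupported fields become None."""
    values = []
    for field in csv_line.split(","):
        field = field.strip()
        try:
            values.append(float(field))
        except ValueError:
            values.append(None)  # e.g. "[N/A]" or "[Not Supported]"
    return dict(zip(QUERY.split(","), values))


def read_gpu_sample():
    """Query nvidia-smi once; return None if the tool is missing or fails."""
    try:
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
            capture_output=True,
            text=True,
            check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = out.strip().splitlines()
    return parse_sample(lines[0]) if lines else None


# Hypothetical sample lines, only to show the parsed shape.
print(parse_sample("71, 115.20, 1440"))
print(parse_sample("83, [N/A], 300"))
```

Calling `read_gpu_sample()` about once per second in a separate console during a run gives a trace to compare between the two power states. A temperature that climbs right before the freeze would support the cooling explanation.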

leibnitz2012 · November 8, 2022, 10:02am

I just wonder why the GPU memory detected by TensorFlow is about 6 GB when the RTX 2070 actually has 8 GB?
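The log earlier in this thread hints at an answer: it reports `totalMemory: 8.00GiB freeMemory: 6.59GiB`, so part of the card was already in use (presumably by Windows and the display) before TensorFlow started, and the device TensorFlow created appears to be sized from the free amount, not the total. A small arithmetic sketch using only the numbers from that log, assuming the "MB" in the log means MiB:

```python
GIB_IN_MIB = 1024

total_gib = 8.00      # "totalMemory: 8.00GiB" from the log
free_gib = 6.59       # "freeMemory: 6.59GiB" from the log
tf_device_mib = 6319  # "Created TensorFlow device (... with 6319 MB memory)"

# Memory that was free when TensorFlow initialized the GPU.
free_mib = free_gib * GIB_IN_MIB

# Memory already taken before TensorFlow started.
in_use_before_tf_mib = (total_gib - free_gib) * GIB_IN_MIB

# Fraction of the free memory that TensorFlow claimed for its device.
share_of_free = tf_device_mib / free_mib

print(f"free before TF: {free_mib:.0f} MiB")
print(f"already in use: {in_use_before_tf_mib:.0f} MiB")
print(f"TF device size / free memory: {share_of_free:.3f}")
```

So the roughly 6 GB figure is most of what was free at startup, not a sign that the card is misdetected.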
