Getting “CUDA_ERROR_INVALID_VALUE: invalid argument” in Python with TensorFlow 1.14

Some information about my setup:
python: 3.6.9
tensorflow-gpu==1.14.0
protobuf==3.11.3
tensorflow-estimator==1.14.0

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

$ nvidia-smi
Thu Apr 23 13:22:06 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:B3:00.0 Off |                  N/A |
| 26%   28C    P8    12W / 250W |    119MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1277      G   /usr/lib/xorg/Xorg                            39MiB |
|    0      1388      G   /usr/bin/gnome-shell                          77MiB |
+-----------------------------------------------------------------------------+

When I run the snippet below with python test.py:

import os
# Select GPU 0; set CUDA_VISIBLE_DEVICES to "-1" to disable GPU use
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import warnings

with warnings.catch_warnings():
	warnings.filterwarnings("ignore", category=FutureWarning)
	import tensorflow as tf
	config = tf.compat.v1.ConfigProto()
	# config.gpu_options.visible_device_list = "0"  # pylint: disable=no-member
	config.gpu_options.allow_growth = True  # pylint: disable=no-member
	session = tf.compat.v1.Session(config=config)

# check if successfully using GPU
if tf.test.gpu_device_name():
	print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))
else:
	print('GPU not being used')

I get the following error:

2020-04-23 13:13:15.969352: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-04-23 13:13:15.974088: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-04-23 13:13:15.990122: W tensorflow/compiler/xla/service/platform_util.cc:256] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_VALUE: invalid argument
2020-04-23 13:13:15.990240: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: no supported devices found for platform CUDA
Aborted (core dumped)

When I set os.environ['CUDA_VISIBLE_DEVICES'] = "-1" (i.e. no GPU use), there is no error and the output is as expected, shown below.

2020-04-23 13:18:24.911806: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-04-23 13:18:24.916849: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-04-23 13:18:24.920347: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-04-23 13:18:24.920384: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: vumacs
2020-04-23 13:18:24.920389: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: vumacs
2020-04-23 13:18:24.920456: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 440.64.0
2020-04-23 13:18:24.920482: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 440.64.0
2020-04-23 13:18:24.920489: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 440.64.0
2020-04-23 13:18:24.938734: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3299990000 Hz
2020-04-23 13:18:24.939659: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4849f40 executing computations on platform Host. Devices:
2020-04-23 13:18:24.939686: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
GPU not being used
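
For reference, the GPU on/off switch used in the two runs above can be wrapped in a small helper. This is only a sketch: configure_gpu_visibility and its parameters are hypothetical names, not part of TensorFlow, and the helper should be called before TensorFlow is imported, since the CUDA driver reads these variables when it initializes.

```python
import os


def configure_gpu_visibility(use_gpu, device_index=0):
    """Set the CUDA environment variables (hypothetical helper).

    Call this before `import tensorflow`; "-1" hides every GPU from CUDA.
    """
    # Order devices by PCI bus ID so the index matches nvidia-smi's numbering.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_index) if use_gpu else "-1"
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Usage (TensorFlow import deliberately left commented out):
# configure_gpu_visibility(use_gpu=True)   # same as CUDA_VISIBLE_DEVICES="0"
# import tensorflow as tf
```

Calling it with use_gpu=False reproduces the CPU-only run shown above.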

Is there any way to resolve this error? I previously ran the same code with CUDA_VISIBLE_DEVICES set to 0, both in the script and in the shell, without any issues. The error seems to occur when creating the session with tf.compat.v1.Session(config=config).

Here are a few more logs:

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
2020-04-23 15:32:47.855593: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-04-23 15:32:47.884652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:b3:00.0
2020-04-23 15:32:47.885146: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-04-23 15:32:47.886730: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-04-23 15:32:47.888298: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-04-23 15:32:47.888855: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-04-23 15:32:47.890673: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-04-23 15:32:47.892068: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-04-23 15:32:47.895348: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-04-23 15:32:47.896233: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
Num GPUs Available:  1

But then executing this line gives me the same error:

tf.test.gpu_device_name()
2020-04-23 15:34:50.948097: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-04-23 15:34:50.983906: W tensorflow/compiler/xla/service/platform_util.cc:256] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_VALUE: invalid argument
2020-04-23 15:34:50.984119: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: no supported devices found for platform CUDA
Aborted (core dumped)
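
Because the failure aborts the whole interpreter, it can help to run the check in a child process so the calling session survives and the exit status can be inspected. The following is a minimal sketch under that assumption; run_probe and GPU_PROBE are hypothetical names introduced here, and the probe string simply repeats the tf.test.gpu_device_name() call from above.

```python
import subprocess
import sys

# The same check as above, written as a one-liner for a child interpreter.
GPU_PROBE = "import tensorflow as tf; print(tf.test.gpu_device_name())"


def run_probe(code, timeout=120):
    """Run `code` in a fresh Python process and return (returncode, stdout, stderr).

    On Linux, a negative return code means the child was killed by a signal
    (for example -6 for SIGABRT, which is what "Aborted" corresponds to).
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # works on Python 3.6, unlike text=True
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


# Usage:
# rc, out, err = run_probe(GPU_PROBE)
# print("exit status:", rc)
# print(err)
```

The captured stderr contains the same TensorFlow log lines, so they can be saved or compared between runs with different CUDA_VISIBLE_DEVICES settings.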

Hi,
This seems to be due to an incompatible CUDA version.
Please refer to the link below for more details:

https://www.tensorflow.org/install/gpu#software_requirements

Thanks

If you look at the nvcc --version output I included, I am using CUDA 10.0 with TF 1.14, which are compatible. As far as I know, the CUDA version reported for the graphics driver can differ from the toolkit version that TF is built against. Also, in the TF/CUDA logs at the end of my post, you will notice that it successfully opens libcudart.so.10.0 and the other CUDA 10.0 libraries. In fact, I had earlier used this exact installation without issues. There have been no upgrades or updates to the TF installation, graphics driver, CUDA, or cuDNN. It suddenly started showing this error.
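
To make the version comparison concrete, here is a minimal sketch that extracts the two numbers from the command outputs quoted in this thread. The function names are hypothetical, and the regular expressions assume the output formats shown in the post. Note that the "CUDA Version" field in nvidia-smi reports the highest CUDA version the driver supports, while nvcc reports the installed toolkit release.

```python
import re


def toolkit_version(nvcc_output):
    """Return the toolkit release from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def driver_cuda_version(smi_output):
    """Return the 'CUDA Version' field from `nvidia-smi` text, or None."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


# Lines copied from the outputs earlier in this thread.
nvcc_line = "Cuda compilation tools, release 10.0, V10.0.130"
smi_line = "| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |"

print("toolkit:", toolkit_version(nvcc_line))            # toolkit: 10.0
print("driver supports up to:", driver_cuda_version(smi_line))  # driver supports up to: 10.2
```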

According to the nvidia-smi output, it seems that CUDA 10.2 is installed on your setup.
Could you please try downgrading to CUDA 10.0?

Thanks