TensorFlow not working after update

Not sure where to post this issue. I have recently upgraded my system and the following packages changed versions:

  • CUDA 10.2 -> 11.0
  • cuDNN 7.6 -> 8.0
  • TensorFlow-CUDA 2.2 -> 2.3
Before the upgrade everything worked as expected (I had been using TensorFlow with no apparent issues). Now TensorFlow doesn't seem to work anymore. For example, the code:
import tensorflow as tf
m = tf.keras.Sequential()

produces the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/sequential.py", line 116, in __init__
    super(functional.Functional, self).__init__(  # pylint: disable=bad-super-call
  File "/usr/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 308, in __init__
  File "/usr/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 317, in _init_batch_counters
    self._train_counter = variables.Variable(0, dtype='int64', aggregation=agg)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 262, in __call__
    return cls._variable_v2_call(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 244, in _variable_v2_call
    return previous_getter(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 237, in <lambda>
    previous_getter = lambda **kws: default_variable_creator_v2(None, **kws)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 2633, in default_variable_creator_v2
    return resource_variable_ops.ResourceVariable(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 264, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1507, in __init__
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1661, in _init_from_args
    handle = eager_safe_variable_handle(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 242, in eager_safe_variable_handle
    return _variable_handle_from_shape_and_dtype(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 174, in _variable_handle_from_shape_and_dtype
    gen_logging_ops._assert(  # pylint: disable=protected-access
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 49, in _assert
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 6843, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse

The same code runs fine when the GPU is deactivated, so the problem appears to be GPU-specific. Is this a bug in CUDA, cuDNN, or the Nvidia driver? What should I do to solve the issue? Should I submit a bug report to Nvidia?
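For anyone who wants to reproduce the CPU-only behaviour: one common way to deactivate the GPU is the environment-variable approach below (it is not the only way; `tf.config.set_visible_devices` is an alternative). A minimal sketch:

```python
import os

# Hide all CUDA devices from TensorFlow. It is commonly recommended to set
# this BEFORE importing tensorflow, since device discovery happens early.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# import tensorflow as tf   # TF now sees no GPUs and falls back to the CPU
```

With this in place the `tf.keras.Sequential()` call above no longer hits the GPU code path, which is how I confirmed the error is GPU-specific.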

I am stuck on the same issue and have been for days. People on GitHub say it occurs when multiple processes use Python/TensorFlow at the same time, but checking nvidia-smi I see no other processes running. Did you find a solution to the problem?
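For reference, this is roughly how I checked for other compute processes; it just wraps nvidia-smi, so it assumes the Nvidia tools are on PATH (the query flags are from my local nvidia-smi and may vary slightly between driver versions):

```python
import shutil
import subprocess

def gpu_compute_processes():
    """Return nvidia-smi's compute-process listing as CSV text,
    or None when nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv"],
        capture_output=True, text=True)
    return out.stdout

print(gpu_compute_processes())
```

If only the header row comes back, nothing else is holding the GPU, which is exactly my situation.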

None besides downgrading the packages. This might be a bug in either TensorFlow, CUDA, or cuDNN. There's an open issue at: https://github.com/tensorflow/tensorflow/issues/41855. The issue I have has nothing to do with other processes using the GPU (see the discussion in the link for how to check that).
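Before downgrading, one sanity check is to compare the CUDA/cuDNN versions your TensorFlow wheel was built against with what you actually installed. A rough sketch, assuming `tf.sysconfig.get_build_info()` is available (I believe it was added around TF 2.3) and noting that the dictionary keys may differ between releases:

```python
import importlib.util

def tf_cuda_build_info():
    """Report the CUDA/cuDNN versions TensorFlow was compiled against,
    or None when TensorFlow is not installed. Uses .get() because the
    exact key names may vary between TF releases (an assumption)."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf
    info = tf.sysconfig.get_build_info()
    return {"cuda": info.get("cuda_version"),
            "cudnn": info.get("cudnn_version")}

print(tf_cuda_build_info())
```

A mismatch between this and the system's CUDA 11.0 / cuDNN 8.0 would point at a packaging problem rather than a driver bug.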