TensorFlow says CUDA not enabled

shallada · June 20, 2019, 9:41pm

I have a new HP Omen Obelisk 25L running ubuntu 18.4 with a RTX 2080 GPU I am trying to set up to do some machine learning with TensorFlow.

uname -r
4.18.0-22-generic

I have followed this tutorial:
[/code]

[/code]
and have successfully completed through Step 7.2.3.
This step indicates that CUDA is installed and working as shown here:

./NVIDIA_CUDA-10.1_Samples/bin/x86_64/linux/release/deviceQuery
./NVIDIA_CUDA-10.1_Samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce RTX 2080"
  CUDA Driver Version / Runtime Version          10.1 / 10.1
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 7949 MBytes (8335458304 bytes)
  (46) Multiprocessors, ( 64) CUDA Cores/MP:     2944 CUDA Cores
  GPU Max Clock rate:                            1710 MHz (1.71 GHz)
  Memory Clock rate:                             7000 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.1, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS

However, when I try to run a simple python program that uses the GPU:

import tensorflow as tf

# Creates a graph.
with tf.device('/device:GPU:0'):
  a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

I see the following output:

(base) shallada@shalladaGPU:~/book$ python tftest.py 
2019-06-20 15:28:35.068487: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-20 15:28:35.089702: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-06-20 15:28:35.090500: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55e85b0c2d80 executing computations on platform Host. Devices:
2019-06-20 15:28:35.090520: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
2019-06-20 15:28:35.091129: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

MatMul: (MatMul): /job:localhost/replica:0/task:0/device:CPU:0
2019-06-20 15:28:35.091731: I tensorflow/core/common_runtime/placer.cc:1059] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:CPU:0
Traceback (most recent call last):
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1317, in _run_fn
    self._extend_graph()
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1352, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation a: {{node a}}was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled.
	 [[{{node a}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tftest.py", line 11, in <module>
    print(sess.run(c))
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation a: node a (defined at tftest.py:5) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled.
	 [[node a (defined at tftest.py:5) ]]

Caused by op 'a', defined at:
  File "tftest.py", line 5, in <module>
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 179, in constant_v1
    allow_broadcast=False)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 289, in _constant_impl
    name=name).outputs[0]
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Cannot assign a device for operation a: node a (defined at tftest.py:5) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled.
	 [[node a (defined at tftest.py:5) ]]

My question is: Why does TensorFlow think CUDA is not enabled, and how to I make it enabled?

Robert_Crovella · June 21, 2019, 12:59pm

if you installed TF via conda, then those types of installs usually expect the CUDA toolkit to be installed using conda also. It may be simplest just to do conda install cudatoolkit. That will not disrupt your current install.

TF will require a specific version of the CUDA toolkit, and depending on which TF you are using, it may not be CUDA 10.1. However if you used conda to install TF, using conda install cudatoolkit should take care of it.

shallada · June 21, 2019, 3:15pm

Robert, Thanks for your reply. I tried installing cudatoolkit as you described. The toolkit seemed to install without a problem.

conda install cudatoolkit
Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: /home/shallada/anaconda3

  added / updated specs:
    - cudatoolkit


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2019.6.16          |           py37_0         154 KB
    conda-4.6.14               |           py37_0         2.1 MB
    cudatoolkit-10.1.168       |                0       516.0 MB
    ------------------------------------------------------------
                                           Total:       518.3 MB

The following NEW packages will be INSTALLED:

  cudatoolkit        pkgs/main/linux-64::cudatoolkit-10.1.168-0

The following packages will be UPDATED:

  certifi                 anaconda::certifi-2019.3.9-py37_0 --> pkgs/main::certifi-2019.6.16-py37_0

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates                                  anaconda --> pkgs/main
  conda                                            anaconda --> pkgs/main
  openssl                anaconda::openssl-1.1.1-h7b6447c_0 --> pkgs/main::openssl-1.1.1c-h7b6447c_1


Proceed ([y]/n)? y


Downloading and Extracting Packages
cudatoolkit-10.1.168 | 516.0 MB  | ##################################################################################################################################### | 100% 
certifi-2019.6.16    | 154 KB    | ##################################################################################################################################### | 100% 
conda-4.6.14         | 2.1 MB    | ##################################################################################################################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Then I tried rerunning tftest.py as shown previously. The result was exactly the same.

python tftest.py 
2019-06-21 09:10:03.247018: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-21 09:10:03.268028: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-06-21 09:10:03.268884: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x562fde2f2090 executing computations on platform Host. Devices:
2019-06-21 09:10:03.268929: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
2019-06-21 09:10:03.269694: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

MatMul: (MatMul): /job:localhost/replica:0/task:0/device:CPU:0
2019-06-21 09:10:03.270269: I tensorflow/core/common_runtime/placer.cc:1059] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:CPU:0
Traceback (most recent call last):
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1317, in _run_fn
    self._extend_graph()
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1352, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation a: {{node a}}was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled.
	 [[{{node a}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tftest.py", line 11, in <module>
    print(sess.run(c))
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation a: node a (defined at tftest.py:5) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled.
	 [[node a (defined at tftest.py:5) ]]

Caused by op 'a', defined at:
  File "tftest.py", line 5, in <module>
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 179, in constant_v1
    allow_broadcast=False)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 289, in _constant_impl
    name=name).outputs[0]
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/shallada/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Cannot assign a device for operation a: node a (defined at tftest.py:5) was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device. The requested device appears to be a GPU, but CUDA is not enabled.
	 [[node a (defined at tftest.py:5) ]]

shallada · June 21, 2019, 4:09pm

I got thinking about your suggestion, which caused me to update all the packages:

conda update --all

then I reran tftest.py with the results below. It appears to have solved the problem. Thanks for your help!

python tftest.py
2019-06-21 10:05:29.382268: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-06-21 10:05:29.402646: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-06-21 10:05:29.403498: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x562ed25cef10 executing computations on platform Host. Devices:
2019-06-21 10:05:29.403515: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-06-21 10:05:29.691980: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-21 10:05:29.694194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
totalMemory: 7.76GiB freeMemory: 7.51GiB
2019-06-21 10:05:29.694245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-21 10:05:29.697722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-21 10:05:29.697797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-06-21 10:05:29.697818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-06-21 10:05:29.698121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7303 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-06-21 10:05:29.701985: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x562ed19635f0 executing computations on platform CUDA. Devices:
2019-06-21 10:05:29.702060: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
2019-06-21 10:05:29.703766: I tensorflow/core/common_runtime/direct_session.cc:317] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2080, pci bus id: 0000:01:00.0, compute capability: 7.5
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device

MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2019-06-21 10:05:29.705750: I tensorflow/core/common_runtime/placer.cc:1059] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-06-21 10:05:29.705808: I tensorflow/core/common_runtime/placer.cc:1059] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2019-06-21 10:05:29.705839: I tensorflow/core/common_runtime/placer.cc:1059] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
 [49. 64.]]