What could be causing the Tesla T4 GPUs not to be listed with device_type='GPU' in the output of:
tf.config.list_physical_devices()
?
This results in an "unknown device" error when TensorFlow tries to access the GPUs:
with tf.device('/gpu:3'):
    a = tf.constant(3.0)
# Output
...
RuntimeError: /job:localhost/replica:0/task:0/device:GPU:3 unknown device.
What more can I do to troubleshoot this further?
Environment where this issue occurs:
- Cisco UCS c240 M5 system has:
** 72x CPUs and
** 5x Tesla T4 GPUs
- Ubuntu 18.04 LTS
- NVIDIA-SMI 418.126.02 Driver Version: 418.126.02 CUDA Version: 10.1
** All 5 T4s are recognized by nvidia-smi
- TensorFlow 2.1.0
- libcudnn7_7.6.5.32-1+cuda10.1_amd64
import tensorflow as tf
tf.config.list_physical_devices()
# Output
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:0', device_type='XLA_GPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:1', device_type='XLA_GPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:2', device_type='XLA_GPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:3', device_type='XLA_GPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:4', device_type='XLA_GPU')]
tf.config.list_physical_devices('GPU')
# Output
[]
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
# Output
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13771363116588327167
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 17218183889029182531
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 9157258922701839704
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:1"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 4158970543181084654
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:2"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 15403740508526850072
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:3"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 9287480476894551351
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:4"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 15439875423529567742
physical_device_desc: "device: XLA_GPU device"
]
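One further check that may help narrow this down: the devices show up as XLA_GPU but not as GPU, which can happen when TensorFlow fails to load one of the CUDA shared libraries at runtime and skips registering the regular GPU devices. The sketch below uses only the standard library to test whether each library can be opened by the dynamic loader. The library names in CANDIDATE_LIBS are an assumption about what a TensorFlow 2.1 build against CUDA 10.1 and cuDNN 7 looks for, and check_libs is a hypothetical helper, not part of TensorFlow.

```python
import ctypes

# Assumed sonames for a TensorFlow 2.1 build (CUDA 10.1, cuDNN 7).
# Adjust these if your build targets different versions.
CANDIDATE_LIBS = [
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.10",
    "libcudnn.so.7",
]


def check_libs(names):
    """Return a dict mapping each library name to True if the dynamic
    loader can open it from the current environment, else False."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Library not found on the loader path, or one of its own
            # dependencies could not be resolved.
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_libs(CANDIDATE_LIBS).items():
        print("{:<22} {}".format(name, "OK" if ok else "NOT LOADABLE"))
```

Any library reported as NOT LOADABLE would point to a missing CUDA toolkit component or an LD_LIBRARY_PATH that does not include its directory. The TensorFlow startup log (stderr at import time) should name the same libraries if this is the cause.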