CUDA_ERROR_OUT_OF_MEMORY: out of memory on NVIDIA Quadro 8000, with more than enough available memory

We recently got a Quadro 8000 for training purposes at our lab. However, I cannot run even the simplest code: cuda_driver.cc complains that it failed to allocate memory (successive messages report failed allocations of 38.17G, then 34.36G, 30.92G, 27.83G, 25.05G, and 22.54G), even though GPU:0 is reported as having 39090 MB of memory. I am using a Miniconda-based Python with tensorflow-gpu 2.0.0 and compatible versions of cudnn (7.6.4) and cudatoolkit (10.0.130), pulled in automatically by conda install. The code is as follows.

###############
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        #tf.config.experimental.set_virtual_device_configuration(
        #    gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024*20)])
        #tf.config.experimental.set_memory_growth(gpus[0], True)
        #print('tf Memory growth : %r' % (tf.config.experimental.get_memory_growth(gpus[0])))

        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print("%d Physical GPUs, %d Logical GPUs" % (len(gpus), len(logical_gpus)))
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        c = tf.matmul(a, b)
        print(c)

    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

###############

I do have a temporary fix: setting a memory limit by uncommenting the first two lines after the try statement in the code above. However, I discovered that I cannot get the GPU to allocate more than approximately 20G, even though it has around double that memory available.
Googling around points to setting either memory growth or a memory limit on the GPU. I have even tried setting both (uncommenting the 3rd and 4th lines as well), but to no avail.
Has anyone come across a similar issue?
Thanks for reading.
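
A side note on the sizes in those failure messages: each figure is roughly 90% of the one before it, which would be consistent with the allocator retrying with a progressively smaller request after each failure rather than with six unrelated allocations. That reading is an assumption drawn from the numbers alone. A quick check in plain Python, using only the values quoted in the post:

```python
# Failed allocation sizes (in G) as quoted in the post above.
sizes = [38.17, 34.36, 30.92, 27.83, 25.05, 22.54]

# Ratio of each failed request to the one before it.
ratios = [later / earlier for earlier, later in zip(sizes, sizes[1:])]

for earlier, later, ratio in zip(sizes, sizes[1:], ratios):
    print("%.2fG -> %.2fG  ratio %.3f" % (earlier, later, ratio))

# True if every step is within 1% of a 0.9x reduction.
looks_like_backoff = all(abs(r - 0.9) < 0.01 for r in ratios)
print("consistent with ~0.9x retries:", looks_like_backoff)
```

If that reading is right, the informative number is the last one: even a 22.54G request failed, while the 20G limit (1024*20 MB) is reported to work.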

I am able to run the repro above on an RTX 8000 without a problem.

2020-01-17 23:34:20.032246: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 42559 MB memory) -> physical GPU (device: 0, name: Quadro RTX 8000, pci bus id: 0000:09:00.0, compute capability: 7.5)
2020-01-17 23:34:20.033732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 42562 MB memory) -> physical GPU (device: 1, name: Quadro RTX 8000, pci bus id: 0000:0a:00.0, compute capability: 7.5)
2 Physical GPUs, 2 Logical GPUs
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)

What NVIDIA driver, CUDA runtime, and cuDNN versions are you using?

I ran in the 19.12-tf2 NGC Docker container with a 440.33 driver. https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow/tags

What could be causing this CUDA_ERROR_OUT_OF_MEMORY error?

I am running a small test application (just one image) with ResNet152 on either an RTX 2060 or an RTX 2070 Super. Even though the GPU memory is completely free, it still reports CUDA_ERROR_OUT_OF_MEMORY.

Please see both the code and the log output below.

1. Settings in the code

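import os
import tensorflow as tf

# Set up the GPU to avoid the runtime error: Could not create cuDNN handle...
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# NOTE: '0' looks like a CUDA_VISIBLE_DEVICES value; CUDA_DEVICE_ORDER expects FASTEST_FIRST or PCI_BUS_ID
os.environ["CUDA_DEVICE_ORDER"] = '0'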

2. Message from the GPU

2020-10-06 14:56:03.677972: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-06 14:56:04.843552: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 3.18G (3410586368 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
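
When reading a line like this, it can help to convert the byte count yourself. The sketch below parses the message quoted above with a regular expression and converts bytes to GiB. The helper name parse_failed_allocation is made up for illustration, and the pattern has only been matched against this one message format.

```python
import re

# Pattern for the "failed to allocate" message quoted above.
_PATTERN = re.compile(r"failed to allocate (\S+) \((\d+) bytes\)")


def parse_failed_allocation(line):
    """Return (label, bytes, GiB) for a matching log line, or None."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    label = match.group(1)
    num_bytes = int(match.group(2))
    return label, num_bytes, num_bytes / 2**30


line = ("2020-10-06 14:56:04.843552: I tensorflow/stream_executor/cuda/"
        "cuda_driver.cc:763] failed to allocate 3.18G (3410586368 bytes) "
        "from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory")

label, num_bytes, gib = parse_failed_allocation(line)
print(label, num_bytes, "%.2f GiB" % gib)
```

The byte count works out to about 3.18 when divided by 2**30, so the "G" in the message is GiB rather than 10**9 bytes.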

Hoping to get some help.

Mike
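
On the os.environ line in the snippet above: as far as I know, CUDA_DEVICE_ORDER controls how devices are enumerated (the documented values are FASTEST_FIRST and PCI_BUS_ID), while selecting a device by index is done with CUDA_VISIBLE_DEVICES. Assuming the intent was to restrict the process to the first GPU, a minimal sketch would look like this, placed before TensorFlow is imported:

```python
import os

# Enumerate GPUs in PCI bus order.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# Expose only the first GPU to this process (assumed intent of the original line).
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Import TensorFlow only after this point so the settings are in place
# before the CUDA runtime is initialized.
print(os.environ["CUDA_DEVICE_ORDER"], os.environ["CUDA_VISIBLE_DEVICES"])
```

This only changes which device is visible; it is not a fix for the allocation failure itself.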

I used the following code to solve the issue.

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 4GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

Cheers!
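
A variation on the snippet above, in case the hard-coded 4096 needs to change per card: the sketch below computes the limit as a fraction of the card's memory. The helper names memory_limit_mb and limit_first_gpu and the 0.5 default are illustrative choices, not part of TensorFlow; the TensorFlow calls are the same ones used above.

```python
def memory_limit_mb(total_mb, fraction=0.5):
    """Return a memory limit in MB as a fraction of the card's total."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_mb * fraction)


def limit_first_gpu(limit_mb):
    """Cap TensorFlow's allocation on the first GPU; call before any GPU op."""
    import tensorflow as tf  # imported here so memory_limit_mb works without TF

    gpus = tf.config.experimental.list_physical_devices('GPU')
    if not gpus:
        return False
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=limit_mb)])
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)
        return False
    return True


# 39090 MB is the figure reported for GPU:0 in the first post.
print(memory_limit_mb(39090))
```

Calling limit_first_gpu(memory_limit_mb(39090)) would request a limit just under the 20480 MB (1024*20) that the first post reports as working.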