CUDA_ERROR_OUT_OF_MEMORY: out of memory on Nvidia Quadro 8000, with more than enough available memory

rich4rd.macwan · December 3, 2019, 2:33pm

We recently got a Quadro 8000 for training at our lab. However, I am not able to run even the simplest code: cuda_driver.cc reports that it failed to allocate memory (subsequent messages show CUDA failing to allocate 38.17G, then 34.36G, 30.92G, 27.83G, 25.05G and 22.54G), even though GPU:0 is reported as having 39090 MB of memory. I am using Miniconda-based Python with tensorflow-gpu 2.0.0 and compatible versions of cuDNN (7.6.4) and cudatoolkit (10.0.130), pulled in automatically by conda install. The simple code is as follows.

###############
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # tf.config.experimental.set_virtual_device_configuration(
        #     gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024*20)])
        # tf.config.experimental.set_memory_growth(gpus[0], True)
        # print('tf Memory growth : %r' % (tf.config.experimental.get_memory_growth(gpus[0])))

        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print("%d Physical GPUs, %d Logical GPUs" % (len(gpus), len(logical_gpus)))
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        c = tf.matmul(a, b)
        print(c)

    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

###############

I do have a temporary fix: setting a memory limit by uncommenting the first two lines after the try statement in the code above. However, I discovered that I cannot get TensorFlow to allocate more than approximately 20G on the GPU, even though the GPU has around double that memory available.
Searching online points to setting either memory growth or a memory limit on the GPU. I have even tried setting both (uncommenting the 3rd and 4th lines as well), but to no avail.
Has anyone come across a similar issue?
Thanks for reading.
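The figures quoted in this post are consistent with each other: 39090 MB divided by 1024 is about 38.17 GiB, the size of the first failed request, and each later request is roughly 90% of the one before. The plain-Python sketch below only checks that arithmetic from the numbers in the post; the 0.9 retry factor is inferred from the log rather than quoted from TensorFlow's source. It also assumes, per the TensorFlow docs, that `memory_limit` is given in MB.

```python
# Figures copied from the post above.
reported_mb = 39090  # memory TensorFlow reported for GPU:0, in MB
failed_gib = [38.17, 34.36, 30.92, 27.83, 25.05, 22.54]  # failed allocation sizes
limit_mb = 1024 * 20  # memory_limit used in the workaround (in MB)

# The first request is essentially the whole reported memory, expressed in GiB.
first_request_gib = reported_mb / 1024
print(f"39090 MB = {first_request_gib:.2f} GiB")

# Each retry asks for roughly 90% of the previous request.
ratios = [later / earlier for earlier, later in zip(failed_gib, failed_gib[1:])]
print("retry ratios:", [round(r, 3) for r in ratios])

# The working 20 GiB limit is a little over half of the reported memory.
limit_fraction = limit_mb / reported_mb
print(f"memory_limit = {limit_mb} MB = {limit_fraction:.0%} of reported memory")
```

In other words, the allocator first tries to reserve the full reported memory and backs off in steps, and every one of those steps still failed.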

nluehr · January 17, 2020, 11:53pm

I am able to run the repro above on an RTX 8000 without a problem.

2020-01-17 23:34:20.032246: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 42559 MB memory) -> physical GPU (device: 0, name: Quadro RTX 8000, pci bus id: 0000:09:00.0, compute capability: 7.5)
2020-01-17 23:34:20.033732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 42562 MB memory) -> physical GPU (device: 1, name: Quadro RTX 8000, pci bus id: 0000:0a:00.0, compute capability: 7.5)
2 Physical GPUs, 2 Logical GPUs
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)

What NVIDIA driver, CUDA runtime, and cuDNN versions are you using?

I ran in the 19.12-tf2 NGC Docker container with a 440.33 driver. https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow/tags
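For anyone reproducing this, the expected product can be checked without a GPU. This plain-Python sketch (no TensorFlow needed; the helper `matmul` is hypothetical, written only for illustration) multiplies the same two matrices as the repro and yields the values shown in the log, which is a quick way to confirm that a GPU run that does complete is also computing correctly.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    # Pair each row of a with each column of b (zip(*b) transposes b).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
c = matmul(a, b)
print(c)  # [[22.0, 28.0], [49.0, 64.0]]
```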

mikechen6688 · October 6, 2020, 7:12am

What causes this CUDA_ERROR_OUT_OF_MEMORY?

I am running a small test application (just one image) with ResNet152 on either an RTX 2060 or an RTX 2070 Super. Even though the GPU memory is completely free, it still reports CUDA_ERROR_OUT_OF_MEMORY.

Please see the code and the log below.

1. Settings in the code

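import os

# CUDA_DEVICE_ORDER only accepts FASTEST_FIRST or PCI_BUS_ID; a device index such
# as '0' belongs in CUDA_VISIBLE_DEVICES, set before TensorFlow initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import tensorflow as tf

# Set up the GPU to avoid the runtime error: Could not create cuDNN handle...
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)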

2. Log output

2020-10-06 14:56:03.677972: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-10-06 14:56:04.843552: I tensorflow/stream_executor/cuda/cuda_driver.cc:763] failed to allocate 3.18G (3410586368 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory

I hope someone can help.

Mike
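The "G" in the cuda_driver.cc message appears to mean binary gigabytes (GiB): dividing the logged byte count by 1024³ reproduces the 3.18 figure, while dividing by 10⁹ would give 3.41. The short sketch below does that conversion; the helper name `bytes_to_gib` is hypothetical, chosen only for illustration.

```python
def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 1024**3 bytes)."""
    return n_bytes / 1024**3

requested = 3410586368  # byte count from the cuda_driver.cc message above
print(f"{bytes_to_gib(requested):.2f} GiB")  # 3.18 GiB
print(f"{requested / 1e9:.2f} GB")  # 3.41 GB (decimal units)
```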

mikechen6688 · October 6, 2020, 8:12am

I solved the issue with the following code:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 4GB of memory on the first GPU
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)

Cheers!
