Hi
The nvidia-smi command shows that only a small portion of memory is in use on my two GTX 1080 GPUs. But when I run the small script shown here:
import os
import tensorflow as tf
print(tf.__version__)
import keras
print("keras version:", keras.__version__)
print("tensorflow path:", tf.__path__)
print("tensorflow version:", tf.__version__)
print("Checking if GPU is being used :- ")
#print(tf.Session(config=tf.ConfigProto(log_device_placement=True)))
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
print(tf.Session(config=tf.ConfigProto(log_device_placement=True)))
to see whether the GPUs are detected and being used (and to print the GPU device mapping), I got the following output and errors:
2019-04-18 20:34:55.232623: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-18 20:34:55.426794: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-18 20:34:55.427498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.45GiB
2019-04-18 20:34:55.512737: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-18 20:34:55.513530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:02:00.0
totalMemory: 10.91GiB freeMemory: 10.32GiB
2019-04-18 20:34:55.513898: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0, 1
2019-04-18 20:34:59.956899: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-18 20:34:59.956916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0 1
2019-04-18 20:34:59.956920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N N
2019-04-18 20:34:59.956923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 1: N N
2019-04-18 20:34:59.957393: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9915 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-04-18 20:35:00.059331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 9915 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
2019-04-18 20:35:00.060042: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 9.68G (10396788224 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.060641: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 8.71G (9357109248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.061141: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 7.84G (8421398016 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
.
.
.
c:806] failed to allocate 878.77M (921460736 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.072841: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 790.90M (829314816 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.073356: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 711.81M (746383360 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.073873: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 640.63M (671745024 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.074374: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 576.56M (604570624 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.074868: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 518.91M (544113664 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
<tensorflow.python.client.session.Session object at 0x7f5affa59f98>
2019-04-18 20:35:00.079315: I tensorflow/core/common_runtime/direct_session.cc:291] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
Process finished with exit code 0
I have two GTX 1080 Ti GPUs, with CUDA 9.0, cuDNN 7.0.x, the nvidia-418 driver, and tensorflow-gpu 1.11.0.
I have tried running my code both with and without setting:
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
but it doesn't help. I also restarted the system, but the problem persists. Any help, please?
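In case it matters, this is the minimal pattern I was planning to try next. My understanding (which I have not yet verified for my case) is that CUDA_VISIBLE_DEVICES has to be set before TensorFlow first initializes CUDA, and that `gpu_options.allow_growth` is supposed to stop TF 1.x from pre-allocating almost the entire card up front:

```python
import os

# Must be set before TensorFlow touches CUDA; setting it after the first
# Session has been created has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import tensorflow as tf  # tensorflow-gpu 1.11.0 in my case

# allow_growth makes TF claim GPU memory on demand instead of reserving
# nearly all free memory at startup, which is what appears to trigger
# CUDA_ERROR_OUT_OF_MEMORY when part of the card is already held.
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```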