Hi
The nvidia-smi command shows that only a small portion of memory is in use on my two GTX 1080 GPUs. But when I run the small script shown here:
import os
import tensorflow as tf
print(tf.__version__)
import keras
print("keras version:", keras.__version__)
print("tensorflow path:", tf.__path__)
print("tensorflow version:", tf.__version__)
print("Checking if GPU is being used :- ")
#print(tf.Session(config=tf.ConfigProto(log_device_placement=True)))
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
print(tf.Session(config=tf.ConfigProto(log_device_placement=True)))
to see whether the GPUs are detected and being used (and to print the GPU device mapping), I got the following output and errors:
2019-04-18 20:34:55.232623: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-18 20:34:55.426794: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-18 20:34:55.427498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.45GiB
2019-04-18 20:34:55.512737: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-18 20:34:55.513530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:02:00.0
totalMemory: 10.91GiB freeMemory: 10.32GiB
2019-04-18 20:34:55.513898: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0, 1
2019-04-18 20:34:59.956899: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-18 20:34:59.956916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0 1
2019-04-18 20:34:59.956920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N N
2019-04-18 20:34:59.956923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 1: N N
2019-04-18 20:34:59.957393: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9915 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-04-18 20:35:00.059331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 9915 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
2019-04-18 20:35:00.060042: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 9.68G (10396788224 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.060641: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 8.71G (9357109248 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.061141: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 7.84G (8421398016 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
.
.
.
c:806] failed to allocate 878.77M (921460736 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.072841: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 790.90M (829314816 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.073356: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 711.81M (746383360 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.073873: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 640.63M (671745024 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.074374: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 576.56M (604570624 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2019-04-18 20:35:00.074868: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 518.91M (544113664 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
<tensorflow.python.client.session.Session object at 0x7f5affa59f98>
2019-04-18 20:35:00.079315: I tensorflow/core/common_runtime/direct_session.cc:291] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1
/job:localhost/replica:0/task:0/device:GPU:1 -> device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1
Process finished with exit code 0
I have two GTX 1080 Ti GPUs, with CUDA 9.0, cuDNN 7.0.x, the nvidia-418 driver, and tensorflow-gpu 1.11.0.
I have tried running my code both with and without setting:
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
but it doesn't help. I also restarted the system, but the problem persists. Any help, please?
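In case it matters, this is the minimal pattern I was planning to try next. My understanding (which I have not yet verified for my case) is that CUDA_VISIBLE_DEVICES has to be set before TensorFlow first initializes CUDA, and that `gpu_options.allow_growth` is supposed to stop TF 1.x from pre-allocating almost the entire card up front:

```python
import os

# Must be set before TensorFlow touches CUDA; setting it after the first
# Session has been created has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import tensorflow as tf  # tensorflow-gpu 1.11.0 in my case

# allow_growth makes TF claim GPU memory on demand instead of reserving
# nearly all free memory at startup, which is what appears to trigger
# CUDA_ERROR_OUT_OF_MEMORY when part of the card is already held.
config = tf.ConfigProto(log_device_placement=True)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```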