Does anyone know how to solve a ResourceExhaustedError on the TX2?

(?, 40, 68, 192)
Tensor("Softmax:0", shape=(?, 2), dtype=float32)
Traceback (most recent call last):
File "./test.py", line 167, in <module>
sess.run(tf.global_variables_initializer(), options=run_options)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[522240,192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: truncated_normal/TruncatedNormal = TruncatedNormal[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]

Current usage from device: /job:localhost/replica:0/task:0/device:GPU:0, allocator: GPU_0_bfc
864.0KiB from random_normal_2/RandomStandardNormal
288.0KiB from random_normal_1/RandomStandardNormal
Remaining 1 nodes with 6.2KiB

Caused by op u'truncated_normal/TruncatedNormal', defined at:
File "./test.py", line 136, in <module>
w4 = tf.Variable(tf.truncated_normal([fc1_unit, fc2_unit]))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/random_ops.py", line 174, in truncated_normal
shape_tensor, dtype, seed=seed1, seed2=seed2)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_random_ops.py", line 850, in truncated_normal
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[522240,192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: truncated_normal/TruncatedNormal = TruncatedNormal[T=DT_INT32, dtype=DT_FLOAT, seed=0, seed2=0, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]

Current usage from device: /job:localhost/replica:0/task:0/device:GPU:0, allocator: GPU_0_bfc
864.0KiB from random_normal_2/RandomStandardNormal
288.0KiB from random_normal_1/RandomStandardNormal
Remaining 1 nodes with 6.2KiB

After I ran a machine-learning test script on the TX2, I got the messages above.
- I have 6GB out of 28GB of storage left, and about 3GB of memory free.
- I tested the TensorFlow machine-learning code with only one image.
Is there any way I can solve this problem?
On my computer (Ubuntu 16.04, with the same Python and TensorFlow versions as the Jetson TX2), the code runs without any error.
I think there must be a solution along these lines:

  1. Similar to nvidia-smi's exclusive compute mode on a desktop (which is not supported on the TX2), there may be another way to control how the GPU is used.
    or
  2. I could erase some files: I installed many libraries and left some unused ones behind before I decided to use only Python and TensorFlow as my framework, but I am not sure which files are safe to erase…

If anyone has had a similar problem and knows one of the solutions above, or any other solution, please help me. Thank you.
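For context on the size of the failing allocation: the OOM happens while initializing `w4 = tf.Variable(tf.truncated_normal([fc1_unit, fc2_unit]))`, and the shape `[522240, 192]` in the error suggests that `fc1_unit` is the flattened `(?, 40, 68, 192)` feature map printed at the top. A quick back-of-the-envelope check (plain Python, no TensorFlow needed):

```python
# Size of the tensor TensorFlow failed to allocate: shape [522240, 192], float32.

FLOAT32_BYTES = 4

# The flattened conv feature map (?, 40, 68, 192) -> 40 * 68 * 192 units,
# which matches the first dimension of the failing weight tensor.
fc1_unit = 40 * 68 * 192
print(fc1_unit)                      # 522240 -- matches shape[522240,192]

fc2_unit = 192                       # second dimension from the error message
weight_bytes = fc1_unit * fc2_unit * FLOAT32_BYTES
print(weight_bytes / 2.0**20)        # ~382.5 MiB for this single weight matrix
```

So this one fully connected layer needs roughly 0.4 GiB for its weights alone (more once the initializer, the variable, and any gradients each hold a copy), which is plausibly too much on a board that shares 8 GB between CPU and GPU. Pooling or shrinking the feature map before the dense layer would reduce this dramatically.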

Hi,

First of all, ResourceExhaustedError indicates that the process ran out of memory.
To get more information, could you monitor device memory with tegrastats while the script runs?

sudo ./tegrastats
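Independent of monitoring, one setting worth trying on a shared-memory device like the TX2 is to keep TensorFlow from pre-allocating most of the GPU memory at session creation. This is a sketch against the TF 1.x API shown in the traceback above; the 0.5 fraction is an arbitrary starting point to tune, not a recommended value:

```python
import tensorflow as tf  # TF 1.x, matching the Python 2.7 traceback above

# Sketch: keep the BFC allocator from claiming most of GPU memory up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True                    # allocate on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # optional hard cap (tune)

sess = tf.Session(config=config)
```

Since the TX2's 8 GB is shared between CPU and GPU, a default session that grabs nearly all free device memory can itself push the process over the limit.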

Suggestions for your questions:
1. Could you share the type of your desktop GPU?
The TX2 has 8GB of memory (although it is shared between the CPU and GPU), which is comparable to lots of GPUs in the gaming segment.

2. Libraries that you don't load into memory won't occupy any memory; unused packages only take disk space.

3. Could you check which batch size you use?
If you only run inference on ONE image at a time, it's recommended to set the batch size to 1 to save memory.
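To see why batch size matters so much here, note that activation memory for the `(?, 40, 68, 192)` feature map in your log scales linearly with the batch dimension. A plain-Python estimate (float32 assumed):

```python
FLOAT32_BYTES = 4

def feature_map_mib(batch_size, h=40, w=68, c=192):
    """Approximate float32 memory (MiB) for one (batch, h, w, c) activation."""
    return batch_size * h * w * c * FLOAT32_BYTES / 2.0**20

print(feature_map_mib(1))    # ~2.0 MiB at batch size 1
print(feature_map_mib(128))  # 255.0 MiB for the same tensor at batch size 128
```

Every intermediate activation in the network scales the same way, so dropping an oversized batch dimension to 1 for single-image inference frees memory across the whole graph.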

Thanks.

Hi,

Could you share the log of TensorFlow session creation with us?
For example:

name: NVIDIA Tegra X2
major: 6 minor: 2 memoryClockRate (GHz) 1.3005
pciBusID 0000:00:00.0
Total memory: 7.67GiB
Free memory: 5.30GiB

2017-07-26 17:21:02.457343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0
2017-07-26 17:21:02.457374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0: Y

Also, could you share where your TensorFlow package comes from?
Did you build it from source or download it from a public website?

Thanks.