tensorflow.python.framework.errors_impl.ResourceExhaustedError

I am trying to load a trained model using the function "keras.models.load_model", but I get an OOM error. My model file is 235 MB. This is the error trace:

8] 1 Chunks of size 35651584 totalling 34.00MiB
2020-09-28 11:44:30.540303: I tensorflow/core/common_runtime/bfc_allocator.cc:1002] Sum Total of in-use chunks: 283.36MiB
2020-09-28 11:44:30.540323: I tensorflow/core/common_runtime/bfc_allocator.cc:1004] total_region_allocated_bytes_: 298536960 memory_limit_: 298536960 available bytes: 0 curr_region_allocation_bytes_: 597073920
2020-09-28 11:44:30.565096: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] Stats:
Limit: 298536960
InUse: 297128960
MaxInUse: 298171136
NumAllocs: 1582
MaxAllocSize: 35651584

2020-09-28 11:44:30.565202: W tensorflow/core/common_runtime/bfc_allocator.cc:439] xxxx**************************xx
2020-09-28 11:44:30.565399: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at cwise_ops_common.h:134 : Resource exhausted: OOM when allocating tensor with shape[3,3,128,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/save.py", line 184, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 178, in load_model_from_hdf5
    custom_objects=custom_objects)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/saving/model_config.py", line 55, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/serialization.py", line 109, in deserialize
    printable_module_name='layer')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/generic_utils.py", line 373, in deserialize_keras_object
    list(custom_objects.items())))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py", line 987, in from_config
    config, custom_objects)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py", line 2029, in reconstruct_from_config
    process_node(layer, node_data)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py", line 1977, in process_node
    output_tensors = layer(input_tensors, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 897, in __call__
    self._maybe_build(inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 2416, in _maybe_build
    self.build(input_shapes)  # pylint:disable=not-callable
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/convolutional.py", line 166, in build
    dtype=self.dtype)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 577, in add_weight
    caching_device=caching_device)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/base.py", line 743, in _add_variable_with_custom_getter
    **kwargs_for_getter)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 141, in make_variable
    shape=variable_shape if variable_shape else None)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py", line 259, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py", line 220, in _variable_v1_call
    shape=shape)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py", line 198, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py", line 2598, in default_variable_creator
    shape=shape)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py", line 263, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 1434, in __init__
    distribute_strategy=distribute_strategy)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 1567, in _init_from_args
    initial_value() if init_from_fn else initial_value,
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 121, in <lambda>
    init_val = lambda: initializer(shape, dtype=dtype)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops_v2.py", line 558, in __call__
    return self._random_generator.random_uniform(shape, -limit, limit, dtype)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops_v2.py", line 1068, in random_uniform
    shape=shape, minval=minval, maxval=maxval, dtype=dtype, seed=self.seed)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/random_ops.py", line 301, in random_uniform
    result = math_ops.add(result * (maxval - minval), minval, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py", line 984, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py", line 1283, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 6089, in mul
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[3,3,128,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Mul]

Any help please?

Hi,

To deploy a model, you also need memory for the input/output/intermediate tensors.
As a result, the real required memory is much larger than the model file itself.
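
For a rough sense of scale (hypothetical numbers, not measured on your model): the conv kernel of shape [3, 3, 128, 256] in your trace holds under 300K float32 weights, while a single activation map at inference time can easily be an order of magnitude larger:

# Back-of-envelope sketch, float32 = 4 bytes (activation size is illustrative):
kernel_bytes = 3 * 3 * 128 * 256 * 4       # ~1.1 MiB for the weights alone
feature_map_bytes = 112 * 112 * 256 * 4    # ~12.3 MiB for one hypothetical
                                           # 112x112x256 map at batch size 1
print(kernel_bytes / 2**20, feature_map_bytes / 2**20)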

Would you mind checking the memory status with tegrastats and sharing it with us?

$ sudo tegrastats

Please also try the configuration shared below (sketched after the list) to see if it helps.

- TFv1.15

- TFv2.x
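
(The snippets originally linked under these bullets are not preserved here. As an assumption of what was shared, the standard GPU memory-growth configuration for each version looks like this:)

# TFv1.15: let the GPU allocator grow on demand instead of grabbing
# the whole memory pool up front.
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
tf.keras.backend.set_session(session)

# TFv2.x: enable memory growth on every visible GPU.
import tensorflow as tf

for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)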

Thanks.

Thank you for your reply. I tried the configuration you mentioned and now I can load the model, but when I try to run prediction it gives me this error:
W tensorflow/core/common_runtime/bfc_allocator.cc:245] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.05GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

This error is displayed several times in a loop, and after a few seconds my Jetson Nano powers off.

This is the output of the command sudo tegrastats:

RAM 1860/3956MB (lfb 101x4MB) SWAP 652/10100MB (cached 21MB) IRAM 0/252kB(lfb 252kB) CPU [21%@518,14%@518,19%@518,20%@403] EMC_FREQ 3%@1600 GR3D_FREQ 0%@153 APE 25 PLL@42C CPU@43C PMIC@100C GPU@41.5C AO@46.5C thermal@42.25C POM_5V_IN 2297/2341 POM_5V_GPU 82/93 POM_5V_CPU 328/321
RAM 1861/3956MB (lfb 101x4MB) SWAP 652/10100MB (cached 21MB) IRAM 0/252kB(lfb 252kB) CPU [16%@518,15%@518,14%@518,12%@518] EMC_FREQ 3%@1600 GR3D_FREQ 0%@153 APE 25 PLL@42C CPU@43C PMIC@100C GPU@41.5C AO@46.5C thermal@42.25C POM_5V_IN 2256/2330 POM_5V_GPU 82/92 POM_5V_CPU 328/322

PS: Just for your info, when I run my script, which loads the Keras model and runs prediction, I open System Monitor > Resources and I see that memory usage reaches 3.5 GB when the code throws the out-of-memory error.

Hi,

Sorry for the late reply.

The log indicates that the memory on Nano is not large enough for a more efficient algorithm.
This is expected since Nano has only 4 GB of memory; some algorithms will be limited if they consume too much memory.
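
If the repeated 1.05GiB requests come from cuDNN searching for a faster convolution algorithm, one knob that may help (an assumption for your case, not a confirmed fix) is capping the cuDNN scratch workspace:

# Hypothetical mitigation: cap cuDNN's scratch workspace (in MB) so
# TensorFlow stops requesting the large "performance" buffers.
# This must be set before TensorFlow initializes its conv kernels.
import os
os.environ['TF_CUDNN_WORKSPACE_LIMIT_IN_MB'] = '128'

import tensorflow as tf  # import only after the variable is set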

Thanks.

Hi,

What would be the best solution in this case, then?

Hi,

There will be some performance impact, but it should still be okay to run inference.
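
(Since the warning itself is non-fatal, a minimal sketch for keeping peak memory low during prediction, assuming a standard Keras model, is to predict with a small batch size:)

# Predicting one sample at a time keeps peak activation memory low,
# at some cost in throughput. model / x_test are placeholder names.
preds = model.predict(x_test, batch_size=1)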
May I first ask which power supply you are using?

Thanks.

Hi,

I am using the micro-USB port to power the board with 5.1V/2.5A.

Hi,

Would you mind trying the 5W mode first to see if TensorFlow can work?
https://docs.nvidia.com/jetson/l4t/index.html#page/Tegra%2520Linux%2520Driver%2520Package%2520Development%2520Guide%2Fpower_management_nano.html%23wwpID0E02K0HA

$ sudo nvpmodel -m 1
$ sudo jetson_clocks

If yes, the root cause is power starvation. Please check the following topic for more information:

Thanks.