TensorFlow Models : GPU Out of Memory

System config: Jetson Nano in headless mode with JetPack 4.2.2, tensorflow-gpu 1.14, OpenCV 3.3.1 (default), a 6 GB swap file on a USB disk, and jetson_clocks running. I tried the options mentioned in these posts:
(a) https://devtalk.nvidia.com/default/topic/1062473/jetson-nano-most-gpu-memory-is-not-available-/?offset=3
(b) https://github.com/aymericdamien/TensorFlow-Examples/issues/38
(c)

I cannot run any TensorFlow models on the Nano. I get the error below:

2019-10-18 01:57:57.868279: I tensorflow/core/common_runtime/bfc_allocator.cc:818] total_region_allocated_bytes_: 1349517312 memory_limit_: 1349857280 available bytes: 339968 curr_region_allocation_bytes_: 1073741824
2019-10-18 01:57:57.868328: I tensorflow/core/common_runtime/bfc_allocator.cc:824] Stats:
Limit: 1349857280
InUse: 855125248
MaxInUse: 946908416
NumAllocs: 168842
MaxAllocSize: 536870912

2019-10-18 01:57:57.868418: W tensorflow/core/common_runtime/bfc_allocator.cc:319] **********************************xxxxxxxxx_______________________
Traceback (most recent call last):
  File "/home/jetsonnano0/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/home/jetsonnano0/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/jetsonnano0/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Dst tensor is not initialized.
     [[{{node _arg_Placeholder_291_0_26}}]]
     [[ReadVariableOp_289/_1777]]
  (1) Internal: Dst tensor is not initialized.
     [[{{node _arg_Placeholder_291_0_26}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "GodFather_v2.py", line 240, in <module>
    age_model = ageModel(params.age_weights)
  File "/home/jetsonnano0/Hasslefree_Nano/Functions_v2.py", line 274, in ageModel
    age_model.load_weights(age_weights)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/network.py", line 1230, in load_weights
    f, self.layers, reshape=reshape)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py", line 1237, in load_weights_from_hdf5_group
    K.batch_set_value(weight_value_tuples)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2960, in batch_set_value
    tf_keras_backend.batch_set_value(tuples)
  File "/home/jetsonnano0/.local/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3071, in batch_set_value
    get_session().run(assign_ops, feed_dict=feed_dict)
  File "/home/jetsonnano0/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/jetsonnano0/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/jetsonnano0/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/jetsonnano0/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: Dst tensor is not initialized.
     [[{{node _arg_Placeholder_291_0_26}}]]
     [[ReadVariableOp_289/_1777]]
  (1) Internal: Dst tensor is not initialized.
     [[{{node _arg_Placeholder_291_0_26}}]]
0 successful operations.
0 derived errors ignored.
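
For readability, the byte counts in the bfc_allocator lines of the log can be converted to MiB. This is only a small arithmetic sketch: the numbers are copied from the log, and the helper name to_mib is made up for illustration. It suggests the allocator had already reserved almost all of its roughly 1287 MiB limit when the weights were being loaded.

```python
MIB = 1024 * 1024

# Figures copied from the bfc_allocator lines in the log (all in bytes).
stats = {
    "Limit": 1349857280,
    "total_region_allocated": 1349517312,
    "InUse": 855125248,
    "MaxInUse": 946908416,
    "MaxAllocSize": 536870912,
}


def to_mib(n_bytes):
    """Convert a byte count to MiB (hypothetical helper for this sketch)."""
    return n_bytes / MIB


for name, value in stats.items():
    print(f"{name:>24}: {to_mib(value):8.2f} MiB")

# Bytes the allocator could still reserve before reaching its limit.
# This matches the "available bytes" value printed in the log.
unreserved = stats["Limit"] - stats["total_region_allocated"]
print(f"{'unreserved':>24}: {unreserved} bytes ({unreserved / 1024:.0f} KiB)")
```

The difference between the limit and the reserved regions is only a few hundred KiB, so any further large request has to be served from what is already reserved.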

Hi,

Could you check whether you can run inference with a small model, such as MNIST, in your environment?

If that works, you are probably hitting a genuine out-of-memory condition, since the Nano's resources are limited.
Please note that swap only increases the amount of CPU-accessible memory; it does not add memory that the GPU can use.
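
To see this while a model is loading, one option is to watch /proc/meminfo. The sketch below is a minimal example assuming a standard Linux /proc/meminfo layout; the function names parse_meminfo and summarize are made up for illustration. On the Nano the GPU shares physical RAM with the CPU, so the MemAvailable figure is likely the more relevant one, while the swap figures only describe CPU-side headroom.

```python
from pathlib import Path


def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (values mostly in KiB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def summarize(info):
    """Pick the RAM and swap fields of interest and convert KiB to MiB."""
    fields = ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree")
    return {f: info[f] / 1024 for f in fields if f in info}


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux systems
        for field, mib in summarize(parse_meminfo(meminfo.read_text())).items():
            print(f"{field}: {mib:.0f} MiB")
```

Running it in a second terminal every few seconds while the model loads shows whether physical RAM, rather than swap, is the part that runs out.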

Thanks.

Sorry for the late reply, @AastaLLL. I am trying to extract age and gender from a face stored on the Nano as a JPEG image. I tried two approaches with the following model:

  1. Age and gender classification model: "Apparent Age and Gender Prediction in Keras" by Sefik Ilkin Serengil

Note: I ran both approaches on the Nano after running:
(a) sudo nvpmodel -m 0
(b) sudo jetson_clocks.sh

Approach 1:
a) Generated the model listed in (1) on the host machine (Ubuntu 18.04 LTS, 16 cores, GPU, 64 GB RAM).
b) Extracted the model JSON and weights files and copied them to the Nano with scp.
c) Created a simple Python script to load the model from the files in (b).
d) After ~15-20 min of running, it fails with an out-of-memory error. Watching closely with htop shows that only 1 of the 4 available cores is used; the program runs until memory usage hits 90%, while swap usage never goes above 50%.
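
A rough way to check whether the weights alone could fit is to multiply the parameter count by 4 bytes (float32) and compare the result with the allocator limit from the log. The sketch below is only an illustration: the layer shapes are hypothetical placeholders, not read from the actual model, and the helper names are made up. With Keras, the real shapes could be taken from the model's weights instead.

```python
from functools import reduce
from operator import mul

BYTES_PER_FLOAT32 = 4
GPU_LIMIT_BYTES = 1349857280  # "Limit" reported by the allocator in the log


def param_count(shape):
    """Number of scalar parameters in a tensor of the given shape."""
    return reduce(mul, shape, 1)


def weights_bytes(shapes):
    """Total float32 storage needed for all tensors in `shapes`."""
    return sum(param_count(s) for s in shapes) * BYTES_PER_FLOAT32


# HYPOTHETICAL kernel/bias shapes, for illustration only.
example_shapes = [
    (3, 3, 3, 64), (64,),
    (7, 7, 512, 4096), (4096,),
    (1, 1, 4096, 101), (101,),
]

total = weights_bytes(example_shapes)
print(f"weights: {total / 2**20:.1f} MiB")
print(f"share of GPU limit: {100 * total / GPU_LIMIT_BYTES:.1f}%")
```

This counts only the stored weights. Activations, framework workspace, and the temporary copies made while weights are fed to the device come on top of it.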

Approach 2:
a) Generated the model listed in (1) on the host machine.
b) Converted the model to TensorRT using the steps listed at https://www.dlology.com/blog/how-to-run-keras-model-on-jetson-nano/
c) Copied the .h5 file to the Nano with scp.
d) Created the Python script listed in the second half of that post (load the graph …).
e) It takes ~2 hrs to load the graph and finally errors out at print("image_size: {}".format(image_size)) (the section where the graph is loaded).

I found another post that talks about speeding up graph load times, "TensorFlow/TensorRT (TF-TRT) Revisited", but I am hesitant to try it: my program does many other things besides age and gender estimation, and modifying a core part of the engine might mess up the system.

Any help here is really appreciated.

Hi,

TensorFlow and TF-TRT usually occupy a lot of memory and can easily cause out-of-memory errors on the Nano.

Is it possible to generate a .pb file from your Keras model?
If so, would you mind running a simple experiment to check whether your model runs well with pure TensorRT?

cp -r /usr/src/tensorrt/ .
cd tensorrt/bin/
./trtexec --uff=/path/to/model.uff --uffInput=[name/and/size] --output=[name]
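
As an illustration of how the placeholders might be filled in, here is a small Python sketch that assembles the trtexec arguments. The node names input_1 and dense_1/Softmax and the 3x224x224 input size are hypothetical and must be replaced with the values from your own model; the helper function is made up for this example. It assumes the UFF input is given as name,C,H,W.

```python
import shlex


def trtexec_uff_command(uff_path, input_name, chw, output_name, trtexec="./trtexec"):
    """Build the argument list for a trtexec run on a UFF model."""
    c, h, w = chw
    return [
        trtexec,
        f"--uff={uff_path}",
        f"--uffInput={input_name},{c},{h},{w}",
        f"--output={output_name}",
    ]


# Hypothetical node names and input size; substitute your model's values.
cmd = trtexec_uff_command(
    "/path/to/model.uff", "input_1", (3, 224, 224), "dense_1/Softmax"
)
print(" ".join(shlex.quote(arg) for arg in cmd))

# To launch it from tensorrt/bin/ on the Nano:
#   import subprocess; subprocess.run(cmd, check=True)
```

The printed line can be pasted into the shell in place of the command above.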

Thanks.