Out of memory error from TensorFlow: any workaround for this, or do I just need a bigger boat?

I am running an application that employs a Keras-TensorFlow model to perform object detection. This model runs in tandem with a Caffe model that performs facial detection/recognition.

The application runs well on a laptop, but when I run it on my Jetson Nano it crashes almost immediately. Below is the last part of the console output, which I think shows that the process is running out of memory (assuming OOM == out of memory).

Is there a way to configure my system and/or the TensorFlow settings so that this is no longer an issue? Or is there another way around it, perhaps by converting the model to run on TensorFlow Lite?

If anyone can give me some guidance on how to troubleshoot this further, please advise. Thanks in advance for your kind help!

2019-05-21 13:22:47.917271: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free  at 0xf09325900 of size 256
2019-05-21 13:22:47.917300: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0xf09325a00 of size 256
2019-05-21 13:22:47.917332: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0xf09325b00 of size 16074752
2019-05-21 13:22:47.917364: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0xf0a27a300 of size 64299008
2019-05-21 13:22:47.917392: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0xf0dfcc300 of size 116251904
2019-05-21 13:22:47.917418: I tensorflow/core/common_runtime/bfc_allocator.cc:638]      Summary of in-use Chunks by size: 
2019-05-21 13:22:47.917512: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 82 Chunks of size 256 totalling 20.5KiB
2019-05-21 13:22:47.917562: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 34 Chunks of size 512 totalling 17.0KiB
2019-05-21 13:22:47.917595: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 82 Chunks of size 1024 totalling 82.0KiB
2019-05-21 13:22:47.917650: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 1280 totalling 1.2KiB
2019-05-21 13:22:47.917684: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 46 Chunks of size 2048 totalling 92.0KiB
2019-05-21 13:22:47.917715: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 30 Chunks of size 4096 totalling 120.0KiB
2019-05-21 13:22:47.917748: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 18 Chunks of size 8192 totalling 144.0KiB
2019-05-21 13:22:47.917780: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 16384 totalling 16.0KiB
2019-05-21 13:22:47.917812: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 37632 totalling 36.8KiB
2019-05-21 13:22:47.917844: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 6 Chunks of size 65536 totalling 384.0KiB
2019-05-21 13:22:47.917875: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 82944 totalling 81.0KiB
2019-05-21 13:22:47.917907: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 131072 totalling 128.0KiB
2019-05-21 13:22:47.917941: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 147456 totalling 432.0KiB
2019-05-21 13:22:47.917975: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 7 Chunks of size 262144 totalling 1.75MiB
2019-05-21 13:22:47.918013: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 331776 totalling 324.0KiB
2019-05-21 13:22:47.918049: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 524288 totalling 1.50MiB
2019-05-21 13:22:47.918081: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 589824 totalling 2.25MiB
2019-05-21 13:22:47.918112: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 12 Chunks of size 1048576 totalling 12.00MiB
2019-05-21 13:22:47.918142: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 2097152 totalling 6.00MiB
2019-05-21 13:22:47.918174: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 18 Chunks of size 2359296 totalling 40.50MiB
2019-05-21 13:22:47.918203: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 5 Chunks of size 4194304 totalling 20.00MiB
2019-05-21 13:22:47.918233: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 8388608 totalling 8.00MiB
2019-05-21 13:22:47.918263: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 9437184 totalling 27.00MiB
2019-05-21 13:22:47.918294: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 16074752 totalling 15.33MiB
2019-05-21 13:22:47.918325: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 18874368 totalling 18.00MiB
2019-05-21 13:22:47.918356: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 64299008 totalling 61.32MiB
2019-05-21 13:22:47.918389: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 116251904 totalling 110.87MiB
2019-05-21 13:22:47.918419: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 326.35MiB
2019-05-21 13:22:47.918454: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats: 
Limit:                   342204416
InUse:                   342204160
MaxInUse:                342204160
NumAllocs:                     715
MaxAllocSize:            116251904

2019-05-21 13:22:47.918508: W tensorflow/core/common_runtime/bfc_allocator.cc:271] *************************************************************************************xxxxxxxxxxxxxxx
2019-05-21 13:22:47.937473: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at conv_ops.cc:735 : Resource exhausted: OOM when allocating tensor with shape[256,64,1,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/home/james/.virtualenvs/nano/bin/monitor_video", line 10, in <module>
    sys.exit(monitor_video())
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/deep_monitor/__main__.py", line 299, in monitor_video
    _monitor(args["config"])
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/deep_monitor/__main__.py", line 229, in _monitor
    detections = detector_object.detect(frame, confidence_object)
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/deep_monitor/detector.py", line 121, in detect
    (boxes, scores, labels) = self.model.predict_on_batch(image)
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/keras/engine/training.py", line 1274, in predict_on_batch
    outputs = self.predict_function(ins)
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[256,64,1,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node res2a_branch2c/convolution}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[{{node filtered_detections/map/while/PadV2_2/paddings}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

FATAL: exception not rethrown

Hi,

You can try the following configuration to see if it helps.

config = tf.ConfigProto()
config.gpu_options.allow_growth = True                    # allocate GPU memory on demand instead of reserving it all up front
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # cap this process at 40% of GPU memory
session = tf.Session(config=config, ...)

Whether this helps depends on whether there is an algorithm that uses less memory.

One possibility is that the minimum memory required by the model already exceeds the Nano’s capacity (4 GB).
For this issue, please try decreasing the batch size or using a more lightweight model.

Thanks.

Thanks for this suggestion, @AastaLLL.

What is most unusual is that after posting this yesterday I was eventually able to run my application, but only once; every other attempt failed again with the same out-of-memory errors listed above.

My application incorporates a model based upon Keras-RetinaNet and I’m not sure how to access the TF Session object used therein. Maybe it’s somewhere within Keras, and there’s a way I can pass along the configuration suggested above? According to this I might be able to access the Session being used, and if I manage to access it, could I then update the Session’s configuration?

In case it’s relevant, I instantiate my custom-trained Keras-RetinaNet inference model like so:

from keras_retinanet import models
model = models.load_model(custom_trained_inference_model, backbone_name="resnet50")

I use the model to detect objects from video image frames like so:

import numpy as np
from keras_retinanet.utils.image import preprocess_image, resize_image

# preprocess image frame into a format that's useful as input to the model
image = preprocess_image(frame)
(image, scale) = resize_image(image)
image = np.expand_dims(image, axis=0)

# detect objects in the input image
(boxes, scores, labels) = self.model.predict_on_batch(image)

Perhaps this is relevant: How to run Keras model on Jetson Nano

“Normally this frozen graph is what you use for deploying. However, it is not optimized to run on the Jetson Nano in terms of either speed or resource efficiency. That is where TensorRT comes into play: it quantizes the model from FP32 to FP16, effectively reducing memory consumption. It also fuses layers and tensors together, which further optimizes the use of GPU memory and bandwidth. All of this comes with little or no noticeable loss of accuracy.”
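If I’m reading that correctly, the conversion step would look roughly like the sketch below. This is only my reading of the TF 1.x TF-TRT path (tensorflow.contrib.tensorrt); the frozen-graph filename and the output node names are placeholders, since I don’t yet know what the RetinaNet export actually produces.

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF-TRT integration in TF 1.x

# load the frozen graph exported from the Keras model (placeholder filename)
with tf.gfile.GFile("frozen_retinanet.pb", "rb") as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# ask TF-TRT to replace supported subgraphs with FP16 TensorRT engines
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["boxes", "scores", "labels"],  # placeholder output node names
    max_batch_size=1,
    max_workspace_size_bytes=1 << 28,       # keep the workspace modest on the Nano
    precision_mode="FP16",
)

# save the optimized graph for use at inference time
with tf.gfile.GFile("trt_retinanet.pb", "wb") as f:
    f.write(trt_graph.SerializeToString())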

Try what Aasta wrote, plus:

keras.backend.tensorflow_backend.set_session(session)

This should be done ONCE, as soon as possible after importing keras and tf and setting the CUDA device (if you do that).
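Concretely, something like the following sketch (adjust the memory fraction for your model, and make sure it runs before models.load_model is called):

import tensorflow as tf
import keras.backend.tensorflow_backend as ktf

# run once, right after the imports and before the RetinaNet model is loaded
config = tf.ConfigProto()
config.gpu_options.allow_growth = True                    # allocate GPU memory on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # cap this process's share of GPU memory
ktf.set_session(tf.Session(config=config))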

Hi,

The memory usage of a third-party library may not be optimal on the Jetson Nano.
It’s recommended to use pure TensorRT instead.

We have a RetinaNet sample that goes from PyTorch to TensorRT.
Is this a possible option for you?
https://github.com/NVIDIA/retinanet-examples

Thanks.

Thanks, @AastaLLL, I will give this a whirl. I appreciate your help!

: )

Hi monocongo,

You said the model ran on your laptop. Do you mind sharing the TensorFlow models that you are running, as well as your laptop’s specifications (most importantly CPU / GPU memory)?

This may be helpful in determining what the underlying issue is.

Thanks!
John

Thanks, John.

I have been running the model on a Dell XPS 9570 with 16 GB of memory. The GPU is a GeForce GTX 1050 Ti (4 GB, I think). The OS is Ubuntu 18.04. The model is Keras-RetinaNet (https://github.com/fizyr/keras-retinanet) using tensorflow-gpu as the backend.

I can share the model I’m running if you can advise somewhere convenient to post it.

I appreciate your help!

–James

I have modified the recommended NVIDIA RetinaNet example to perform object detection (inference) on video streams on a Jetson Nano. I have managed to get the code to run on a laptop. However, I have discovered that nvidia-docker is not yet supported on the Jetson Nano. nvidia-docker appears to be a hard requirement for this project (see this issue), so it looks like the project is not yet usable on a Jetson Nano. Or have I missed something?

Could I convert the model to TensorRT as described here and then run that optimized model on my Jetson Nano outside of the nvidia-docker context? Is there an example of how this is done? Maybe like this?
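In rough outline, this is what I’m picturing for the inference side, based on the TensorRT Python samples. It is only a sketch: the engine filename is a placeholder, it assumes binding 0 is the model input, and I haven’t tried it on the Nano yet.

import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# deserialize a previously built engine (placeholder filename)
with open("retinanet_fp16.plan", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# one host/device buffer pair per binding (inputs and outputs)
host_bufs, dev_bufs, bindings = [], [], []
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    host = cuda.pagelocked_empty(size, dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

stream = cuda.Stream()
context = engine.create_execution_context()

def infer(preprocessed_frame):
    # copy the preprocessed frame into the input buffer, run the engine, copy results back
    np.copyto(host_bufs[0], preprocessed_frame.ravel())
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle)
    for host, dev in zip(host_bufs[1:], dev_bufs[1:]):
        cuda.memcpy_dtoh_async(host, dev, stream)
    stream.synchronize()
    return host_bufs[1:]  # raw output buffers; these still need decoding into boxes/scores/labels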

Any guidance is much appreciated; thanks in advance…

Can we please revive this topic? I’m having the same issue; was anyone able to rectify it?

Thanks.