Out of memory error from TensorFlow: any workaround for this, or do I just need a bigger boat?

I am running an application that employs a Keras-TensorFlow model to perform object detection. This model runs in tandem with a Caffe model that performs facial detection/recognition.

The application runs well on a laptop, but when I run it on my Jetson Nano it crashes almost immediately. Below is the last part of the console output, which I think shows that the process is running out of memory (assuming OOM == out of memory).

Is there a way to configure my system and/or the TensorFlow settings so that this is no longer an issue? Or is there another way around it, perhaps by converting the model to run on TensorFlow Lite?

If anyone can give me some guidance on how to troubleshoot this further, please advise. Thanks in advance for your kind help!

2019-05-21 13:22:47.917271: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free  at 0xf09325900 of size 256
2019-05-21 13:22:47.917300: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0xf09325a00 of size 256
2019-05-21 13:22:47.917332: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0xf09325b00 of size 16074752
2019-05-21 13:22:47.917364: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0xf0a27a300 of size 64299008
2019-05-21 13:22:47.917392: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0xf0dfcc300 of size 116251904
2019-05-21 13:22:47.917418: I tensorflow/core/common_runtime/bfc_allocator.cc:638]      Summary of in-use Chunks by size: 
2019-05-21 13:22:47.917512: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 82 Chunks of size 256 totalling 20.5KiB
2019-05-21 13:22:47.917562: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 34 Chunks of size 512 totalling 17.0KiB
2019-05-21 13:22:47.917595: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 82 Chunks of size 1024 totalling 82.0KiB
2019-05-21 13:22:47.917650: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 1280 totalling 1.2KiB
2019-05-21 13:22:47.917684: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 46 Chunks of size 2048 totalling 92.0KiB
2019-05-21 13:22:47.917715: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 30 Chunks of size 4096 totalling 120.0KiB
2019-05-21 13:22:47.917748: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 18 Chunks of size 8192 totalling 144.0KiB
2019-05-21 13:22:47.917780: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 16384 totalling 16.0KiB
2019-05-21 13:22:47.917812: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 37632 totalling 36.8KiB
2019-05-21 13:22:47.917844: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 6 Chunks of size 65536 totalling 384.0KiB
2019-05-21 13:22:47.917875: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 82944 totalling 81.0KiB
2019-05-21 13:22:47.917907: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 131072 totalling 128.0KiB
2019-05-21 13:22:47.917941: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 147456 totalling 432.0KiB
2019-05-21 13:22:47.917975: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 7 Chunks of size 262144 totalling 1.75MiB
2019-05-21 13:22:47.918013: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 331776 totalling 324.0KiB
2019-05-21 13:22:47.918049: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 524288 totalling 1.50MiB
2019-05-21 13:22:47.918081: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 589824 totalling 2.25MiB
2019-05-21 13:22:47.918112: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 12 Chunks of size 1048576 totalling 12.00MiB
2019-05-21 13:22:47.918142: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 2097152 totalling 6.00MiB
2019-05-21 13:22:47.918174: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 18 Chunks of size 2359296 totalling 40.50MiB
2019-05-21 13:22:47.918203: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 5 Chunks of size 4194304 totalling 20.00MiB
2019-05-21 13:22:47.918233: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 8388608 totalling 8.00MiB
2019-05-21 13:22:47.918263: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 9437184 totalling 27.00MiB
2019-05-21 13:22:47.918294: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 16074752 totalling 15.33MiB
2019-05-21 13:22:47.918325: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 18874368 totalling 18.00MiB
2019-05-21 13:22:47.918356: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 64299008 totalling 61.32MiB
2019-05-21 13:22:47.918389: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 116251904 totalling 110.87MiB
2019-05-21 13:22:47.918419: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 326.35MiB
2019-05-21 13:22:47.918454: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats: 
Limit:                   342204416
InUse:                   342204160
MaxInUse:                342204160
NumAllocs:                     715
MaxAllocSize:            116251904

2019-05-21 13:22:47.918508: W tensorflow/core/common_runtime/bfc_allocator.cc:271] *************************************************************************************xxxxxxxxxxxxxxx
2019-05-21 13:22:47.937473: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at conv_ops.cc:735 : Resource exhausted: OOM when allocating tensor with shape[256,64,1,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/home/james/.virtualenvs/nano/bin/monitor_video", line 10, in <module>
    sys.exit(monitor_video())
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/deep_monitor/__main__.py", line 299, in monitor_video
    _monitor(args["config"])
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/deep_monitor/__main__.py", line 229, in _monitor
    detections = detector_object.detect(frame, confidence_object)
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/deep_monitor/detector.py", line 121, in detect
    (boxes, scores, labels) = self.model.predict_on_batch(image)
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/keras/engine/training.py", line 1274, in predict_on_batch
    outputs = self.predict_function(ins)
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/home/james/.virtualenvs/nano/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[256,64,1,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node res2a_branch2c/convolution}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[{{node filtered_detections/map/while/PadV2_2/paddings}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

FATAL: exception not rethrown

Hi,

You can try the following configuration to see if it helps.

config = tf.ConfigProto()
config.gpu_options.allow_growth = True                    # allocate GPU memory on demand instead of reserving it all up front
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # cap this process at 40% of GPU memory
session = tf.Session(config=config, ...)

Whether this helps depends on whether there is an algorithm that uses less memory.

One possibility is that the minimum memory required by the model already exceeds the Nano’s capacity (4 GB).
For this issue, please try decreasing the batch size or using a more lightweight model.

Thanks.

Thanks for this suggestion, @AastaLLL.

What is most unusual is that after posting this yesterday I was eventually able to run my application, but only once; every other attempt failed again with the same out-of-memory errors listed above.

My application incorporates a model based upon Keras-RetinaNet and I’m not sure how to access the TF Session object used therein. Maybe it’s somewhere within Keras, and there’s a way I can pass along the configuration suggested above? According to this I might be able to access the Session being used, and if I manage to access it, could I then update the Session’s configuration?

In case it’s relevant, I instantiate my custom-trained Keras-RetinaNet inference model like so:

from keras_retinanet import models
model = models.load_model(custom_trained_inference_model, backbone_name="resnet50")

I use the model to detect objects from video image frames like so:

import numpy as np
from keras_retinanet.utils.image import preprocess_image, resize_image

# preprocess image frame into a format that's useful as input to the model
image = preprocess_image(frame)
(image, scale) = resize_image(image)
image = np.expand_dims(image, axis=0)

# detect objects in the input image
(boxes, scores, labels) = self.model.predict_on_batch(image)

Perhaps this is relevant: How to run Keras model on Jetson Nano

“Normally this frozen graph is what you use for deploying. However, it is not optimized to run on the Jetson Nano in terms of either speed or resource efficiency. That is where TensorRT comes into play: it quantizes the model from FP32 to FP16, effectively reducing memory consumption. It also fuses layers and tensors together, which further optimizes the use of GPU memory and bandwidth. All of this comes with little or no noticeable loss of accuracy.”
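If I’m reading that correctly, the conversion step would look roughly like the sketch below. This is only my reading of the TF 1.x TF-TRT path (tensorflow.contrib.tensorrt); the frozen-graph filename and the output node names are placeholders, since I don’t yet know what the RetinaNet export actually produces.

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF-TRT integration in TF 1.x

# load the frozen graph exported from the Keras model (placeholder filename)
with tf.gfile.GFile("frozen_retinanet.pb", "rb") as f:
    frozen_graph = tf.GraphDef()
    frozen_graph.ParseFromString(f.read())

# ask TF-TRT to replace supported subgraphs with FP16 TensorRT engines
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=["boxes", "scores", "labels"],  # placeholder output node names
    max_batch_size=1,
    max_workspace_size_bytes=1 << 28,       # keep the workspace modest on the Nano
    precision_mode="FP16",
)

# save the optimized graph for use at inference time
with tf.gfile.GFile("trt_retinanet.pb", "wb") as f:
    f.write(trt_graph.SerializeToString())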

Try what Aasta wrote, plus:

keras.backend.tensorflow_backend.set_session(session)

This should be done ONCE, as soon as possible after importing keras and tf and setting the CUDA device (if you do that).
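Concretely, something like the following sketch (adjust the memory fraction for your model, and make sure it runs before models.load_model is called):

import tensorflow as tf
import keras.backend.tensorflow_backend as ktf

# run once, right after the imports and before the RetinaNet model is loaded
config = tf.ConfigProto()
config.gpu_options.allow_growth = True                    # allocate GPU memory on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # cap this process's share of GPU memory
ktf.set_session(tf.Session(config=config))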

Hi,

The memory usage of a third-party library may not be optimal on the Jetson Nano.
It’s recommended to use pure TensorRT instead.

We have a RetinaNet sample that goes from PyTorch to TensorRT.
Is this a possible option for you?
https://github.com/NVIDIA/retinanet-examples

Thanks.

Thanks, @AastaLLL, I will give this a whirl. I appreciate your help!

: )

Hi monocongo,

You said the model ran on your laptop. Do you mind sharing the TensorFlow models that you are running, as well as your laptop’s specifications (most importantly CPU / GPU memory)?

This may be helpful in determining what the underlying issue is.

Thanks!
John

Thanks, John.

I have been running the model on a Dell XPS 9570 with 16 GB of memory. The GPU is a GeForce GTX 1050 Ti (4 GB, I think). The OS is Ubuntu 18.04. The model is Keras-RetinaNet (https://github.com/fizyr/keras-retinanet) using tensorflow-gpu as the backend.

I can share the model I’m running if you can advise somewhere convenient to post it.

I appreciate your help!

–James

I have modified the recommended NVIDIA RetinaNet example to perform object detection (inference) on video streams on a Jetson Nano. I have managed to get the code to run on a laptop. However, I have discovered that nvidia-docker is not yet supported on the Jetson Nano. nvidia-docker appears to be a hard requirement for this project (see this issue), so it looks like the project is not yet usable on a Jetson Nano. Or have I missed something?

Could I convert the model to TensorRT as described here and then run that optimized model on my Jetson Nano outside of the nvidia-docker context? Is there an example of how this is done? Maybe like this?
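In rough outline, this is what I’m picturing for the inference side, based on the TensorRT Python samples. It is only a sketch: the engine filename is a placeholder, it assumes binding 0 is the model input, and I haven’t tried it on the Nano yet.

import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# deserialize a previously built engine (placeholder filename)
with open("retinanet_fp16.plan", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# one host/device buffer pair per binding (inputs and outputs)
host_bufs, dev_bufs, bindings = [], [], []
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
    dtype = trt.nptype(engine.get_binding_dtype(binding))
    host = cuda.pagelocked_empty(size, dtype)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

stream = cuda.Stream()
context = engine.create_execution_context()

def infer(preprocessed_frame):
    # copy the preprocessed frame into the input buffer, run the engine, copy results back
    np.copyto(host_bufs[0], preprocessed_frame.ravel())
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async(batch_size=1, bindings=bindings, stream_handle=stream.handle)
    for host, dev in zip(host_bufs[1:], dev_bufs[1:]):
        cuda.memcpy_dtoh_async(host, dev, stream)
    stream.synchronize()
    return host_bufs[1:]  # raw output buffers; these still need decoding into boxes/scores/labels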

Any guidance is much appreciated; thanks in advance…

Can we please revive this topic? I’m having the same issue; was anyone able to rectify it?

Thanks.