Migrating SSD MobileNet v1 from TF1 to TF2 consumes too much GPU memory

Hi everyone!!

We are migrating the object detection part of a project from TF1 to TF2 for use on an NVIDIA Jetson TX2. We can now train an SSD MobileNet v1 model [this] with the official “model_main_tf2.py” script, but only after reducing the batch size considerably, to 32. We have a Quadro P4000 GPU (8 GB).
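For context, the batch size used by “model_main_tf2.py” comes from the `train_config` block of the pipeline config file (field name as defined in the OD API protos):

```
train_config {
  batch_size: 32
}
```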

We get warnings like the following, but training completes:

2022-02-16 13:06:30.622479: W tensorflow/core/common_runtime/bfc_allocator.cc:275] Allocator (GPU_0_bfc) ran out of memory trying to allocate 475.00MiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
..

When we increase the batch size to 64, we get this error (after more of those warnings):

(1) RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[64,64,320,320] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node ssd_mobile_net_v1_fpn_keras_feature_extractor/model/conv_pw_1_bn/FusedBatchNormV3}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
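As a side note, one setting that sometimes helps with TF2's allocator behavior is memory growth, which makes TF allocate GPU memory on demand instead of reserving it up front. A minimal sketch (it does not raise the hard OOM ceiling, but it can reduce fragmentation-related warnings):

```python
import tensorflow as tf

# Ask TF to grow GPU memory allocations on demand rather than grabbing
# the whole GPU at startup. Must run before any op touches the GPU.
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
```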

We understand that the GPU has no free memory left to allocate the complete model. However, we previously trained a (we thought) similar model [this] on TF1 (1.14.0) with a maximum batch size of 100 on the same GPU, using the “legacy/train.py” script.

Digging into the issue, we discovered that TF2 uses eager execution by default, whereas TF1 uses graph mode by default. We read that graph mode is more efficient and robust, while eager mode is easier to use. So we switched TF2 to graph mode using:

from tensorflow.python.framework.ops import disable_eager_execution
disable_eager_execution()
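For reference, the usual TF2 way to get graph-mode performance without disabling eager execution globally is `tf.function`, which traces Python code into a graph on first call (this is also what `model_main_tf2.py` does internally for its train step). A minimal sketch with a hypothetical function:

```python
import tensorflow as tf

@tf.function  # traced into a graph on the first call, then run as a graph
def squared_sum(x):
    return tf.reduce_sum(x * x)

x = tf.constant([1.0, 2.0, 3.0])
print(squared_sum(x).numpy())  # 14.0
```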

However, some functions of the Object Detection API 2 are not compatible with graph mode; the first error is:

File "/home/sts/.local/lib/python3.8/site-packages/tensorflow/python/distribute/input_lib.py", line 1383, in __iter__
raise RuntimeError("__iter__() is only supported inside of tf.function "
RuntimeError: __iter__() is only supported inside of tf.function or when eager execution is enabled.
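This error matches a documented restriction: a distributed dataset (the OD API wraps its input pipeline in a `tf.distribute` strategy) can only be iterated eagerly or inside a `tf.function`. A minimal reproduction sketch, assuming the default (no-replication) strategy:

```python
import tensorflow as tf

strategy = tf.distribute.get_strategy()  # default strategy, no replication
dataset = tf.data.Dataset.range(4).batch(2)
dist_dataset = strategy.experimental_distribute_dataset(dataset)

# This loop works in eager mode (the TF2 default); after
# disable_eager_execution() the same iteration raises the
# __iter__ RuntimeError shown above.
n_batches = sum(1 for _ in dist_dataset)
print(n_batches)  # 2
```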

So, is there a known problem with the Object Detection API 2 and TF2? Why does it consume so much more memory for the same model? Can we switch to graph mode to solve this, and if so, how?

System:

  • Ubuntu 20.04
  • Python 3.8
  • Nvidia Driver 510
  • CUDA 11.2
  • CuDNN 8.1
  • TensorFlow 2.8
  • Object Detection API 2 (master, commit 9c8cbd0)

Thanks so much for helping us.

Hi,
This looks like a Jetson issue. Please refer to the samples below in case they are useful.

For any further assistance, we recommend raising it on the respective platform via the link below.

Thanks!

Thanks for the answer, but our main problem concerns training on the P4000 GPU, which cannot allocate the same model under the new version of TensorFlow.

Best regards

Hi,

This issue looks more related to TensorFlow.
We recommend you post your concern on a TensorFlow-related platform to get better help: https://github.com/tensorflow/models/issues

Thank you.