model_builder_tf2_test.py fails

Hi,
I am trying to run model_builder_tf2_test.py after installing the TensorFlow Object Detection API. The run finishes with one failed test:
ModelBuilderTF2Test.test_create_ssd_models_from_config

The console log was very long, so I have attached only the sections that I think provide the most information.

[ RUN      ] ModelBuilderTF2Test.test_create_ssd_models_from_config
I0616 12:16:41.015402 547548639248 ssd_efficientnet_bifpn_feature_extractor.py:146] EfficientDet EfficientNet backbone version: efficientnet-b0
I0616 12:16:41.015920 547548639248 ssd_efficientnet_bifpn_feature_extractor.py:147] EfficientDet BiFPN num filters: 64
I0616 12:16:41.016243 547548639248 ssd_efficientnet_bifpn_feature_extractor.py:149] EfficientDet BiFPN num iterations: 3
I0616 12:16:41.038930 547548639248 efficientnet_model.py:143] round_filter input=32 output=32
I0616 12:16:41.172047 547548639248 efficientnet_model.py:143] round_filter input=32 output=32
I0616 12:16:41.172853 547548639248 efficientnet_model.py:143] round_filter input=16 output=16
I0616 12:16:41.873748 547548639248 efficientnet_model.py:143] round_filter input=16 output=16
I0616 12:16:41.874298 547548639248 efficientnet_model.py:143] round_filter input=24 output=24
I0616 12:16:43.211043 547548639248 efficientnet_model.py:143] round_filter input=24 output=24
I0616 12:16:43.211837 547548639248 efficientnet_model.py:143] round_filter input=40 output=40
I0616 12:16:44.485020 547548639248 efficientnet_model.py:143] round_filter input=40 output=40
I0616 12:16:44.485730 547548639248 efficientnet_model.py:143] round_filter input=80 output=80
I0616 12:16:46.337943 547548639248 efficientnet_model.py:143] round_filter input=80 output=80
I0616 12:16:46.338494 547548639248 efficientnet_model.py:143] round_filter input=112 output=112
I0616 12:16:48.061728 547548639248 efficientnet_model.py:143] round_filter input=112 output=112
I0616 12:16:48.062232 547548639248 efficientnet_model.py:143] round_filter input=192 output=192
I0616 12:16:50.044371 547548639248 efficientnet_model.py:143] round_filter input=192 output=192
I0616 12:16:50.044832 547548639248 efficientnet_model.py:143] round_filter input=320 output=320
I0616 12:16:50.486413 547548639248 efficientnet_model.py:143] round_filter input=1280 output=1280
I0616 12:16:50.671287 547548639248 efficientnet_model.py:453] Building model efficientnet with params ModelConfig(width_coefficient=1.0, depth_coefficient=1.0, resolution=224, dropout_rate=0.2, blocks=(BlockConfig(input_filters=32, output_filters=16, kernel_size=3, num_repeat=1, expand_ratio=1, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=16, output_filters=24, kernel_size=3, num_repeat=2, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=24, output_filters=40, kernel_size=5, num_repeat=2, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=40, output_filters=80, kernel_size=3, num_repeat=3, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=80, output_filters=112, kernel_size=5, num_repeat=3, expand_ratio=6, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=112, output_filters=192, kernel_size=5, num_repeat=4, expand_ratio=6, strides=(2, 2), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise'), BlockConfig(input_filters=192, output_filters=320, kernel_size=3, num_repeat=1, expand_ratio=6, strides=(1, 1), se_ratio=0.25, id_skip=True, fused_conv=False, conv_type='depthwise')), stem_base_filters=32, top_base_filters=1280, activation='simple_swish', batch_norm='default', bn_momentum=0.99, bn_epsilon=0.001, weight_decay=5e-06, drop_connect_rate=0.2, depth_divisor=8, min_depth=None, use_se=True, input_channels=3, num_classes=1000, model_name='efficientnet', rescale_input=False, data_format='channels_last', dtype='float32')
I0616 12:16:50.986689 547548639248 ssd_efficientnet_bifpn_feature_extractor.py:146] EfficientDet EfficientNet backbone version: efficientnet-b1
I0616 12:16:50.987162 547548639248 ssd_efficientnet_bifpn_feature_extractor.py:147] EfficientDet BiFPN num filters: 88
I0616 12:16:50.987442 547548639248 ssd_efficientnet_bifpn_feature_extractor.py:149] EfficientDet BiFPN num iterations: 4
I0616 12:16:50.995934 547548639248 efficientnet_model.py:143] round_filter input=32 output=32
I0616 12:16:51.093195 547548639248 efficientnet_model.py:143] round_filter input=32 output=32
I0616 12:16:51.093668 547548639248 efficientnet_model.py:143] round_filter input=16 output=16
I0616 12:16:51.813579 547548639248 efficientnet_model.py:143] round_filter input=16 output=16
I0616 12:16:51.814042 547548639248 efficientnet_model.py:143] round_filter input=24 output=24
I0616 12:16:53.229449 547548639248 efficientnet_model.py:143] round_filter input=24 output=24
I0616 12:16:53.229925 547548639248 efficientnet_model.py:143] round_filter input=40 output=40
I0616 12:16:54.707406 547548639248 efficientnet_model.py:143] round_filter input=40 output=40
I0616 12:16:54.707888 547548639248 efficientnet_model.py:143] round_filter input=80 output=80
I0616 12:16:56.565820 547548639248 efficientnet_model.py:143] round_filter input=80 output=80
I0616 12:16:56.566341 547548639248 efficientnet_model.py:143] round_filter input=112 output=112
I0616 12:16:59.064982 547548639248 efficientnet_model.py:143] round_filter input=112 output=112
I0616 12:16:59.065442 547548639248 efficientnet_model.py:143] round_filter input=192 output=192
I0616 12:17:01.391214 547548639248 efficientnet_model.py:143] round_filter input=192 output=192
I0616 12:17:01.391669 547548639248 efficientnet_model.py:143] round_filter input=320 output=320
I0616 12:17:02.319234 547548639248 efficientnet_model.py:143] round_filter input=1280 output=1280
2022-06-16 12:17:12.486705: W tensorflow/core/common_runtime/bfc_allocator.cc:463] Allocator (GPU_0_bfc) ran out of memory trying to allocate 4.88MiB (rounded to 5120000)requested by op AddV2
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.
Current allocation summary follows.
2022-06-16 12:17:12.486907: I tensorflow/core/common_runtime/bfc_allocator.cc:1010] BFCAllocator dump for GPU_0_bfc
2022-06-16 12:17:12.502571: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (256): 	Total Chunks: 168, Chunks in use: 168. 42.0KiB allocated for chunks. 42.0KiB in use in bin. 12.4KiB client-requested in use in bin.
2022-06-16 12:17:12.502689: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (512): 	Total Chunks: 153, Chunks in use: 153. 96.8KiB allocated for chunks. 96.8KiB in use in bin. 78.7KiB client-requested in use in bin.
2022-06-16 12:17:12.502731: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (1024): 	Total Chunks: 71, Chunks in use: 71. 79.0KiB allocated for chunks. 79.0KiB in use in bin. 70.9KiB client-requested in use in bin.
2022-06-16 12:17:12.502765: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (2048): 	Total Chunks: 135, Chunks in use: 135. 334.2KiB allocated for chunks. 334.2KiB in use in bin. 311.6KiB client-requested in use in bin.
2022-06-16 12:17:12.502805: I tensorflow/core/common_runtime/bfc_allocator.cc:1017] Bin (4096): 	Total Chunks: 109, Chunks in use: 109. 534.5KiB allocated for chunks. 534.5KiB in use in bin. 508.0KiB clie
2022-06-16 12:17:12.578887: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 2 Chunks of size 5120000 totalling 9.77MiB
2022-06-16 12:17:12.578924: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] 1 Chunks of size 10231040 totalling 9.76MiB
2022-06-16 12:17:12.578957: I tensorflow/core/common_runtime/bfc_allocator.cc:1078] Sum Total of in-use chunks: 61.87MiB
2022-06-16 12:17:12.578985: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] total_region_allocated_bytes_: 71147520 memory_limit_: 71147520 available bytes: 0 curr_region_allocation_bytes_: 142295040
2022-06-16 12:17:12.619733: I tensorflow/core/common_runtime/bfc_allocator.cc:1086] Stats: 
Limit:                        71147520
InUse:                        64879104
MaxInUse:                     64879360
NumAllocs:                        1634
MaxAllocSize:                 10231040
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2022-06-16 12:17:12.620039: W tensorflow/core/common_runtime/bfc_allocator.cc:475] *****************************************************************____************************xxxxxxx
2022-06-16 12:17:12.676088: W tensorflow/core/framework/op_kernel.cc:1733] RESOURCE_EXHAUSTED: failed to allocate memory
INFO:tensorflow:time(__main__.ModelBuilderTF2Test.test_create_ssd_models_from_config): 33.19s
I0616 12:17:12.716900 547548639248 test_util.py:2308] time(__main__.ModelBuilderTF2Test.test_create_ssd_models_from_config): 33.19s
[  FAILED  ] ModelBuilderTF2Test.test_create_ssd_models_from_config

ERROR: test_create_ssd_models_from_config (__main__.ModelBuilderTF2Test)
ModelBuilderTF2Test.test_create_ssd_models_from_config
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/builders/model_builder_test.py", line 211, in test_create_ssd_models_from_config
    model = model_builder.build(model_proto, is_training=True)
  File "/usr/local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/builders/model_builder.py", line 1253, in build
    add_summaries)
  File "/usr/local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/builders/model_builder.py", line 408, in _build_ssd_model
    is_training=is_training)
  File "/usr/local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/builders/model_builder.py", line 383, in _build_ssd_feature_extractor
    return feature_extractor_class(**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/models/ssd_efficientnet_bifpn_feature_extractor.py", line 406, in __init__
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/object_detection-0.1-py3.6.egg/object_detection/models/ssd_efficientnet_bifpn_feature_extractor.py", line 171, in __init__
    model_name=self._efficientnet_version, overrides=efficientnet_overrides)
  File "/home/jonathan/Data/test/Tensorflow/models/official/legacy/image_classification/efficientnet/efficientnet_model.py", line 489, in from_name
    model = cls(config=config, overrides=overrides)
  File "/home/jonathan/Data/test/Tensorflow/models/official/legacy/image_classification/efficientnet/efficientnet_model.py", line 448, in __init__
    output = efficientnet(image_input, self.config)
  File "/home/jonathan/Data/test/Tensorflow/models/official/legacy/image_classification/efficientnet/efficientnet_model.py", line 417, in efficientnet
    x)
  File "/home/jonathan/.local/lib/python3.6/site-packages/keras/engine/base_layer.py", line 977, in __call__
    input_list)
  File "/home/jonathan/.local/lib/python3.6/site-packages/keras/engine/base_layer.py", line 1115, in _functional_construction_call
    inputs, input_masks, args, kwargs)
  File "/home/jonathan/.local/lib/python3.6/site-packages/keras/engine/base_layer.py", line 848, in _keras_tensor_symbolic_call
    return self._infer_output_signature(inputs, args, kwargs, input_masks)
  File "/home/jonathan/.local/lib/python3.6/site-packages/keras/engine/base_layer.py", line 886, in _infer_output_signature
    self._maybe_build(inputs)
  File "/home/jonathan/.local/lib/python3.6/site-packages/keras/engine/base_layer.py", line 2659, in _maybe_build
    self.build(input_shapes)  # pylint:disable=not-callable
  File "/home/jonathan/.local/lib/python3.6/site-packages/keras/layers/core.py", line 1185, in build
    trainable=True)
  File "/home/jonathan/.local/lib/python3.6/site-packages/keras/engine/base_layer.py", line 663, in add_weight
    caching_device=caching_device)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/base.py", line 821, in _add_variable_with_custom_getter
    **kwargs_for_getter)
  File "/home/jonathan/.local/lib/python3.6/site-packages/keras/engine/base_layer_utils.py", line 129, in make_variable
    shape=variable_shape if variable_shape else None)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py", line 268, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py", line 228, in _variable_v1_call
    shape=shape)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py", line 206, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py", line 2626, in default_variable_creator
    shape=shape)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variables.py", line 272, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 1641, in __init__
    distribute_strategy=distribute_strategy)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 1776, in _init_from_args
    initial_value = initial_value()
  File "/home/jonathan/.local/lib/python3.6/site-packages/keras/initializers/initializers_v2.py", line 517, in __call__
    return self._random_generator.random_uniform(shape, -limit, limit, dtype)
  File "/home/jonathan/.local/lib/python3.6/site-packages/keras/initializers/initializers_v2.py", line 973, in random_uniform
    shape=shape, minval=minval, maxval=maxval, dtype=dtype, seed=self.seed)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py", line 1096, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/random_ops.py", line 317, in random_uniform
    result = math_ops.add(result * (maxval - minval), minval, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py", line 1096, in op_dispatch_handler
    return dispatch_target(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py", line 3990, in add
    return gen_math_ops.add_v2(x, y, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 466, in add_v2
    _ops.raise_from_not_ok_status(e, name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 7107, in raise_from_not_ok_status
    raise core._status_to_exception(e) from None  # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.ResourceExhaustedError: failed to allocate memory [Op:AddV2]

----------------------------------------------------------------------
Ran 24 tests in 66.752s

FAILED (errors=1, skipped=1)

Hi,

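The error is RESOURCE_EXHAUSTED, so the test fails because the GPU allocator ran out of memory.
In your allocator dump the limit is only 71147520 bytes (about 68 MiB), and there was no free block left for the 4.88MiB request from AddV2.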
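
To put the numbers from the allocator dump in perspective, here is a minimal sketch that converts the "Stats:" block from the log into MiB. The values are copied from the log above; the helper name parse_bfc_stats is only an illustration and is not part of TensorFlow.

```python
import re

# Stats block copied from the BFC allocator dump in the log above.
STATS = """\
Limit:                        71147520
InUse:                        64879104
MaxInUse:                     64879360
NumAllocs:                        1634
MaxAllocSize:                 10231040
"""

MIB = 1024 * 1024


def parse_bfc_stats(text):
    """Return a dict such as {'Limit': 71147520, ...} from 'Name: value' lines.

    Hypothetical helper for illustration only; not a TensorFlow API.
    """
    pairs = re.findall(r"^(\w+):[ \t]+(\d+)[ \t]*$", text, flags=re.MULTILINE)
    return {name: int(value) for name, value in pairs}


stats = parse_bfc_stats(STATS)
limit_mib = stats["Limit"] / MIB
in_use_mib = stats["InUse"] / MIB
in_use_fraction = stats["InUse"] / stats["Limit"]

print(f"limit  : {limit_mib:.2f} MiB")
print(f"in use : {in_use_mib:.2f} MiB ({in_use_fraction:.0%} of limit)")
```

The in-use figure agrees with the "Sum Total of in-use chunks: 61.87MiB" line in the dump. A limit this small suggests that very little memory was free when TensorFlow started, which is what tegrastats can confirm.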

You can confirm this by running tegrastats in another console while the test is running.

$ sudo tegrastats
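
If you want to log the memory figure instead of watching it by eye, a small sketch like the one below can pull the RAM field out of a tegrastats line. The sample line is made up for illustration, and it assumes the line contains a field of the form "RAM used/totalMB"; the exact output format can differ between releases, so adjust the pattern to what your device prints.

```python
import re

# Hypothetical sample line, NOT taken from the log above.
# Assumes tegrastats prints a field of the form "RAM <used>/<total>MB".
SAMPLE = "RAM 3710/3964MB (lfb 2x1MB) SWAP 512/1982MB (cached 20MB) CPU [45%@1479,30%@1479]"


def parse_ram(line):
    """Return (used_mb, total_mb) from a tegrastats-style line, or None if absent."""
    match = re.search(r"\bRAM (\d+)/(\d+)MB", line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


used, total = parse_ram(SAMPLE)
print(f"RAM used: {used}/{total} MB, free: {total - used} MB")
```

If the used value sits close to the total while the test runs, the system really is out of memory.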

Thanks.
