Fpenet: What training parameters should be modified to enlarge the bbox

• Network Type fpenet
• TLT Version 3.22.05 cv_samples_v1.4.0
• Training spec file
experiment_spec.yaml (2.3 KB)
• How to reproduce the issue ?
I trained successfully with the default settings. Then I changed the bbox_enlarge_ratio parameter in dataset_config.yaml from 1.0 to 1.2, and training reported the error below.
How does fpenet calculate the bbox?
If I want to scale the bbox up or down, which training parameters should I focus on?

INFO    2022-10-31 07:28:18,040| tensorflow: Graph was finalized.
INFO    2022-10-31 07:28:19,475| tensorflow: Running local_init_op.
INFO    2022-10-31 07:28:20,933| tensorflow: Done running local_init_op.
INFO    2022-10-31 07:28:31,333| tensorflow: Saving checkpoints for step-0.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: {{function_node __inference_Dataset_map__map_func_set_random_wrapper_38447}} assertion failed: [height must be >= target + offset.]
	 [[{{node crop_to_bounding_box_5/Assert_5/Assert}}]]
	 [[IteratorGetNext_1]]
	 [[RegexFullMatch_19/_1711]]
  (1) Invalid argument: {{function_node __inference_Dataset_map__map_func_set_random_wrapper_38447}} assertion failed: [height must be >= target + offset.]
	 [[{{node crop_to_bounding_box_5/Assert_5/Assert}}]]
	 [[IteratorGetNext_1]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/scripts/train.py", line 141, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/scripts/train.py", line 137, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/trainers/fpenet_trainer.py", line 286, in train
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/trainers/fpenet_trainer.py", line 332, in run_training_loop
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  assertion failed: [height must be >= target + offset.]
	 [[{{node crop_to_bounding_box_5/Assert_5/Assert}}]]
	 [[IteratorGetNext_1]]
	 [[RegexFullMatch_19/_1711]]
  (1) Invalid argument:  assertion failed: [height must be >= target + offset.]
	 [[{{node crop_to_bounding_box_5/Assert_5/Assert}}]]
	 [[IteratorGetNext_1]]
0 successful operations.
0 derived errors ignored.
Traceback (most recent call last):
  File "/usr/local/bin/fpenet", line 8, in <module>
    sys.exit(main())
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/entrypoint/fpenet.py", line 12, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/entrypoint/entrypoint.py", line 300, in launch_job
AssertionError: Process run failed.

FPENet finds the xmin, ymin, xmax, and ymax of the keypoints, then calculates a bbox from them.

What caused my error after enlarging the bbox? I don't understand what height, target, and offset mean.

To scale the bbox up or down, you can set bbox_enlarge_ratio.

In the FPE DataIO pipeline, tfrecords generation reads the keypoints and the original image size. It then computes a square encompassing all keypoints and enlarges that square by bbox_enlarge_ratio.
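The steps described above can be sketched roughly as follows. This is only an illustration, not the actual FPENet code: the function name `enlarged_square_bbox` is hypothetical, and centering the square on the middle of the tight keypoint box is an assumption. The sample_calibration_images.py script from cv_samples remains the reference for the real behavior.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def enlarged_square_bbox(keypoints: List[Point], bbox_enlarge_ratio: float = 1.0) -> Box:
    """Hypothetical helper: square box around all keypoints, scaled by a ratio."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]

    # Tight box around the keypoints.
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)

    # Use the longer side so the square encompasses every keypoint,
    # then enlarge it by the configured ratio.
    side = max(xmax - xmin, ymax - ymin) * bbox_enlarge_ratio

    # Assumption: the square is centered on the tight box.
    cx = (xmin + xmax) / 2.0
    cy = (ymin + ymax) / 2.0
    half = side / 2.0
    return (cx - half, cy - half, cx + half, cy + half)


if __name__ == "__main__":
    points = [(100.0, 120.0), (200.0, 150.0), (150.0, 260.0)]
    print(enlarged_square_bbox(points, 1.0))
    print(enlarged_square_bbox(points, 1.2))
```

Note that a larger ratio pushes the box edges outward in every direction, so for faces near the image border the enlarged square may extend past the image.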

Can you generate new tfrecords files? Also, could you try a smaller bbox_enlarge_ratio, for example 1.1?

I lowered bbox_enlarge_ratio to 1.1 and generated new tfrecords, but training still reports a similar, confusing error.

INFO    2022-11-01 06:11:46,426| tensorflow: Graph was finalized.
INFO    2022-11-01 06:11:47,856| tensorflow: Running local_init_op.
INFO    2022-11-01 06:11:49,322| tensorflow: Done running local_init_op.
INFO    2022-11-01 06:11:59,795| tensorflow: Saving checkpoints for step-0.
INFO    2022-11-01 06:12:24,576| modulus.hooks.sample_counter_hook: Train Samples / sec: 2.732
INFO    2022-11-01 06:12:24,576| tensorflow: elt_loss = 41462.242, epoch = 0.0, landmarks_loss = 41462.242, step = 0, total_loss = 62193.363
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: {{function_node __inference_Dataset_map__map_func_set_random_wrapper_38447}} assertion failed: [width must be >= target + offset.]
	 [[{{node crop_to_bounding_box_27/Assert_4/Assert}}]]
	 [[IteratorGetNext_1]]
  (1) Invalid argument: {{function_node __inference_Dataset_map__map_func_set_random_wrapper_38447}} assertion failed: [width must be >= target + offset.]
	 [[{{node crop_to_bounding_box_27/Assert_4/Assert}}]]
	 [[IteratorGetNext_1]]
	 [[RegexFullMatch_61/_4286]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/scripts/train.py", line 141, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/scripts/train.py", line 137, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/trainers/fpenet_trainer.py", line 286, in train
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/trainers/fpenet_trainer.py", line 332, in run_training_loop
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1426, in run
    run_metadata=run_metadata))
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/blocks/hooks/hooks.py", line 76, in after_run
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/evaluation/fpenet_evaluator.py", line 191, in evaluate
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  assertion failed: [width must be >= target + offset.]
	 [[{{node crop_to_bounding_box_27/Assert_4/Assert}}]]
	 [[IteratorGetNext_1]]
  (1) Invalid argument:  assertion failed: [width must be >= target + offset.]
	 [[{{node crop_to_bounding_box_27/Assert_4/Assert}}]]
	 [[IteratorGetNext_1]]
	 [[RegexFullMatch_61/_4286]]
0 successful operations.
0 derived errors ignored.
Traceback (most recent call last):
  File "/usr/local/bin/fpenet", line 8, in <module>
    sys.exit(main())
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/entrypoint/fpenet.py", line 12, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/entrypoint/entrypoint.py", line 300, in launch_job
AssertionError: Process run failed.

However, I found another way to enlarge the bbox successfully:
Add four new dummy keypoints (left_top, right_top, right_bottom, left_bottom), move them to the desired positions, mark them as occluded in the annotation file, and disable enable_occlusion_augmentation in the training spec file.
I think it works.
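A minimal sketch of how the four dummy corner positions might be computed, assuming keypoints are plain (x, y) pairs. The function name `dummy_corner_keypoints` and the `scale` parameter are hypothetical, and writing the points into the annotation file as occluded is left out because it depends on the label format.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def dummy_corner_keypoints(keypoints: List[Point], scale: float) -> Dict[str, Point]:
    """Hypothetical helper: corners of the tight keypoint box scaled about its center."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    cx = (min(xs) + max(xs)) / 2.0
    cy = (min(ys) + max(ys)) / 2.0

    # Half extents of the tight box, scaled to the desired size.
    half_w = (max(xs) - min(xs)) / 2.0 * scale
    half_h = (max(ys) - min(ys)) / 2.0 * scale

    return {
        "left_top": (cx - half_w, cy - half_h),
        "right_top": (cx + half_w, cy - half_h),
        "right_bottom": (cx + half_w, cy + half_h),
        "left_bottom": (cx - half_w, cy + half_h),
    }


if __name__ == "__main__":
    face_points = [(100.0, 100.0), (200.0, 300.0)]
    for name, point in dummy_corner_keypoints(face_points, 1.5).items():
        print(name, point)
```

Because the bbox is derived from the extremes of all keypoints, adding these corners widens the box without touching bbox_enlarge_ratio.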

I suggest downloading cv_samples_v1.4.1. The fpenet folder contains sample_calibration_images.py (wget --content-disposition 'https://api.ngc.nvidia.com/v2/resources/nvidia/tao/cv_samples/versions/v1.4.1/files/fpenet/sample_calibration_images.py')

You can see how the bbox is generated and enlarged. Then debug against your label files.
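When debugging the label files, it may help to know what the assertion in the logs checks. The failing node is named crop_to_bounding_box, and TensorFlow's tf.image.crop_to_bounding_box requires the image to be at least as large as the crop size (target) plus the crop origin (offset) in each dimension. The sketch below reproduces that check in plain Python so boxes can be screened before training. The function name `crop_problems` is hypothetical, and treating the enlarged bbox as the crop region is an assumption about the pipeline.

```python
from typing import List


def crop_problems(image_height: int, image_width: int,
                  offset_y: int, offset_x: int,
                  target_height: int, target_width: int) -> List[str]:
    """Hypothetical helper: list the bounds violations for one crop region."""
    problems = []
    if offset_x < 0 or offset_y < 0:
        # The crop starts left of or above the image.
        problems.append("offset must be >= 0")
    if image_width < target_width + offset_x:
        # The crop extends past the right edge.
        problems.append("width must be >= target + offset")
    if image_height < target_height + offset_y:
        # The crop extends past the bottom edge.
        problems.append("height must be >= target + offset")
    return problems


if __name__ == "__main__":
    # A 480x640 image with a 100x100 crop starting at row 400:
    # 400 + 100 = 500 exceeds the image height of 480.
    print(crop_problems(480, 640, 400, 10, 100, 100))
    # A crop covering exactly the full image passes.
    print(crop_problems(480, 640, 0, 0, 480, 640))
```

Samples that return a non-empty list here are candidates for faces too close to the image border for the chosen enlargement.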
