FPENet: Which training parameters should be modified to enlarge the bbox?

zyann · October 31, 2022, 7:50am

• Network Type fpenet
• TLT Version 3.22.05 cv_samples_v1.4.0
• Training spec file
experiment_spec.yaml (2.3 KB)
• How to reproduce the issue ?
I trained successfully with the default settings. Then I changed the bbox_enlarge_ratio parameter in dataset_config.yaml from 1.0 to 1.2, and training reported the error below.
How does fpenet calculate the bbox?
If I want to scale the bbox up or down, which training parameters should I focus on?

INFO    2022-10-31 07:28:18,040| tensorflow: Graph was finalized.
INFO    2022-10-31 07:28:19,475| tensorflow: Running local_init_op.
INFO    2022-10-31 07:28:20,933| tensorflow: Done running local_init_op.
INFO    2022-10-31 07:28:31,333| tensorflow: Saving checkpoints for step-0.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: {{function_node __inference_Dataset_map__map_func_set_random_wrapper_38447}} assertion failed: [height must be >= target + offset.]
	 [[{{node crop_to_bounding_box_5/Assert_5/Assert}}]]
	 [[IteratorGetNext_1]]
	 [[RegexFullMatch_19/_1711]]
  (1) Invalid argument: {{function_node __inference_Dataset_map__map_func_set_random_wrapper_38447}} assertion failed: [height must be >= target + offset.]
	 [[{{node crop_to_bounding_box_5/Assert_5/Assert}}]]
	 [[IteratorGetNext_1]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/scripts/train.py", line 141, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/scripts/train.py", line 137, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/trainers/fpenet_trainer.py", line 286, in train
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/trainers/fpenet_trainer.py", line 332, in run_training_loop
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1418, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1176, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  assertion failed: [height must be >= target + offset.]
	 [[{{node crop_to_bounding_box_5/Assert_5/Assert}}]]
	 [[IteratorGetNext_1]]
	 [[RegexFullMatch_19/_1711]]
  (1) Invalid argument:  assertion failed: [height must be >= target + offset.]
	 [[{{node crop_to_bounding_box_5/Assert_5/Assert}}]]
	 [[IteratorGetNext_1]]
0 successful operations.
0 derived errors ignored.
Traceback (most recent call last):
  File "/usr/local/bin/fpenet", line 8, in <module>
    sys.exit(main())
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/entrypoint/fpenet.py", line 12, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/entrypoint/entrypoint.py", line 300, in launch_job
AssertionError: Process run failed.

Morganh · October 31, 2022, 8:51am

FPENet finds the xmin, ymin, xmax, and ymax of the keypoints, then calculates a bbox from them.

zyann · October 31, 2022, 9:18am

What caused my error after enlarging the bbox? I don't understand what height, target, and offset mean.
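For context, the assertion text matches the bounds check in TensorFlow's tf.image.crop_to_bounding_box (the failing node is named crop_to_bounding_box_5). In that check, "height" is the image height, "target" is the requested crop height, and "offset" is the top edge of the crop; the width message is analogous. Below is a plain-Python sketch of the two bounds, not the TAO code itself; the function name and the numbers are made up for illustration.

```python
def crop_fits(image_h, image_w, offset_y, offset_x, target_h, target_w):
    """Mirror the two upper-bound checks behind the error messages.

    Returns (height_ok, width_ok). A False entry corresponds to
    'height must be >= target + offset.' or
    'width must be >= target + offset.' respectively.
    """
    height_ok = image_h >= target_h + offset_y
    width_ok = image_w >= target_w + offset_x
    return height_ok, width_ok


# Hypothetical 480x640 image and a 300-pixel square crop whose
# top-left corner sits at y=200, x=100.
print(crop_fits(480, 640, 200, 100, 300, 300))  # (False, True)
```

Here the crop would end at row 500 in a 480-row image, so the height check fails while the width check passes.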

Morganh · November 1, 2022, 3:33am

To scale the bbox up or down, you can set bbox_enlarge_ratio.

In the FPE DataIO pipeline, tfrecords generation reads the key points and the original image size. It then computes a square encompassing all key points and enlarges that square by bbox_enlarge_ratio.
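A rough sketch of that description, useful for checking label files by hand. It assumes the square is centered on the tight keypoint box and that its side is the longer edge times the ratio; the real pipeline may center, round, or clamp differently. The helper names and sample points are hypothetical.

```python
def enlarged_square_bbox(keypoints, ratio):
    """Square around all (x, y) keypoints, scaled by `ratio` about its center."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    # Assumption: the square side is the longer edge of the tight box.
    side = max(xmax - xmin, ymax - ymin) * ratio
    cx = (xmin + xmax) / 2.0
    cy = (ymin + ymax) / 2.0
    half = side / 2.0
    return cx - half, cy - half, cx + half, cy + half


def fits_in_image(bbox, image_w, image_h):
    """True if the box lies entirely inside the image."""
    x1, y1, x2, y2 = bbox
    return x1 >= 0 and y1 >= 0 and x2 <= image_w and y2 <= image_h


# Made-up keypoints close to the top edge of a 640x480 image.
points = [(100, 10), (300, 10), (200, 210)]
for ratio in (1.0, 1.2):
    box = enlarged_square_bbox(points, ratio)
    print(ratio, fits_in_image(box, 640, 480))
```

With these points, ratio 1.0 stays inside the image, while 1.2 pushes the top edge above row 0. Samples like that are candidates for the crop assertion, though this thread does not state how the pipeline handles boxes that leave the image.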

Can you generate new tfrecords files? Also, could you try a smaller bbox_enlarge_ratio, for example 1.1?

zyann · November 1, 2022, 6:25am

I lowered bbox_enlarge_ratio to 1.1 and generated new tfrecords, but training still reports a similar confusing error.

INFO    2022-11-01 06:11:46,426| tensorflow: Graph was finalized.
INFO    2022-11-01 06:11:47,856| tensorflow: Running local_init_op.
INFO    2022-11-01 06:11:49,322| tensorflow: Done running local_init_op.
INFO    2022-11-01 06:11:59,795| tensorflow: Saving checkpoints for step-0.
INFO    2022-11-01 06:12:24,576| modulus.hooks.sample_counter_hook: Train Samples / sec: 2.732
INFO    2022-11-01 06:12:24,576| tensorflow: elt_loss = 41462.242, epoch = 0.0, landmarks_loss = 41462.242, step = 0, total_loss = 62193.363
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: {{function_node __inference_Dataset_map__map_func_set_random_wrapper_38447}} assertion failed: [width must be >= target + offset.]
	 [[{{node crop_to_bounding_box_27/Assert_4/Assert}}]]
	 [[IteratorGetNext_1]]
  (1) Invalid argument: {{function_node __inference_Dataset_map__map_func_set_random_wrapper_38447}} assertion failed: [width must be >= target + offset.]
	 [[{{node crop_to_bounding_box_27/Assert_4/Assert}}]]
	 [[IteratorGetNext_1]]
	 [[RegexFullMatch_61/_4286]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/scripts/train.py", line 141, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/scripts/train.py", line 137, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/trainers/fpenet_trainer.py", line 286, in train
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/trainers/fpenet_trainer.py", line 332, in run_training_loop
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 754, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1360, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1345, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py", line 1426, in run
    run_metadata=run_metadata))
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/blocks/hooks/hooks.py", line 76, in after_run
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/evaluation/fpenet_evaluator.py", line 191, in evaluate
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  assertion failed: [width must be >= target + offset.]
	 [[{{node crop_to_bounding_box_27/Assert_4/Assert}}]]
	 [[IteratorGetNext_1]]
  (1) Invalid argument:  assertion failed: [width must be >= target + offset.]
	 [[{{node crop_to_bounding_box_27/Assert_4/Assert}}]]
	 [[IteratorGetNext_1]]
	 [[RegexFullMatch_61/_4286]]
0 successful operations.
0 derived errors ignored.
Traceback (most recent call last):
  File "/usr/local/bin/fpenet", line 8, in <module>
    sys.exit(main())
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/fpenet/entrypoint/fpenet.py", line 12, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/entrypoint/entrypoint.py", line 300, in launch_job
AssertionError: Process run failed.

However, I found another way to enlarge the bbox successfully:
Add four new dummy keypoints (left_top, right_top, right_bottom, left_bottom), move them to the desired positions, mark them as occluded in the annotation file, and disable enable_occlusion_augmentation in the training spec file.
I think it works.
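To illustrate why this works: since the bbox is derived from the min/max of all keypoints, adding corner points outside the face stretches it. A minimal sketch with made-up coordinates and a hypothetical margin; it does not show the annotation file format or the occlusion flags.

```python
def tight_bbox(keypoints):
    """(xmin, ymin, xmax, ymax) over all (x, y) keypoints."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return min(xs), min(ys), max(xs), max(ys)


def corner_keypoints(bbox, margin):
    """Four dummy points: left_top, right_top, right_bottom, left_bottom."""
    xmin, ymin, xmax, ymax = bbox
    return [
        (xmin - margin, ymin - margin),
        (xmax + margin, ymin - margin),
        (xmax + margin, ymax + margin),
        (xmin - margin, ymax + margin),
    ]


face = [(100, 60), (300, 60), (200, 260)]          # made-up landmarks
dummies = corner_keypoints(tight_bbox(face), 20)   # 20-pixel margin (assumed)
print(tight_bbox(face))            # (100, 60, 300, 260)
print(tight_bbox(face + dummies))  # (80, 40, 320, 280)
```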

Morganh · November 1, 2022, 6:52am

I suggest downloading cv_samples_v1.4.1. The fpenet folder contains sample_calibration_images.py (wget --content-disposition 'https://api.ngc.nvidia.com/v2/resources/nvidia/tao/cv_samples/versions/v1.4.1/files/fpenet/sample_calibration_images.py')

You can see how the bbox is generated and enlarged. Then debug against your label files.

system · November 15, 2022, 6:53am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.