tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found

Prepare your own dataset of https://docs.nvidia.com/tao/tao-toolkit/text/data_annotation_format.html#bodyposenet-coco-format

This is my dataset.Make according to notebook
keypoints_train.json (317.0 KB)

I encountered a mistake during training(batch_size=1):
2022-01-20 10:59:07,277 [INFO] root: Registry: [‘nvcr.io’]
2022-01-20 10:59:07,471 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3
2022-01-20 10:59:08,543 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the “user”:“UID:GID” in the
DockerOptions portion of the “/home/chenhongzhao/.tao_mounts.json” file. You can obtain your
users UID and GID by using the “id -u” and “id -g” commands on the
terminal.
2022-01-20 02:59:10.763322: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

2022-01-20 02:59:15,408 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

2022-01-20 02:59:21,195 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py:91: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING 2022-01-20 02:59:21,196| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py:91: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py:91: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING 2022-01-20 02:59:21,196| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py:91: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

/usr/local/lib/python3.6/dist-packages/driveix/bpnet/scripts/train.py:110: YAMLLoadWarning: calling yaml.load() without Loader=… is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
/workspace/tao-experiments/bpnet/models/exp_m1_unpruned
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataloaders/bpnet_dataloader.py:484: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

WARNING 2022-01-20 02:59:22,746| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataloaders/bpnet_dataloader.py:484: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

INFO 2022-01-20 02:59:22,757| main: done
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING 2022-01-20 02:59:22,757| tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

/workspace/tao-experiments/bpnet/data/train-fold-000-of-001: 178
Total Samples: 178
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataloaders/bpnet_dataloader.py:319: The name tf.matrix_inverse is deprecated. Please use tf.linalg.inv instead.

WARNING 2022-01-20 02:59:22,848| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataloaders/bpnet_dataloader.py:319: The name tf.matrix_inverse is deprecated. Please use tf.linalg.inv instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataloaders/bpnet_dataloader.py:224: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

WARNING 2022-01-20 02:59:22,878| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataloaders/bpnet_dataloader.py:224: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.

INFO 2022-01-20 02:59:22,893| driveix.bpnet.trainers.bpnet_trainer: Building model graph from model defintion …
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING 2022-01-20 02:59:22,895| tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING 2022-01-20 02:59:22,918| tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4115: The name tf.random_normal is deprecated. Please use tf.random.normal instead.

WARNING 2022-01-20 02:59:23,250| tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4115: The name tf.random_normal is deprecated. Please use tf.random.normal instead.

INFO 2022-01-20 02:59:23,643| driveix.bpnet.trainers.bpnet_trainer: Not first run and not finetuning experiment → Loading from latest checkpoint…
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/losses/bpnet_loss.py:120: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

WARNING 2022-01-20 02:59:23,654| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/losses/bpnet_loss.py:120: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

INFO 2022-01-20 02:59:26,849| main: training
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:59: The name tf.train.LoggingTensorHook is deprecated. Please use tf.estimator.LoggingTensorHook instead.

WARNING 2022-01-20 02:59:26,853| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:59: The name tf.train.LoggingTensorHook is deprecated. Please use tf.estimator.LoggingTensorHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:60: The name tf.train.StopAtStepHook is deprecated. Please use tf.estimator.StopAtStepHook instead.

WARNING 2022-01-20 02:59:26,853| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:60: The name tf.train.StopAtStepHook is deprecated. Please use tf.estimator.StopAtStepHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:73: The name tf.train.StepCounterHook is deprecated. Please use tf.estimator.StepCounterHook instead.

WARNING 2022-01-20 02:59:26,854| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:73: The name tf.train.StepCounterHook is deprecated. Please use tf.estimator.StepCounterHook instead.

INFO:tensorflow:Create CheckpointSaverHook.
INFO 2022-01-20 02:59:26,854| tensorflow: Create CheckpointSaverHook.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:99: The name tf.train.SummarySaverHook is deprecated. Please use tf.estimator.SummarySaverHook instead.

WARNING 2022-01-20 02:59:26,854| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/utils.py:99: The name tf.train.SummarySaverHook is deprecated. Please use tf.estimator.SummarySaverHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/trainers/bpnet_trainer.py:300: The name tf.train.NanTensorHook is deprecated. Please use tf.estimator.NanTensorHook instead.

WARNING 2022-01-20 02:59:26,854| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/trainers/bpnet_trainer.py:300: The name tf.train.NanTensorHook is deprecated. Please use tf.estimator.NanTensorHook instead.

INFO:tensorflow:Graph was finalized.
INFO 2022-01-20 02:59:36,833| tensorflow: Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmpmdw_ct22/model.ckpt-115240
INFO 2022-01-20 02:59:37,243| tensorflow: Restoring parameters from /tmp/tmpmdw_ct22/model.ckpt-115240
INFO:tensorflow:Running local_init_op.
INFO 2022-01-20 02:59:38,394| tensorflow: Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO 2022-01-20 02:59:38,531| tensorflow: Done running local_init_op.
INFO:tensorflow:Saving checkpoints for step-115240.
INFO 2022-01-20 03:00:21,579| tensorflow: Saving checkpoints for step-115240.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

WARNING 2022-01-20 03:00:33,133| tensorflow: From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.

Traceback (most recent call last):
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1365, in _do_call
return fn(*args)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1350, in _run_fn
target_list, run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: assertion failed: [32.3707886]
[[{{node Optimizer/Assert/AssertGuard/Assert}}]]
(1) Invalid argument: assertion failed: [32.3707886]
[[{{node Optimizer/Assert/AssertGuard/Assert}}]]
[[Model/block_3d_bn_1/AssignMovingAvg_1/_2705]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py”, line 146, in
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py”, line 137, in main
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/trainers/bpnet_trainer.py”, line 316, in train
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/blocks/trainers/trainer.py”, line 119, in run_training_loop
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 754, in run
run_metadata=run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1360, in run
raise six.reraise(*original_exc_info)
File “/usr/local/lib/python3.6/dist-packages/six.py”, line 696, in reraise
raise value
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1345, in run
return self._sess.run(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1418, in run
run_metadata=run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/monitored_session.py”, line 1176, in run
return self._sess.run(*args, **kwargs)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 956, in run
run_metadata_ptr)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1180, in _run
feed_dict_tensor, options, run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1359, in _do_run
run_metadata)
File “/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py”, line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: assertion failed: [32.3707886]
[[node Optimizer/Assert/AssertGuard/Assert (defined at usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
(1) Invalid argument: assertion failed: [32.3707886]
[[node Optimizer/Assert/AssertGuard/Assert (defined at usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[Model/block_3d_bn_1/AssignMovingAvg_1/_2705]]
0 successful operations.
0 derived errors ignored.

Original stack trace for ‘Optimizer/Assert/AssertGuard/Assert’:
File “root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py”, line 146, in
File “root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py”, line 134, in main
File “root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/trainers/bpnet_trainer.py”, line 255, in build
File “root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/trainers/bpnet_trainer.py”, line 245, in _build_distributed
File “root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/optimizers/weighted_momentum_optimizer.py”, line 55, in build
File “root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/blocks/learning_rate_schedules/softstart_annealing_schedule.py”, line 114, in get_tensor
File “root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/hooks/utils.py”, line 40, in get_softstart_annealing_learning_rate
File “usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/tf_should_use.py”, line 198, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
File “usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/control_flow_ops.py”, line 173, in Assert
guarded_assert = cond(condition, no_op, true_assert, name=“AssertGuard”)
File “usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py”, line 513, in new_func
return func(*args, **kwargs)
File “usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/control_flow_ops.py”, line 1235, in cond
orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
File “usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/control_flow_ops.py”, line 1061, in BuildCondBranch
original_result = fn()
File “usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/control_flow_ops.py”, line 171, in true_assert
condition, data, summarize, name=“Assert”)
File “usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_logging_ops.py”, line 74, in _assert
name=name)
File “usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py”, line 794, in _apply_op_helper
op_def=op_def)
File “usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py”, line 513, in new_func
return func(*args, **kwargs)
File “usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 3357, in create_op
attrs, op_def, compute_device)
File “usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 3426, in _create_op_internal
op_def=op_def)
File “usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py”, line 1748, in init
self._traceback = tf_stack.extract_stack()

Traceback (most recent call last):
File “/usr/local/bin/bpnet”, line 8, in
sys.exit(main())
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/entrypoint/bpnet.py”, line 12, in main
File “/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/entrypoint/entrypoint.py”, line 300, in launch_job
AssertionError: Process run failed.
2022-01-20 11:00:47,788 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
--------------------------------
if i set batch_size=2.
The error message will change:

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: assertion failed: [64.7415771]
[[{{node Optimizer/Assert/AssertGuard/Assert}}]]
(1) Invalid argument: assertion failed: [64.7415771]
[[{{node Optimizer/Assert/AssertGuard/Assert}}]]
[[Optimizer/Exp/_2677]]

--------------------------------
it’s too hard, --!, please help me.

If you have trained the default bpnet notebook successfully, could you please compare it to narrow down?

I successfully trained the default BPnet notebook, but I used the dataset from coco2017.
If I use a custom dataset, an error will be reported。
I compared their JSON files and found nothing。
At the same time, my JSON file is designed according to dpnet notebook。
The error must have happened to my dataset, but I don’t have a clue。

Can you use a new result folder and retry?

I want to retest with other labeling software.

I tried to modify it many times, but I still failed. Can you teach me how to make this coco keypoint data set

Could you please refer to Body Pose Estimation - NVIDIA Docs ?

The BodyPoseNet apps in TAO Toolkit expect data in COCO format for training and evaluation.

See the Data Annotation Format page for more information about the COCO data format.

The Sloth and Label Studio tools may be used for labeling.

1 Like

I’ve read this document before, but I haven’t seen the recommended tools。
I’m too careless。

I’ve got the same error using the same version of tao and try to train on the tao toolkit’s cv_samples_v1.3.0 for mask rcnn.

I use the labelme tool for annotation.
Sloth and labelstudio haven’t tried yet.
What labeling tool do you use.

What labeling tools do you use

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.