Retraining BodyPoseNet

• Hardware (NVIDIA GeForce RTX 2060 SUPER)
• Network Type (BodyPoseNet)
• TAO Toolkit Version (3.22.05)

Hello,
I am trying to apply transfer learning to BodyPoseNet on a dataset with 21 keypoints instead of 18 (to get a hand pose detection model).
I created my own dataset in COCO format and generated the tfrecords files.

The tao bpnet train command works fine with the default files but not with my files. Here is the full output:

!tao bpnet train -e $SPECS_DIR/bpnet_train_hand1_coco.yaml \
    -r $USER_EXPERIMENT_DIR/models/exp_m1_unpruned \
    -k $KEY \
    --gpus $NUM_GPUS
2022-07-25 09:28:34,621 [INFO] root: Registry: ['nvcr.io']
2022-07-25 09:28:34,663 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
2022-07-25 09:28:34,674 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/mm/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2022-07-25 07:28:36.871415: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

2022-07-25 07:28:38,845 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

2022-07-25 07:28:41,239 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py:91: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING 2022-07-25 07:28:41,239| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py:91: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py:91: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

WARNING 2022-07-25 07:28:41,239| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py:91: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

/usr/local/lib/python3.6/dist-packages/driveix/bpnet/scripts/train.py:110: YAMLLoadWarning: calling yaml.load() without Loader=… is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
/workspace/tao-experiments/bpnet/models/exp_m1_unpruned
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataloaders/bpnet_dataloader.py:484: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

WARNING 2022-07-25 07:28:41,705| tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataloaders/bpnet_dataloader.py:484: The name tf.FixedLenFeature is deprecated. Please use tf.io.FixedLenFeature instead.

Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py", line 146, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/train.py", line 132, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/modulusobject/modulusobject.py", line 158, in deserialize_maglev_object
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/modulusobject/modulusobject.py", line 145, in _deserialize_recursively
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/modulusobject/modulusobject.py", line 167, in deserialize_maglev_object
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/core/build_wheel.runfiles/ai_infra/moduluspy/modulus/modulusobject/modulusobject.py", line 432, in wrapper
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataloaders/bpnet_dataloader.py", line 150, in __init__
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataloaders/processors/label_processor.py", line 57, in __init__
AssertionError
Traceback (most recent call last):
File "/usr/local/bin/bpnet", line 8, in <module>
sys.exit(main())
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/entrypoint/bpnet.py", line 12, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/entrypoint/entrypoint.py", line 300, in launch_job
AssertionError: Process run failed.
2022-07-25 09:28:42,438 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

It looks to me like a misconfiguration in my dataset files, but I can’t see where it comes from.
I also checked my tfrecords file, and the value of bytes_list is much larger in the default COCO tfrecord than in mine. Should I be worried about that?
Excerpt from my tfrecord file:
val_hand_tfrecord.txt (23.9 KB)
Excerpt from the default tfrecord file:
val_default_tfrecord.txt (31.1 KB)

Here are the other files I used:
bpnet_train_hand1_coco.yaml (2.9 KB)
coco_hand_spec.json (2.7 KB)
act1_coco_annotations_test.json (1.4 MB)
act1_coco_annotations_train.json (1.4 MB)

Let me know if any other config file is needed. Thanks in advance.

Would you please modify the title? This topic seems to be about BodyPoseNet rather than GestureNet.
Thanks a lot!

Refer to BodyPoseNet TAO training error - #4 by Morganh

  • The target_shape depends on the input shape. This can be computed based on the model stride. In the default setting, the model has a stride of 8.

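The assertion error is raised by this check, which requires the height and width ratios between image_shape and target_shape to be equal: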

assert (image_shape[0] // target_shape[0]) == (image_shape[1] // target_shape[1])
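
To make the stride relationship concrete, here is a minimal Python sketch. It is not TAO code: it assumes shapes are given as [height, width] and uses the default stride of 8 mentioned above, and compute_target_shape and ratios_match are hypothetical helper names. The second helper mirrors the assertion quoted above.

```python
def compute_target_shape(image_shape, stride=8):
    """Derive a [height, width] target shape from the input shape.

    Assumption: image_shape is [height, width] and both sides are
    multiples of the model stride.
    """
    height, width = image_shape[0], image_shape[1]
    if height % stride != 0 or width % stride != 0:
        raise ValueError(
            f"image_shape {image_shape} is not a multiple of stride {stride}"
        )
    return [height // stride, width // stride]


def ratios_match(image_shape, target_shape):
    """Same condition as the assertion: equal ratios along both axes."""
    return (image_shape[0] // target_shape[0]) == (image_shape[1] // target_shape[1])


if __name__ == "__main__":
    for shape in ([256, 256], [224, 320]):
        target = compute_target_shape(shape)
        print(shape, "->", target, "ratios match:", ratios_match(shape, target))
```

With a 256x256 input and a stride of 8, the derived target shape is 32x32. A non-square input such as 224x320 combined with a 32x32 target would fail the check, because the ratios differ between height and width.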

Thank you,

I changed my input_shape to [256,256] so that I don’t have to change the target_shape.

I had another question though. In the BodyPoseNet doc I read that:
"Currently, BodyPoseNet only supports the given default skeleton configuration at pose_config_path. The inference pipelines do not support custom skeleton configuration at the moment."

Is the doc up to date? Does that mean I can’t use a custom skeleton like the one below in the json under /model_pose_config?

Custom json in /model_pose_config:

{
    "pose_config_type": "bpnet_18_joints",
    "categories": [
        {
            "supercategory": "person",
            "id": 1,
            "name": "person",
            "num_joints": 21,
            "keypoints": [
                "wrist", "thumb_cmc", "thumb_mcp", "thumb_ip", "thumb_tip",
                "index_finger_mcp", "index_finger_pip", "index_finger_dip", "index_finger_tip", "middle_finger_mcp",
                "middle_finger_pip", "middle_finger_dip", "middle_finger_tip", "ring_finger_mcp", "ring_finger_pip",
                "ring_finger_dip", "ring_finger_tip", "pinky_mcp", "pinky_pip", "pinky_dip", "pinky_tip"
            ],
            "skeleton": [
                [0, 1], [1, 2], [2, 3], [3, 4], [0, 5], [5, 6], [6, 7], [7, 8],
                [9, 10], [10, 11], [11, 12], [13, 14], [14, 15], [15, 16], [17, 18], [18, 19], [19, 20], [0, 17], [5, 9], [9, 13], [13, 17]
            ],
            "skeleton_edge_names": [
                ["wrist", "thumb_cmc"], ["thumb_cmc", "thumb_mcp"], ["thumb_mcp", "thumb_ip"],
                ["thumb_ip", "thumb_tip"], ["wrist", "index_finger_mcp"], ["index_finger_mcp", "index_finger_pip"],
                ["index_finger_pip", "index_finger_dip"], ["index_finger_dip", "index_finger_tip"], ["middle_finger_mcp", "middle_finger_pip"],
                ["middle_finger_pip", "middle_finger_dip"], ["middle_finger_dip", "middle_finger_tip"], ["ring_finger_mcp", "ring_finger_pip"],
                ["ring_finger_pip", "ring_finger_dip"], ["ring_finger_dip", "ring_finger_tip"], ["pinky_mcp", "pinky_pip"],
                ["pinky_pip", "pinky_dip"], ["pinky_dip", "pinky_tip"], ["wrist", "pinky_mcp"],
                ["index_finger_mcp", "middle_finger_mcp"], ["middle_finger_mcp", "ring_finger_mcp"], ["ring_finger_mcp", "pinky_mcp"]
            ]
        }
    ]
}
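
For reference, a config with this layout can be checked for internal consistency with a short Python sketch. This is only an illustration: check_pose_config is a hypothetical helper, it assumes the skeleton indices are zero-based positions in the keypoints list (as in the json above), and passing the check says nothing about whether TAO accepts the config.

```python
def check_pose_config(config):
    """Return a list of inconsistencies found in a pose config dict."""
    problems = []
    for category in config["categories"]:
        keypoints = category["keypoints"]
        skeleton = category["skeleton"]
        edge_names = category["skeleton_edge_names"]

        # num_joints should agree with the number of keypoint names.
        if category["num_joints"] != len(keypoints):
            problems.append(
                f"num_joints={category['num_joints']} but {len(keypoints)} keypoints listed"
            )

        # Each index pair should have exactly one name pair.
        if len(skeleton) != len(edge_names):
            problems.append(
                f"{len(skeleton)} skeleton edges but {len(edge_names)} edge names"
            )

        # Assumption: indices are zero-based positions in `keypoints`.
        for (start, end), names in zip(skeleton, edge_names):
            if not (0 <= start < len(keypoints) and 0 <= end < len(keypoints)):
                problems.append(f"edge {[start, end]} has an index out of range")
                continue
            if [keypoints[start], keypoints[end]] != list(names):
                problems.append(f"edge {[start, end]} does not match names {list(names)}")
    return problems


if __name__ == "__main__":
    # Tiny made-up config with the same keys as the json above.
    sample = {
        "categories": [
            {
                "num_joints": 3,
                "keypoints": ["wrist", "thumb_cmc", "thumb_mcp"],
                "skeleton": [[0, 1], [1, 2]],
                "skeleton_edge_names": [["wrist", "thumb_cmc"], ["thumb_cmc", "thumb_mcp"]],
            }
        ]
    }
    print(check_pose_config(sample))
```

To run it on a real file, load the json with json.load and pass the resulting dict to the helper. An empty list means no inconsistency was found.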

Yes, the doc is up to date. BodyPoseNet does not support a custom skeleton configuration at the moment.

OK, so transfer learning with BodyPoseNet to detect 21 hand joints isn’t possible at the moment.
Thanks for the help and have a great weekend!