• Hardware (3090)
• Network Type (bpnet)
• TLT Version (3.22.05)
I am trying to retrain BodyPoseNet on a pig dataset with 27 keypoints instead of 18.
The problem occurs when I run dataset_convert to convert my own dataset to TFRecord format.
command:
!tao bpnet dataset_convert \
-m 'train' \
-o $DATA_DIR/train \
--generate_masks \
--dataset_spec $DATA_POSE_SPECS_DIR/coco_spec_pig_27.json
error:
2022-10-19 10:14:15,316 [INFO] root: Registry: ['nvcr.io']
2022-10-19 10:14:15,358 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
2022-10-19 10:14:15,436 [WARNING] tlt.components.docker_handler.docker_handler:
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/nxin/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
2022-10-19 02:14:16.023172: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.
2022-10-19 02:14:17,802 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Using TensorFlow backend.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.
2022-10-19 02:14:19,962 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/tfhooks/checkpoint_saver_hook.py:25: The name tf.train.CheckpointSaverHook is deprecated. Please use tf.estimator.CheckpointSaverHook instead.
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/dataset_convert.py", line 119, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/scripts/dataset_convert.py", line 111, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataio/build_converter.py", line 51, in build_converter
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataio/coco_converter.py", line 83, in __init__
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataio/coco_dataset.py", line 42, in __init__
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/dataio/coco_dataset.py", line 130, in _get_category
IndexError: list index out of range
Traceback (most recent call last):
File "/usr/local/bin/bpnet", line 8, in <module>
sys.exit(main())
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/bpnet/entrypoint/bpnet.py", line 12, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/driveix/build_wheel.runfiles/ai_infra/driveix/common/entrypoint/entrypoint.py", line 300, in launch_job
AssertionError: Process run failed.
2022-10-19 10:14:20,584 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
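The traceback ends in `_get_category` in coco_dataset.py with an IndexError, which would be consistent with a category lookup coming back empty. The TAO source is not shown here, so that is only a guess. A minimal sketch for checking it from outside the container, assuming the annotation file follows the standard COCO layout with a top-level `categories` list; the helper name `compare_categories` and the file paths are hypothetical and not part of TAO:

```python
import json


def compare_categories(spec, annotations):
    """Return (name, id) category pairs that appear on only one side.

    `spec` is the parsed dataset spec and `annotations` the parsed COCO-style
    annotation file. Both are assumed to carry a top-level "categories" list
    whose entries have "name" and "id" keys.
    """
    spec_cats = {(c.get("name"), c.get("id")) for c in spec.get("categories", [])}
    ann_cats = {(c.get("name"), c.get("id")) for c in annotations.get("categories", [])}
    return {
        # key=repr keeps sorting safe even if a name or id is missing (None).
        "only_in_spec": sorted(spec_cats - ann_cats, key=repr),
        "only_in_annotations": sorted(ann_cats - spec_cats, key=repr),
    }


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever the files actually live.
    try:
        with open("coco_spec_pig_27.json") as f:
            spec = json.load(f)
        with open("annotations/train.json") as f:
            annotations = json.load(f)
    except FileNotFoundError as exc:
        print(f"Skipping check, file not found: {exc.filename}")
    else:
        print(compare_categories(spec, annotations))
```

Two empty lists mean the category names and ids agree between the spec and the annotations. Any entry in either list is a candidate cause worth ruling out before looking further.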
Dataset spec file coco_spec_pig_27.json:
{
    "dataset": "coco_pig_point",
    "root_directory_path": "/workspace/tao-experiments/bpnet/data",
    "train_data": {
        "images_root_dir_path": "train2017",
        "mask_root_dir_path": "train_mask2017",
        "annotation_root_dir_path": "annotations/train.json"
    },
    "test_data": {
        "images_root_dir_path": "val2017",
        "mask_root_dir_path": "val_mask2017",
        "annotation_root_dir_path": "annotations/val.json"
    },
    "duplicate_data_with_each_person_as_center": true,
    "categories": [
        {
            "supercategory": "animal",
            "id": 0,
            "name": "pig",
            "num_joints": 27,
            "keypoints": [
                "pig_point_1", "pig_point_2", "pig_point_3", "pig_point_4", "pig_point_5",
                "pig_point_6", "pig_point_7", "pig_point_8", "pig_point_9", "pig_point_10",
                "pig_point_11", "pig_point_12", "pig_point_13", "pig_point_14", "pig_point_15",
                "pig_point_16", "pig_point_17", "pig_point_18", "pig_point_19", "pig_point_20",
                "pig_point_21", "pig_point_22", "pig_point_23", "pig_point_24", "pig_point_25",
                "pig_point_26", "pig_point_27"
            ],
            "skeleton": [
                [0,1],[0,23],[0,19],[0,5],[0,17],[1,2],[23,24],[19,18],
                [2,20],[18,20],[20,21],[21,3],[3,4],[4,5],[21,15],[15,16],[16,17],[21,6],[21,16],
                [21,22],[24,25],[25,26],[22,7],[7,8],[8,9],[22,11],[11,12],[12,13],[26,7],[26,11],
                [22,10],[10,9],[10,13],[6,22],[14,22],[25,6],[25,14],[24,3],[24,15]
            ],
            "skeleton_edge_names": [
                ["pig_point_1", "pig_point_2"], ["pig_point_1", "pig_point_24"], ["pig_point_1", "pig_point_20"],
                ["pig_point_1", "pig_point_6"], ["pig_point_1", "pig_point_18"], ["pig_point_2", "pig_point_3"],
                ["pig_point_24", "pig_point_25"], ["pig_point_20", "pig_point_19"], ["pig_point_3", "pig_point_21"],
                ["pig_point_19", "pig_point_21"], ["pig_point_21", "pig_point_22"], ["pig_point_22", "pig_point_4"],
                ["pig_point_4", "pig_point_5"], ["pig_point_5", "pig_point_6"], ["pig_point_22", "pig_point_16"],
                ["pig_point_16", "pig_point_17"], ["pig_point_17", "pig_point_18"], ["pig_point_22", "pig_point_7"],
                ["pig_point_22", "pig_point_15"], ["pig_point_22", "pig_point_23"], ["pig_point_25", "pig_point_26"],
                ["pig_point_26", "pig_point_27"], ["pig_point_23", "pig_point_8"], ["pig_point_8", "pig_point_9"],
                ["pig_point_9", "pig_point_10"], ["pig_point_23", "pig_point_12"], ["pig_point_12", "pig_point_13"],
                ["pig_point_13", "pig_point_14"], ["pig_point_27", "pig_point_8"], ["pig_point_27", "pig_point_12"],
                ["pig_point_23", "pig_point_11"], ["pig_point_11", "pig_point_10"], ["pig_point_11", "pig_point_14"],
                ["pig_point_7", "pig_point_23"], ["pig_point_15", "pig_point_23"], ["pig_point_26", "pig_point_7"],
                ["pig_point_26", "pig_point_15"], ["pig_point_25", "pig_point_4"], ["pig_point_25", "pig_point_16"]
            ]
        }
    ],
    "visibility_flags": {
        "value": {
            "visible": 2,
            "occluded": 1,
            "not_labeled": 0
        },
        "mapping": {
            "visible": "visible",
            "occluded": "occluded",
            "not_labeled": "not_labeled"
        }
    },
    "data_filtering_params": {
        "min_acceptable_height": 32,
        "min_acceptable_width": 32,
        "min_acceptable_kpts": 5,
        "min_acceptable_interperson_dist_ratio": 0.3
    }
}
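Because the spec lists the skeleton twice, once as index pairs and once as name pairs, the two can be cross-checked mechanically. Below is a rough sketch of such a check, not TAO code. It assumes the `skeleton` indices are 0-based, which matches `[0,1]` pairing with `pig_point_1` and `pig_point_2` in the spec above. The standard COCO keypoint format numbers skeleton entries from 1, and the post does not say which one TAO expects, so the base is a parameter. The function name `check_category` and the file path are hypothetical.

```python
import json


def check_category(category, index_base=0):
    """Return a list of human-readable problems found in one spec category.

    `index_base` is the assumed numbering of the "skeleton" indices:
    0 if the first keypoint is index 0, 1 if it is index 1.
    """
    problems = []
    keypoints = category["keypoints"]
    skeleton = category["skeleton"]
    edge_names = category["skeleton_edge_names"]

    if category["num_joints"] != len(keypoints):
        problems.append(
            f"num_joints is {category['num_joints']} "
            f"but {len(keypoints)} keypoints are listed"
        )
    if len(skeleton) != len(edge_names):
        problems.append(
            f"{len(skeleton)} skeleton edges but {len(edge_names)} edge names"
        )

    # Resolve each index pair to keypoint names and compare with the named pair.
    for i, (edge, names) in enumerate(zip(skeleton, edge_names)):
        positions = [j - index_base for j in edge]
        if any(p < 0 or p >= len(keypoints) for p in positions):
            problems.append(f"edge {i}: index pair {edge} is out of range")
            continue
        resolved = [keypoints[p] for p in positions]
        if resolved != list(names):
            problems.append(
                f"edge {i}: {edge} resolves to {resolved} but is named {list(names)}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical local path; adjust as needed.
    try:
        with open("coco_spec_pig_27.json") as f:
            spec = json.load(f)
    except FileNotFoundError as exc:
        print(f"Skipping check, file not found: {exc.filename}")
    else:
        for category in spec["categories"]:
            for problem in check_category(category, index_base=0):
                print(problem)
```

Run against the spec as posted with `index_base=0`, this should flag one entry: the pair `[21,16]` resolves to `pig_point_22` and `pig_point_17`, while the name pair in the same position is `pig_point_22` and `pig_point_15`. Whether that mismatch has anything to do with the IndexError is not clear from the log.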
Attached annotation file: train.json (3.7 MB)