TFRecord generation issue with RetinaNet

I am trying to train a RetinaNet model on a custom dataset, and I am facing the following issues when using TFRecords.
image

Issue 1: When I generate TFRecords with the 21.08 or 21.11 containers, the TFRecords are created, but training then throws an error.
Commands:
i) docker run -itd --gpus all -v /Experiments/data:/workspace --name tfrecord_conversion_retinanet-Z_1-Jul_20_2022_06.35.28 nvcr.io/nvidia/tao/tao-toolkit-tf:v3.21.11-tf1.15.5-py3 /bin/bash
ii) retinanet dataset_convert -d /workspace/ZNT/Z_1/dataset_train_conversion_spec.txt -o /workspace/ZNT/Z_1/tfrecords/train/tfrecord
image

image

Issue 2: When I generate TFRecords with 22.05, the conversion itself throws an error:
docker run -itd --gpus all -v /Experiments/data:/workspace --name tfrecord_conversion_retinanet-Z_1-Jul_20_2022_06.53.17 --entrypoint "" nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3 /bin/bash

retinanet dataset_convert -d /workspace/ZNT/Z_1/dataset_train_conversion_spec.txt -o /workspace/ZNT/Z_1/tfrecords/train/tfrecord
image

Here is my dataset_conversion_spec.txt:
image

Please share the full log for the error shown in the picture.

retinanet dataset_convert -d /workspace/ZNT/Z_1/dataset_train_conversion_spec.txt -o /workspace/ZNT/Z_1/tfrecords/train/tfrecord
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Using TensorFlow backend.
2022-07-21 04:26:38,150 [INFO] iva.detectnet_v2.dataio.build_converter: Instantiating a kitti converter
2022-07-21 04:26:38,150 [INFO] iva.detectnet_v2.dataio.dataset_converter_lib: Creating output directory /workspace/ZNT/Z_1/tfrecords/train
2022-07-21 04:26:38,152 [INFO] iva.detectnet_v2.dataio.kitti_converter_lib: Num images in
Train: 447 Val: 111
2022-07-21 04:26:38,152 [INFO] iva.detectnet_v2.dataio.kitti_converter_lib: Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2022-07-21 04:26:38,153 [INFO] iva.detectnet_v2.dataio.dataset_converter_lib: Writing partition 0, shard 0
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py:169: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

2022-07-21 04:26:38,153 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py:169: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:315: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/dataset_convert.py", line 18, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/scripts/dataset_convert.py", line 164, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py", line 74, in convert
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py", line 129, in _write_partitions
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py", line 173, in _write_shard
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/kitti_converter_lib.py", line 207, in _create_example_proto
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/kitti_converter_lib.py", line 344, in _add_targets
TypeError: argument of type 'NoneType' is not iterable
retinanet dataset_convert -d /workspace/ZNT/Z_1/dataset_val_conversion_spec.txt -o /workspace/ZNT/Z_1/tfrecords/val/tfrecord
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Using TensorFlow backend.
2022-07-21 04:26:44,462 [INFO] iva.detectnet_v2.dataio.build_converter: Instantiating a kitti converter
2022-07-21 04:26:44,463 [INFO] iva.detectnet_v2.dataio.dataset_converter_lib: Creating output directory /workspace/ZNT/Z_1/tfrecords/val
2022-07-21 04:26:44,467 [INFO] iva.detectnet_v2.dataio.kitti_converter_lib: Num images in
Train: 1267 Val: 316
2022-07-21 04:26:44,467 [INFO] iva.detectnet_v2.dataio.kitti_converter_lib: Validation data in partition 0. Hence, while choosing the validationset during training choose validation_fold 0.
2022-07-21 04:26:44,468 [INFO] iva.detectnet_v2.dataio.dataset_converter_lib: Writing partition 0, shard 0
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py:169: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

2022-07-21 04:26:44,468 [WARNING] tensorflow: From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py:169: The name tf.python_io.TFRecordWriter is deprecated. Please use tf.io.TFRecordWriter instead.

/usr/local/lib/python3.6/dist-packages/iva/detectnet_v2/dataio/kitti_converter_lib.py:315: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/retinanet/scripts/dataset_convert.py", line 18, in <module>
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/ssd/scripts/dataset_convert.py", line 164, in main
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py", line 74, in convert
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py", line 129, in _write_partitions
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/dataset_converter_lib.py", line 173, in _write_shard
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/kitti_converter_lib.py", line 207, in _create_example_proto
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/dataio/kitti_converter_lib.py", line 344, in _add_targets
TypeError: argument of type 'NoneType' is not iterable
Stopping container tfrecord_conversion_retinanet-Z_1-Jul_21_2022_00.26.32

Can you share several label files? How many classes?

labels.zip (620 Bytes)
classes: 3

When working with YOLO_v4_tiny in 22.05, the same dataset works fine.

Could you double check? It does not make sense that yolo_v4_tiny works but retinanet does not.
They are using the same converter.

Could you share the corresponding images (there should be 3) with me as well?

As per company norms, I cannot share the real images. But here is some dummy data with which I am able to generate TFRecords and train successfully using yolo_v4_tiny:

data.zip (81.1 KB)

The first image has a resolution of 275x183, but the bbox coordinates in its label are larger than that. Is something wrong?
r_1 0.00 0 0.00 458 234 587 337 0.00 0.00 0.00 0.00 0.00 0.00 0.00
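
The mismatch pointed out above can be checked mechanically. The following is a minimal sketch, assuming the standard KITTI label layout in which fields 5 to 8 hold the box as left, top, right, bottom in pixels. The helper names `parse_kitti_bbox` and `bbox_problems` are hypothetical and not part of TAO Toolkit; the image size is passed in by hand rather than read from the file.

```python
from typing import List, Tuple


def parse_kitti_bbox(line: str) -> Tuple[str, float, float, float, float]:
    """Return (class_name, xmin, ymin, xmax, ymax) from one KITTI label line."""
    fields = line.split()
    if len(fields) < 8:
        raise ValueError(f"expected at least 8 fields, got {len(fields)}")
    # Assumption: fields 4..7 (0-based) are left, top, right, bottom.
    xmin, ymin, xmax, ymax = (float(v) for v in fields[4:8])
    return fields[0], xmin, ymin, xmax, ymax


def bbox_problems(line: str, width: int, height: int) -> List[str]:
    """List the ways a label's box is inconsistent with the image size."""
    name, xmin, ymin, xmax, ymax = parse_kitti_bbox(line)
    problems = []
    if xmin < 0 or ymin < 0:
        problems.append(f"{name}: negative coordinate")
    if xmax > width:
        problems.append(f"{name}: xmax {xmax} exceeds image width {width}")
    if ymax > height:
        problems.append(f"{name}: ymax {ymax} exceeds image height {height}")
    if xmin >= xmax or ymin >= ymax:
        problems.append(f"{name}: box has no area")
    return problems


# The label line and the 275x183 resolution quoted in this thread.
label = "r_1 0.00 0 0.00 458 234 587 337 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
problems = bbox_problems(label, width=275, height=183)
for problem in problems:
    print(problem)
```

Running the same check over every label file, with the real image sizes, would show whether the actual dataset has the same inconsistency as the dummy data.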

This is dummy data, not the actual data.

But I am still able to generate TFRecords with it and train yolo_v4_tiny, as shown in the image above.

Hi,
For retinanet, please refer to the sample spec retinanet_tfrecords_kitti_train.txt shown below.
This file is included in the notebook files, which you can download from:
https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_quick_start_guide.html#computer-vision
Note: the latest version is 1.4.1. (TAO Toolkit Computer Vision Sample Workflows | NVIDIA NGC)

# cat retinanet_tfrecords_kitti_train.txt
kitti_config {
  root_directory_path: "/workspace/tao-experiments/data/"
  image_dir_name: "training/image_2"
  label_dir_name: "training/label_2"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 0
  num_shards: 10
}
image_directory_path: "/workspace/tao-experiments/data/"
  target_class_mapping {
      key: "car"
      value: "car"
  }
  target_class_mapping {
      key: "pedestrian"
      value: "pedestrian"
  }
  target_class_mapping {
      key: "cyclist"
      value: "cyclist"
  }
  target_class_mapping {
      key: "van"
      value: "car"
  }
  target_class_mapping {
      key: "person_sitting"
      value: "pedestrian"
  }
  target_class_mapping {
      key: "truck"
      value: "car"
  }
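
The thread does not establish the root cause of the TypeError. One thing that can be compared against a spec like the sample above, purely as an assumption about what might matter, is whether every class name used in the label files has a matching `target_class_mapping` key. The sketch below does that with a regular expression instead of a real protobuf parser; `mapping_keys` and `unmapped_classes` are hypothetical helper names, the match is exact (case-sensitive), and the inline spec is a shortened illustration, not the poster's actual file.

```python
import re
from typing import Iterable, Set


def mapping_keys(spec_text: str) -> Set[str]:
    """Collect the key of every target_class_mapping block in a spec.

    Assumes `key:` is the first field inside each block, as in the sample.
    """
    pattern = re.compile(r'target_class_mapping\s*\{\s*key:\s*"([^"]+)"')
    return set(pattern.findall(spec_text))


def unmapped_classes(spec_text: str, label_lines: Iterable[str]) -> Set[str]:
    """Return class names found in label lines that have no mapping key."""
    keys = mapping_keys(spec_text)
    names = {line.split()[0] for line in label_lines if line.strip()}
    return names - keys


# Shortened, illustrative spec with a single mapping.
spec = '''
kitti_config {
  root_directory_path: "/workspace/tao-experiments/data/"
}
target_class_mapping {
  key: "car"
  value: "car"
}
'''
labels = ["r_1 0.00 0 0.00 458 234 587 337 0.00 0.00 0.00 0.00 0.00 0.00 0.00"]
missing = unmapped_classes(spec, labels)
print(sorted(missing))
```

A non-empty result only means the spec and the labels disagree on class names; whether that is what triggers the error in `_add_targets` is not confirmed anywhere in this thread.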

There has been no update from you for a while, so we assume this is no longer an issue.
Hence we are closing this topic. If you need further support, please open a new one.
Thanks