Protobuf ParseError @ detectnet_v2 dataset_convert

Please provide the following information when requesting support.

• Hardware: AWS TAO-configured AMI
• Network Type: DetectNet_v2
• TAO Version:
dockers: ['nvidia/tao/tao-toolkit-tf', 'nvidia/tao/tao-toolkit-pyt', 'nvidia/tao/tao-toolkit-lm']
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021

I am trying to convert the KITTI dataset to TFRecords. I've split the dataset into training and validation sets. When running

!tao detectnet_v2 dataset_convert

on the training dataset, it works fine, but running it on the validation dataset produces the following error:

Converting Tfrecords for kitti train dataset
2021-09-29 14:06:47,231 [INFO] root: Registry: ['nvcr.io']
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-uuszsgr8 because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
Traceback (most recent call last):
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py", line 104, in
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py", line 93, in
File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/detectnet_v2/scripts/dataset_convert.py", line 85, in main
File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 735, in Merge
allow_unknown_field=allow_unknown_field)
File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 803, in MergeLines
return parser.MergeLines(lines, message)
File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 828, in MergeLines
self._ParseOrMerge(lines, message)
File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 850, in _ParseOrMerge
self._MergeField(tokenizer, message)
File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 923, in _MergeField
name = tokenizer.ConsumeIdentifierOrNumber()
File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 1392, in ConsumeIdentifierOrNumber
raise self.ParseError('Expected identifier or number, got %s.' % result)
google.protobuf.text_format.ParseError: 12:1 : '​': Expected identifier or number, got ​.
2021-09-29 14:06:56,089 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Both KITTI datasets were created the same way.

vcarsNet_tfrecords_kitti_train.txt (326 Bytes)
vcarsNet_tfrecords_kitti_validation.txt (333 Bytes)
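Note that the ParseError points at line 12, column 1 of the parsed text and reports an unprintable token, which usually means an invisible character (e.g. a zero-width space, U+200B, picked up by copy-pasting) slipped into the spec file. A minimal sketch for locating such characters (the helper name and CLI are mine, not part of TAO):

```python
import unicodedata

def find_suspicious_chars(path):
    """Scan a text file for characters outside plain ASCII, which
    protobuf's text-format parser may reject. Returns a list of
    (line, column, code point, Unicode name) tuples."""
    hits = []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            for col, ch in enumerate(line, start=1):
                if ord(ch) > 127:
                    hits.append((lineno, col, f"U+{ord(ch):04X}",
                                 unicodedata.name(ch, "UNKNOWN")))
    return hits

# Example usage (path is a placeholder):
# for lineno, col, code, name in find_suspicious_chars("spec.txt"):
#     print(f"{lineno}:{col} {code} {name}")
```

Running this over the validation spec file should reveal a hit at 12:1 if an invisible character is indeed the cause.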

It seems that some label files in your validation dataset have an unexpected format.
Please try generating TFRecords for subsets of the dataset and narrow down the problematic files step by step.

I've checked them one by one with a script and the format is okay.

Is there a way to validate this in a more programmatic manner?

Each object line should contain 15 fields in total. Please check whether your label files look similar to the example below.
For example,

car 0.00 0 0.00 587.01 173.33 614.12 200.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00
cyclist 0.00 0 0.00 665.45 160.00 717.93 217.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00
pedestrian 0.00 0 0.00 423.17 173.67 433.17 224.03 0.00 0.00 0.00 0.00 0.00 0.00 0.00
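The 15-field check above can be sketched programmatically; a minimal helper (the function name is mine, and the check is only structural: 15 whitespace-separated fields, with everything after the class name numeric):

```python
def validate_kitti_label(path):
    """Check that every non-empty line of a KITTI label file has exactly
    15 whitespace-separated fields and that fields 2-15 parse as numbers
    (field 1 is the class name). Returns a list of (line_no, problem)."""
    problems = []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            fields = line.split()
            if not fields:
                continue  # tolerate blank lines
            if len(fields) != 15:
                problems.append((lineno, f"expected 15 fields, got {len(fields)}"))
                continue
            for idx, value in enumerate(fields[1:], start=2):
                try:
                    float(value)
                except ValueError:
                    problems.append((lineno, f"field {idx} is not numeric: {value!r}"))
    return problems
```

Running it over every file in the validation label directory and printing any non-empty result should pinpoint the offending file and line.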

Please refer to Data Annotation Format - NVIDIA Docs