Is it technically possible to give one image directory and/or one coco annotation file in the data config?

According to the official example, the config used to convert a dataset to tfrecords looks like this:

coco_config {
  root_directory_path: "/workspace/tao-experiments/data/coco"
  img_dir_names: ["val2017", "train2017"]
  annotation_files: ["annotations/instances_val2017.json", "annotations/instances_train2017.json"]
  num_partitions: 2
  num_shards: [32, 256]
}
image_directory_path: "/workspace/tao-experiments/data/coco"

In the above example, the training and validation images are in two separate directories, and the training and validation annotations are in two separate JSON files.

However, there might be two other scenarios:

  • Scenario 1: The train and validation images are in a common directory:
    In this case, is it possible to configure the img_dir_names as follows?
    img_dir_names: ["common2017", "common2017"]

  • Scenario 2: The annotations are not already split:
    For instance, there is only one annotation file containing all the annotations (train and val). Is there a way that tao model <model_name> dataset-convert takes care of this splitting out-of-the-box?

For the above two scenarios, where the train and validation images are in a single folder "common2017" and the annotations are not split, dataset-convert will not split them into train and validation for you.
You can set

img_dir_names: ["common2017"]
annotation_files: ["annotations/common2017.json"]
num_partitions: 1
num_shards: [256]

Then, after conversion, you can split the 256 tfrecord files into a train part and a validation part yourself.
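As a rough illustration of that post-conversion split, the helper below randomly assigns shard files to a train set and a validation set (the shard filename pattern is only a placeholder, not the exact name dataset-convert produces):

```python
import random

def split_shards(shard_paths, val_fraction=0.2, seed=42):
    """Randomly assign tfrecord shard files to train/val splits.

    Note: splitting at shard granularity splits by groups of images,
    not per-image, so the ratio is only approximate per class.
    """
    paths = sorted(shard_paths)
    rng = random.Random(seed)       # fixed seed for a reproducible split
    rng.shuffle(paths)
    n_val = max(1, int(len(paths) * val_fraction))
    return paths[n_val:], paths[:n_val]

# Hypothetical shard names; adapt to whatever dataset-convert writes out.
shards = [f"shard-{i:05d}-of-00256" for i in range(256)]
train, val = split_shards(shards, val_fraction=0.2)
print(len(train), len(val))  # 205 51
```

You would then point the training spec's training tfrecord pattern at one group of files and the validation pattern at the other.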

Thanks.
Would dataset-convert distribute the data across the tfrecords? Are they somehow stratified by default, or should I add some stratification myself?

Refer to DetectNet_v2 - NVIDIA Docs.
For kitti_config, we can set partition_mode: "random" in order to distribute the data randomly across the tfrecords. More info can be found in https://github.com/NVIDIA/tao_tensorflow1_backend/blob/main/nvidia_tao_tf1/cv/detectnet_v2/dataio/kitti_converter_lib.py.
For coco_config, details about the partition and its shard can be found in https://github.com/NVIDIA/tao_tensorflow1_backend/blob/main/nvidia_tao_tf1/cv/detectnet_v2/dataio/coco_converter_lib.py.

But this option is only available for kitti_config, not for a COCO-based config. In that case, I need to convert my dataset to KITTI format.
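For reference, after converting to KITTI format, a kitti_config with random partitioning might look like the following sketch (paths and values are illustrative; check the field names against the DetectNet_v2 dataset-convert docs for your TAO version):

kitti_config {
  root_directory_path: "/workspace/tao-experiments/data/kitti"
  image_dir_name: "training/image_2"
  label_dir_name: "training/label_2"
  partition_mode: "random"
  num_partitions: 2
  val_split: 20
  num_shards: 10
}
image_directory_path: "/workspace/tao-experiments/data/kitti"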
