According to the official example, the config used to convert a dataset to TFRecords looks like this:
coco_config {
root_directory_path: "/workspace/tao-experiments/data/coco"
img_dir_names: ["val2017", "train2017"]
annotation_files: ["annotations/instances_val2017.json", "annotations/instances_train2017.json"]
num_partitions: 2
num_shards: [32, 256]
}
image_directory_path: "/workspace/tao-experiments/data/coco"
In the above example, the training and validation images are in two separate directories. The training and validation annotations are also in two separate JSON files.
However, there might be two other scenarios:
- Scenario 1: The train and validation images are in a common directory.
  In this case, is it possible to configure img_dir_names as follows?
  img_dir_names: ["common2017", "common2017"]
- Scenario 2: The annotations are not already split.
  For instance, there would be only one annotation file containing all the annotations (train and val). Is there a way that tao model <model_name> dataset-convert takes care of this splitting out of the box?
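
If there is no built-in option for Scenario 2, one workaround I can think of is splitting the annotation file myself before running the conversion. Below is a minimal sketch of that idea, assuming the file follows the standard COCO layout with top-level "images" and "annotations" keys. The function names split_coco and split_coco_file, the 0.2 validation ratio, and the fixed seed are my own hypothetical choices and are not part of TAO:

```python
import json
import random


def split_coco(coco, val_ratio=0.2, seed=0):
    """Split a COCO-format dict into (train, val) dicts, image by image.

    Hypothetical helper, not a TAO API. Assumes the standard COCO keys
    "images" and "annotations"; all other top-level keys (for example
    "categories", "info", "licenses") are copied into both outputs.
    """
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # deterministic shuffle for a given seed
    n_val = int(len(images) * val_ratio)
    val_images, train_images = images[:n_val], images[n_val:]

    def subset(imgs):
        ids = {img["id"] for img in imgs}
        out = {k: v for k, v in coco.items() if k not in ("images", "annotations")}
        out["images"] = imgs
        # Keep only the annotations that belong to the selected images.
        out["annotations"] = [
            ann for ann in coco.get("annotations", []) if ann["image_id"] in ids
        ]
        return out

    return subset(train_images), subset(val_images)


def split_coco_file(src, train_dst, val_dst, val_ratio=0.2, seed=0):
    """Read one COCO JSON file and write separate train and val JSON files."""
    with open(src) as f:
        coco = json.load(f)
    train, val = split_coco(coco, val_ratio=val_ratio, seed=seed)
    with open(train_dst, "w") as f:
        json.dump(train, f)
    with open(val_dst, "w") as f:
        json.dump(val, f)
```

The two output files could then be listed in annotation_files in the same order as the corresponding img_dir_names entries. Whether both entries may point to the same image directory is exactly the open question from Scenario 1.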