Error when running the training command

Docker_tag:–> v3.21.08-py3
Network Type → detectnet_v2
Configuration file → spec.txt.txt (4.3 KB)

Hi,

I am trying to do transfer learning with the pretrained PeopleNet model, but when I run this command to start training:

tao detectnet_v2 train -k tlt_encode -r /workspace/tao-experiments/results -e /workspace/tao-experiments/specs/spec.txt

I get the error shown below:

tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: /workspace/tao-experiments/images//workspace/tao-experiments/images/000000000357.jpg; No such file or directory
[[{{node AssetLoader/ReadFile}}]]
[[data_loader_out]]
(1) Not found: /workspace/tao-experiments/images//workspace/tao-experiments/images/000000000357.jpg; No such file or directory
[[{{node AssetLoader/ReadFile}}]]
[[data_loader_out]]
[[LookupTable_3/hash_table_Lookup/LookupTableFindV2/_4001]]
0 successful operations.
0 derived errors ignored.
2022-02-18 09:51:07,364 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

The image 000000000357.jpg is present in the directory, at “/workspace/tao-experiments/images/000000000357.jpg”, so I do not understand the error. I have attached the configuration file for your reference.

Looking forward to your help.

It seems that something is mismatched in your training spec.
Can you share the spec file you used to generate the tfrecord files?

Here is the spec file I used to create the tfrecords: spec3.txt (358 Bytes)

And the spec file I am using for training:
spec.txt.txt (4.3 KB)

Please modify the tfrecord spec and generate the tfrecord files again. Change

kitti_config {
  root_directory_path: "/workspace/tao-experiments/dataset"
  image_dir_name: "/workspace/tao-experiments/images"
  label_dir_name: "/workspace/tao-experiments/labels"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 0
  num_shards: 10
}
image_directory_path: "/workspace/tao-experiments/dataset"

to

kitti_config {
  root_directory_path: "/workspace/tao-experiments"
  image_dir_name: "images"
  label_dir_name: "labels"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 0
  num_shards: 10
}
image_directory_path: "/workspace/tao-experiments"
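
Before regenerating the tfrecords, it may help to confirm that the dataset on disk matches this layout. The following is only a minimal Python sketch. It assumes KITTI-style labels stored as .txt files that share a file stem with their image, and the helper name check_kitti_layout is hypothetical (it is not part of TAO).

```python
from pathlib import Path


def check_kitti_layout(root, image_dir_name="images", label_dir_name="labels",
                       image_extension=".jpg"):
    """Return (images_without_labels, labels_without_images) as sorted stem lists.

    Hypothetical helper, not part of TAO. Assumes one .txt label per image,
    with the same file stem, under <root>/<label_dir_name>.
    """
    root = Path(root)
    image_dir = root / image_dir_name
    label_dir = root / label_dir_name
    # Fail loudly if the relative directory names do not resolve under root.
    for sub in (image_dir, label_dir):
        if not sub.is_dir():
            raise FileNotFoundError(f"missing directory: {sub}")
    image_stems = {p.stem for p in image_dir.glob(f"*{image_extension}")}
    label_stems = {p.stem for p in label_dir.glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Calling check_kitti_layout("/workspace/tao-experiments") inside the container should return two empty lists if every image has a label and vice versa. A FileNotFoundError would suggest that root_directory_path, image_dir_name, or label_dir_name does not point where expected.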

OK, it is now generated in the root folder. Let me try the training command and I will get back to you.

I am still facing the same issue:

tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: /workspace/tao-experiments/images/images/000000001756.jpg; No such file or directory
[[{{node AssetLoader/ReadFile}}]]
[[data_loader_out]]
(1) Not found: /workspace/tao-experiments/images/images/000000001756.jpg; No such file or directory
[[{{node AssetLoader/ReadFile}}]]
[[data_loader_out]]
[[LookupTable_3/hash_table_Lookup/LookupTableFindV2/_4001]]
0 successful operations.
0 derived errors ignored.
2022-02-18 15:09:41,701 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Not found: /workspace/tao-experiments/images/images/000000001756.jpg; No such file or directory

I do not understand how the image paths are mapped. The path should be /workspace/tao-experiments/images/000000001756.jpg, but in the error above "images" appears twice.

Can you share your latest training spec?

training spec file–> spec.txt.txt (4.3 KB)

You can modify

dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tao-experiments/tf_records_train/*"
    image_directory_path: "/workspace/tao-experiments/images"
  }

  validation_data_source: {
    tfrecords_path: "/workspace/tao-experiments/tf_records_valid/*"
    image_directory_path: "/workspace/tao-experiments/val/images"
}

to

dataset_config {
  data_sources: {
    tfrecords_path: "/workspace/tao-experiments/tf_records_train/*"
    image_directory_path: "/workspace/tao-experiments"
  }

  validation_data_source: {
    tfrecords_path: "/workspace/tao-experiments/tf_records_valid/*"
    image_directory_path: "/workspace/tao-experiments/val"
}
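
The doubled paths in both errors are consistent with the data loader prefixing image_directory_path from the training spec to the image path recorded in the tfrecord, which in turn appears to include image_dir_name from the conversion spec. The sketch below is a reconstruction inferred only from the error messages in this thread, not TAO's actual code, and compose_image_path is a hypothetical name.

```python
def compose_image_path(image_directory_path, image_dir_name, stem, extension=".jpg"):
    """Mimic the concatenation suggested by the error messages.

    Hypothetical reconstruction, not TAO's implementation: the training-time
    image_directory_path is prefixed to <image_dir_name>/<stem><extension>.
    """
    return f"{image_directory_path}/{image_dir_name}/{stem}{extension}"


# First error: image_dir_name was an absolute path in the conversion spec.
first = compose_image_path("/workspace/tao-experiments/images",
                           "/workspace/tao-experiments/images", "000000000357")

# Second error: image_dir_name fixed, image_directory_path still ends in "images".
second = compose_image_path("/workspace/tao-experiments/images",
                            "images", "000000001756")

# After both changes: each path component appears exactly once.
fixed = compose_image_path("/workspace/tao-experiments", "images", "000000001756")

print(first)
print(second)
print(fixed)
```

Under this assumption, image_directory_path should name the dataset root, and the tfrecord should carry only the relative "images/..." part, which is why both the conversion spec and the training spec needed to change.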

Spec file →
spec.txt (4.3 KB)

.tao_mounts.json file →
.tao_mounts.json.txt (1.7 KB)

I have changed the spec file as you suggested, but now I am facing the issue below:

Terminal result →

There might be a path mismatch with the .tao_mounts.json file. I have also attached the .tao_mounts.json file for your reference.
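
One way to rule out a mount mismatch is to translate a container path from the spec back to a host path and check that it exists there. A minimal sketch follows, assuming the mounts file has a top-level "Mounts" list whose entries carry "source" (host) and "destination" (container) keys. The helper name container_to_host and the host path in the example are hypothetical and do not come from the attached file.

```python
import json
from pathlib import PurePosixPath


def container_to_host(container_path, mounts):
    """Map a container path to a host path using source/destination mount pairs.

    Hypothetical helper. Returns None when no mount destination is a parent
    of container_path. The longest matching destination wins.
    """
    target = PurePosixPath(container_path)
    best = None
    for mount in mounts:
        dest = PurePosixPath(mount["destination"])
        if target == dest or dest in target.parents:
            if best is None or len(dest.parts) > len(PurePosixPath(best["destination"]).parts):
                best = mount
    if best is None:
        return None
    relative = target.relative_to(PurePosixPath(best["destination"]))
    return str(PurePosixPath(best["source"]) / relative)


# Example mount table with a placeholder host path (assumed, not from the thread).
example = json.loads("""
{"Mounts": [
  {"source": "/home/user/tao-experiments", "destination": "/workspace/tao-experiments"}
]}
""")

print(container_to_host("/workspace/tao-experiments/val/images", example["Mounts"]))
```

If the function returns None for a path used in the spec, that path is not covered by any mount and would not be visible inside the container.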

Similarly, please regenerate the tfrecord files for your val dataset.
You can refer to the modification made above for the training dataset.

Thanks for the help, the training is running now! :)
