Error using tao augment command

Hi,

I am trying to augment my data using the offline data augmentation tool, but I am getting an unknown error and the logs are not helping at all.
I have narrowed the images folder down to only two items to try to isolate the issue, but I am still getting the same error:

tao augment -d /workspace/tlt-experiments/data/images -a /workspace/tlt-experiments/data/augment_spec.yml -o /workspace/tlt-experiments/data/images_augmented -v
2023-04-18 07:20:18,493 [INFO] root: Registry: ['nvcr.io']
2023-04-18 07:20:18,601 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5
2023-04-18 07:20:18,623 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/home/ubuntu/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
2023-04-18 07:20:19.955817: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version!
  RequestsDependencyWarning)
2023-04-18 07:20:22,196 [INFO] iva.augment.spec_handler.spec_loader: Merging specification from /workspace/tlt-experiments/data/augment_spec.yml
2023-04-18 07:20:22,198 [INFO] iva.augment.scripts.augment: Serializing augmentation spec
spatial_config {
  rotation_config {
    angle: 10.0
    units: "degrees"
  }
}
output_image_width: 720
output_image_height: 720
output_image_channel: 3
image_extension: ".jpg"

2023-04-18 07:20:22,234 [INFO] iva.augment.scripts.augment: Listing files in the dataset.
2023-04-18 07:20:22,235 [DEBUG] iva.augment.scripts.augment: Time taken to list kitti files: 0.00115966796875
2023-04-18 07:20:22,236 [INFO] iva.augment.scripts.augment: Preparing file fetchers for augmentation.
2023-04-18 07:20:22,236 [DEBUG] iva.augment.scripts.augment: Defining the pipeline.
2023-04-18 07:20:22,243 [DEBUG] iva.augment.scripts.augment: Building the pipeline.
Augmentation run failed with error: Critical error when building pipeline:
Error when constructing operator: ImageDecoder encountered:
std::bad_alloc
Current pipeline object is no longer valid.
2023-04-18 07:20:23,560 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
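
For context, the permissions warning above refers to adding a "user" entry to the DockerOptions section of /home/ubuntu/.tao_mounts.json. A minimal sketch (the host source path and the 1000:1000 value are just examples; use the output of id -u and id -g for your own account):

{
    "Mounts": [
        {
            "source": "/home/ubuntu/tlt-experiments",
            "destination": "/workspace/tlt-experiments"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000"
    }
}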

Any idea of what’s wrong?
Please find attached the spec file and the folder structure.

Thanks

• TLT Version: 4.0.1
augment_spec.yml (177 Bytes)
Screen Shot 2023-04-18 at 5.17.32 pm

Can you download the notebook to check if it works?
wget --content-disposition 'https://api.ngc.nvidia.com/v2/resources/nvidia/tao/cv_samples/versions/v1.4.1/files/augment/augment.ipynb'

i.e., https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/resources/cv_samples/version/v1.4.1/files/augment/augment.ipynb

I did get it to work somehow, but now I am getting the error again.

I have tried the Jupyter notebook and I can successfully go through all the steps (tweaked with my dummy data: 1 image and 1 label txt file) until I get to the tao augment command, where I hit the same error.

I feel like it’s a path problem somewhere, because if I remove my 000000.jpg and my 000000.txt and have no files at all, I still get the same error.
But the error message is so broad that I can’t tell where it’s coming from.

Any idea?

Can you share all the files with me? I am going to check if I can reproduce it.

Please find attached a zip file of the folder and the Docker mounts file.
tao_mounts.json (327 Bytes)

test_tao_augment.zip (122.9 KB)

tao augment -d /workspace/tlt-experiments/data/training -a /workspace/tlt-experiments/specs/augment_spec.yml -o /workspace/tlt-experiments/data/augmented/

I didn’t get the error anymore when working with some images, but then it appeared again. I thought the error could come from a non-jpg hidden file, but no such files are present.

Thanks

Could you try using a fresh folder and make sure there are no other hidden or unexpected files?
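
For example, something like this will list anything that is not a .jpg image or a .txt label (the /path/to/dataset path is a placeholder; point it at your dataset folder):

find /path/to/dataset -type f ! -name "*.jpg" ! -name "*.txt"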

I have already tried 3-4 times. I thought it was coming from hidden Apple files introduced when uploading the data, but then I created the directory structure directly on the EC2 instance. The error could definitely be more specific and report the path of the file that fails to decode.
Note that I had to create fake labels, since I am augmenting a dataset for image classification.
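
I generated them with something along these lines (an illustrative sketch; the images/ and labels/ folder names and the dummy KITTI object line are placeholders, not my exact setup):

for f in images/*.jpg; do
  # write one dummy KITTI-format object line per image
  echo "car 0.00 0 0.00 0.00 0.00 10.00 10.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00" > labels/"$(basename "$f" .jpg)".txt
done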

And I have run
find ~/dataset/ -name ".*" -delete
to be sure I have no hidden files, but I am still getting the same error.

Rebooting the instance did the trick and the command is now working. I am not sure what went wrong. A better error message that reports which file failed to decode would be preferable.

Thanks for the info.