PermissionError: [Errno 13] Permission denied: trying to train classification_tf1

Please provide the following information when requesting support.
The error is for the output folder:
PermissionError: [Errno 13] Permission denied: '~/tao-getting-started_v5.2.0/notebooks/tao_launcher_starter_kit/classification_tf1/output'
Execution status: FAIL

• Hardware (T4/V100/Xavier/Nano/etc): A6000
• Network Type: classification resnet and mobilenet_v2
• TLT Version (Please run "tlt info --verbose" and share "docker_tag" here)
• Training spec file (If have, please share here)
• How to reproduce the issue? Run tao train classification_tf1

I have tried many ways to fix this issue and I keep getting it. I have run chmod 777 on that folder and on the folders above it, and I have tried Python 3.6.9 and 3.8.19.
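The permission checks and fixes I tried boil down to this (a sketch; the path matches my local setup):

```shell
# Sketch of the checks I ran; the path matches my local setup.
OUT=~/tao-getting-started_v5.2.0/notebooks/tao_launcher_starter_kit/classification_tf1/output
mkdir -p "$OUT"        # make sure the folder exists before training
chmod 777 "$OUT"       # what I ran (also on the parent folders)
ls -ld "$OUT"          # owner should be my user, not root
```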

model_config {
  # Model Architecture can be chosen from:
  # ['resnet', 'vgg', 'googlenet', 'alexnet']
  arch: "resnet"
  # for resnet → n_layers can be [10, 18, 50]
  # for vgg → n_layers can be [16, 19]
  n_layers: 101
  use_batch_norm: True
  use_bias: False
  all_projections: False
  use_pooling: True
  retain_head: True
  resize_interpolation_method: BICUBIC
  # if you want to use the pretrained model,
  # image size should be "3,224,224"
  # otherwise, it can be "3, X, Y", where X,Y >= 16
  input_image_size: "3,224,224"
}
train_config {
  train_dataset_path: "~/tao-getting-started_v5.2.0/notebooks/tao_launcher_starter_kit/classification_tf1/data/split/train"
  val_dataset_path: "~/tao-getting-started_v5.2.0/notebooks/tao_launcher_starter_kit/classification_tf1/data/split/val"
  pretrained_model_path: "~/tao-getting-started_v5.2.0/notebooks/tao_launcher_starter_kit/classification_tf1/classification_tf1/pretrained_resnet18/pretrained_classification_vresnet18/resnet_18.hdf5"
  # Only ['sgd', 'adam'] are supported for optimizer
  optimizer {
    sgd {
      lr: 0.01
      decay: 0.0
      momentum: 0.9
      nesterov: False
    }
  }
  batch_size_per_gpu: 50
  n_epochs: 150
  # Number of CPU cores for loading data
  n_workers: 16
  reg_config {
    # regularizer type can be "L1", "L2" or "None".
    type: "L2"
    # if the type is not "None",
    # scope can be either "Conv2D" or "Dense" or both.
    scope: "Conv2D,Dense"
    # 0 < weight decay < 1
    weight_decay: 0.000015
  }
  lr_config {
    cosine {
      learning_rate: 0.04
      soft_start: 0.0
    }
  }
  enable_random_crop: True
  enable_center_crop: True
  enable_color_augmentation: True
  mixup_alpha: 0.2
  label_smoothing: 0.1
  preprocess_mode: "caffe"
  image_mean {
    key: 'b'
    value: 103.9
  }
  image_mean {
    key: 'g'
    value: 116.8
  }
  image_mean {
    key: 'r'
    value: 123.7
  }
}
eval_config {
  eval_dataset_path: "/path/to/your/test/data"
  model_path: "/workspace/tao-experiments/classification/weights/resnet_080.tlt"
  top_k: 3
  batch_size: 256
  n_workers: 8
  enable_center_crop: True
}

Please check your ~/.tao_mounts.json file. It maps local paths to paths inside the Docker container. On the command line, the paths should be paths inside the docker.
Can you share the full command line and the full log?
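For reference, a minimal ~/.tao_mounts.json looks roughly like this (the source path is illustrative — adjust it to your local folders). The optional DockerOptions "user" entry is worth trying here, since it makes the container write files as your UID instead of root, a common cause of PermissionError on the output folder:

```json
{
    "Mounts": [
        {
            "source": "/home/<user>/tao-getting-started_v5.2.0/notebooks/tao_launcher_starter_kit/classification_tf1",
            "destination": "/workspace/tao-experiments/classification_tf1"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000"
    }
}
```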

Hey @Morganh, Thanks for the help.

After working with a colleague, I can confirm that you and he are both correct! I had changed the paths inside the ~/.tao_mounts.json file.

For others hitting this error, note that the instructions do not really explain how the TAO launcher works. The ~/.tao_mounts.json file is the most important piece: when you call tao on the command line, the launcher reads that file to create a Docker container and mount your folders into it. The Colab tutorials skip this step, but to run locally you have to do it. The automatic creation of the Docker container was not clear to me until I had spent a few days trying to get TAO to run.
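To make the path translation concrete, here is a hypothetical sketch (the folder names and the exact tao subcommand syntax are illustrative, not copied from my setup): once ~/.tao_mounts.json maps a local folder to a container folder, every path you pass to tao — the spec file, the results dir, and the dataset paths inside the spec — must use the container side of that mapping.

```shell
# Hypothetical mapping defined in ~/.tao_mounts.json:
#   local host folder  →  folder inside the container
LOCAL=/home/me/tao-getting-started_v5.2.0/notebooks/tao_launcher_starter_kit/classification_tf1
IN_DOCKER=/workspace/tao-experiments/classification_tf1

# You edit files under $LOCAL on the host, but the command line (and the
# paths inside the spec file) must reference the $IN_DOCKER side:
echo "tao model classification_tf1 train -e $IN_DOCKER/specs/train.cfg -r $IN_DOCKER/output"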