TLT LPRNet can't find the spec file

Hi, when I run inference with tlt lprnet, it can't find the spec file. Here are my files and the command I used.

My tutorial_spec.txt is:
random_seed: 42
lpr_config {
  hidden_units: 512
  max_label_length: 8
  arch: "baseline"
  nlayers: 18  # set nlayers to 10 to use the baseline10 model
}
training_config {
  batch_size_per_gpu: 32
  num_epochs: 24
  learning_rate {
    soft_start_annealing_schedule {
      min_learning_rate: 1e-6
      max_learning_rate: 1e-5
      soft_start: 0.001
      annealing: 0.5
    }
  }
  regularizer {
    type: L2
    weight: 5e-4
  }
}
eval_config {
  validation_period_during_training: 5
  batch_size: 1
}
augmentation_config {
  output_width: 96
  output_height: 48
  output_channel: 3
  keep_original_prob: 0.3
  transform_prob: 0.5
  rotate_degree: 5
}
dataset_config {
  data_sources: {
    label_directory_path: "/workspace/experiments/lpr/data/train/label"
    image_directory_path: "/workspace/experiments/lpr/data/train/image"
  }
  characters_list_file: "/workspace/experiments/lpr/us_lp_characters.txt"
  validation_data_sources: {
    label_directory_path: "/workspace/experiments/lpr/data/val/label"
    image_directory_path: "/workspace/experiments/lpr/data/val/image"
  }
}

The path to the spec file must be a path inside the docker, not a path in your current directory.
You can run the following command to check:
tlt lprnet run ls spec_file
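For example, assuming the spec was expected at /workspace/specs/tutorial_spec.txt inside the launched docker (a hypothetical path, only for illustration), the check would look like:
tlt lprnet run ls /workspace/specs/tutorial_spec.txt
If ls reports that the file does not exist, the path you passed with -e is not visible inside the docker that tlt launches.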

See more in TLT Launcher — Transfer Learning Toolkit 3.0 documentation

I put the spec file in the current directory just to show that all the required files are provided. All of these commands are run inside the tlt3.0 container.
So I don't understand why the spec file path I passed with -e is not found.
I also tried your command and it doesn't work either.

Again, you are running inside a tlt3.0 docker container, and when you run the tlt command there, it launches another tlt3.0 docker on top of the first one. So your tutorial_spec.txt must be a path inside that second tlt3.0 docker.

First question: why does running the tlt command inside the container launch another tlt3.0 docker? I thought the tlt command provided by the nvidia-pyindex and nvidia-tlt wheels
was just a more convenient way to use tlt3.0. Do you mean that if I use the wheel, there is no need for me to use the container?
Second, if I run all commands inside the tlt3.0 container, what is the right command for training and inference? Should I just use the lprnet inference command and never prefix it with tlt?

Where did you install nvidia-pyindex and nvidia-tlt? In a tlt3.0 docker, right? If so, the tlt tool is installed inside a tlt3.0 docker, and when you run the command tlt xxx xxx there, you launch another tlt3.0 docker on top of the first one.

Please see TLT Launcher — Transfer Learning Toolkit 3.0 documentation; you should mount your local directory into the docker via ~/.tlt_mounts.json.
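For example, a minimal ~/.tlt_mounts.json might look like this (the host path /home/username/tlt_experiments is only an illustration; use your own directory):
{
    "Mounts": [
        {
            "source": "/home/username/tlt_experiments",
            "destination": "/workspace/tlt-experiments"
        }
    ]
}
With this mapping, a spec stored at /home/username/tlt_experiments/specs/lpr_spec.txt on the machine where you run the launcher can be passed to the launched docker as -e /workspace/tlt-experiments/specs/lpr_spec.txt.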

I successfully ran the inference with just this command:
lprnet inference --gpu_index=0 -m lpr_us_onnx_int8_new.trt -i car1.jpg -e /workspace/specs/lpr_spec.txt --trt
The lpr_spec.txt is saved at a path inside my docker, and the command was run inside the docker.

But I am still confused. In the documentation, all of the commands shown use tlt xxx xxx and require preparing ~/.tlt_mounts.json. I think ~/.tlt_mounts.json is used by the tlt launcher installed via pip. So I tried editing ~/.tlt_mounts.json:
{
    "Mounts": [
        {
            "source": "/workspace/specs",
            "destination": "/workspace/specs"
        }
    ]
}
I thought that with this mapping, the tlt xxx xxx command would be able to find lpr_spec.txt inside the second tlt3.0 docker, but it still failed.

Question 1: What is wrong with my setup when I run all commands inside the tlt3.0 container? I edited ~/.tlt_mounts.json inside the container, and the following command still does not work.
tlt lprnet inference --gpu_index=0 -m lpr_us_onnx_int8_new.trt -i car1.jpg -e /workspace/specs/lpr_spec.txt --trt
Question 2: Was my inference at the beginning correct? That is, inside the container I can just use lprnet inference xxx, with paths from the first container's filesystem?

First, let me ask you one question. How did you log in to the docker you are currently working in? With which command?

docker login nvcr.io
Username: $oauthtoken
Password: my-ngckey

No, I mean: after running which command do you get the following environment?
[screenshot of the shell environment]

I created the docker with:
sudo docker run --runtime=nvidia -it -v /var/run/docker.sock:/var/run/docker.sock -v /data/data1/username/tlt_experiments/:/workspace/ nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3 /bin/bash

I entered the docker with:
sudo docker exec -it dockerID /bin/bash

OK, so you are already inside a tlt3.0 docker. In this case, you can directly run commands such as
lprnet inference ...

But in the TLT user guide, the tlt_mounts.json is used to map your local directory into the docker.
So, per the TLT 3.0 user guide, after you install the tlt launcher, the end user can run every command on the host PC instead of inside the docker as you did.
For example:

morganh@ngc:~$

morganh@ngc:~$ tlt lprnet inference …
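As a sketch, assuming ~/.tlt_mounts.json maps your host experiment directory to /workspace/tlt-experiments inside the launched docker (the paths here are only illustrative), a full inference call from the host would look like:
tlt lprnet inference --gpu_index=0 -m /workspace/tlt-experiments/lpr_us_onnx_int8_new.trt -i /workspace/tlt-experiments/car1.jpg -e /workspace/tlt-experiments/specs/lpr_spec.txt --trt
Every path refers to a location inside the docker that the launcher starts, which is why each one must sit under a destination listed in ~/.tlt_mounts.json.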

See more info in Migrating to TLT 3.0 — Transfer Learning Toolkit 3.0 documentation