LPRNet Error

• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here): 5.3.0

I’m getting the following error while using tao train:

WARNING: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/lprnet/scripts/train.py:82: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/lprnet/scripts/train.py", line 366, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/lprnet/scripts/train.py", line 362, in main
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/lprnet/scripts/train.py", line 345, in main
    run_experiment(config_path=args.experiment_spec_file,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/lprnet/scripts/train.py", line 86, in run_experiment
    os.makedirs(results_dir)
  File "/usr/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 3 more times]
  File "/usr/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/mainak'
Execution status: FAIL
2024-06-03 15:32:19,782 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

Here’s the detailed TAO Toolkit info:

task_group:         
    model:             
        dockers:                 
            nvidia/tao/tao-toolkit:                     
                5.0.0-tf2.11.0:                         
                    docker_registry: nvcr.io
                    tasks: 
                        1. classification_tf2
                        2. efficientdet_tf2
                5.0.0-tf1.15.5:                         
                    docker_registry: nvcr.io
                    tasks: 
                        1. bpnet
                        2. classification_tf1
                        3. converter
                        4. detectnet_v2
                        5. dssd
                        6. efficientdet_tf1
                        7. faster_rcnn
                        8. fpenet
                        9. lprnet
                        10. mask_rcnn
                        11. multitask_classification
                        12. retinanet
                        13. ssd
                        14. unet
                        15. yolo_v3
                        16. yolo_v4
                        17. yolo_v4_tiny
                5.3.0-pyt:                         
                    docker_registry: nvcr.io
                    tasks: 
                        1. action_recognition
                        2. centerpose
                        3. deformable_detr
                        4. dino
                        5. mal
                        6. ml_recog
                        7. ocdnet
                        8. ocrnet
                        9. optical_inspection
                        10. pointpillars
                        11. pose_classification
                        12. re_identification
                        13. visual_changenet
                        14. classification_pyt
                        15. segformer
    dataset:             
        dockers:                 
            nvidia/tao/tao-toolkit:                     
                5.3.0-data-services:                         
                    docker_registry: nvcr.io
                    tasks: 
                        1. augmentation
                        2. auto_label
                        3. annotations
                        4. analytics
    deploy:             
        dockers:                 
            nvidia/tao/tao-toolkit:                     
                5.3.0-deploy:                         
                    docker_registry: nvcr.io
                    tasks: 
                        1. visual_changenet
                        2. centerpose
                        3. classification_pyt
                        4. classification_tf1
                        5. classification_tf2
                        6. deformable_detr
                        7. detectnet_v2
                        8. dino
                        9. dssd
                        10. efficientdet_tf1
                        11. efficientdet_tf2
                        12. faster_rcnn
                        13. lprnet
                        14. mask_rcnn
                        15. ml_recog
                        16. multitask_classification
                        17. ocdnet
                        18. ocrnet
                        19. optical_inspection
                        20. retinanet
                        21. segformer
                        22. ssd
                        23. trtexec
                        24. unet
                        25. yolo_v3
                        26. yolo_v4
                        27. yolo_v4_tiny
format_version: 3.0
toolkit_version: 5.3.0
published_date: 03/14/2024

Here’s the tao_mounts.json:

{
    "Mounts": [
        {
            "source": "/home/mainak/ms/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet",
            "destination": "/workspace/tao-experiments"
        },
        {
            "source": "/home/mainak/ms/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet/specs",
            "destination": "/workspace/tao-experiments/lprnet/specs"
        }
    ],
    "DockerOptions": {
        "user": "1000:1000"
    }
}

I run using:

tao model lprnet train --gpus=1 -e /home/mainak/ms/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet/specs/tutorial_spec.txt -k nvidia_tlt -r /home/mainak/ms/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet/experiment_dir_unpruned -m /home/mainak/ms/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet/lprnet_vtrainable_v1.0/us_lprnet_baseline18_trainable.tlt

Any help is highly appreciated
@Morganh

The path should be a path inside the docker. That means, the path defined in “destination” of the tao_mounts.json file.

I’m sorry for being naive, but in the .ipynb file it’s given as:

The following notebook requires the user to set an env variable called $LOCAL_PROJECT_DIR as the path to the user’s workspace. Please note that the dataset to run this notebook is expected to reside in $LOCAL_PROJECT_DIR/data, while the TAO experiment generated collaterals will be output to $LOCAL_PROJECT_DIR/lprnet.

!tao model lprnet train --gpus=1 --gpu_index=$GPU_INDEX \
                  -e $SPECS_DIR/tutorial_spec.txt \
                  -k $KEY \
                  -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                  -m $USER_EXPERIMENT_DIR/pretrained_lprnet_baseline18/lprnet_vtrainable_v1.0/us_lprnet_baseline18_trainable.tlt

and

%env USER_EXPERIMENT_DIR=/workspace/tao-experiments/lprnet

Is this USER_EXPERIMENT_DIR a path on my local system or inside the docker? Can you please elaborate?

In tao_tutorials/notebooks/tao_launcher_starter_kit/lprnet/lprnet.ipynb at main · NVIDIA/tao_tutorials · GitHub, the USER_EXPERIMENT_DIR is a path inside the docker.
You can also check the tao_mounts.json file. It mounts the local “source” to the docker’s “destination”.

            {
                "source": os.environ["LOCAL_PROJECT_DIR"],
                "destination": "/workspace/tao-experiments"
            },

The “destination” is a path inside the docker.
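Concretely, with the mount above, everything under the host’s $LOCAL_PROJECT_DIR appears under /workspace/tao-experiments inside the container, so the paths passed to the launcher should be the container-side ones. A hedged sketch based on the notebook variables (values assumed from the mounts shown earlier; adjust to your setup):

```shell
# Container-side paths (the "destination" side of tao_mounts.json), not host paths.
export USER_EXPERIMENT_DIR=/workspace/tao-experiments/lprnet
export SPECS_DIR=/workspace/tao-experiments/lprnet/specs

tao model lprnet train --gpus=1 \
  -e $SPECS_DIR/tutorial_spec.txt \
  -k nvidia_tlt \
  -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
  -m $USER_EXPERIMENT_DIR/pretrained_lprnet_baseline18/lprnet_vtrainable_v1.0/us_lprnet_baseline18_trainable.tlt
```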

@Morganh
Hi,
I closed the topic as that particular problem was solved. However, when I migrated to an EC2 instance, the same approach gives me the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/lprnet/scripts/train.py", line 366, in <module>
    main()
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/lprnet/scripts/train.py", line 362, in main
    raise e
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/lprnet/scripts/train.py", line 345, in main
    run_experiment(config_path=args.experiment_spec_file,
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/lprnet/scripts/train.py", line 89, in run_experiment
    status_logging.StatusLogger(
  File "/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/common/logging/logging.py", line 203, in __init__
    self.l_file = open(self.log_path, "a" if append else "w")
PermissionError: [Errno 13] Permission denied: '/workspace/tao-experiments/lprnet/experiment_dir_unpruned/status.json'
Execution status: FAIL
2024-06-05 05:17:01,558 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

Any help is highly appreciated.

I run using:

tao model lprnet train --gpus=1 --gpu_index=0 -e /workspace/tao-experiments/lprnet/specs/tutorial_spec.txt -k nvidia_tlt -r /workspace/tao-experiments/lprnet/experiment_dir_unpruned -m /workspace/tao-experiments/lprnet/pretrained_lprnet_baseline18/lprnet_vtrainable_v1.0/us_lprnet_baseline18_trainable.tlt

Please check if it works after removing the above.
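Alternatively, if you want to keep the "user": "1000:1000" entry in DockerOptions, making sure that UID/GID can write to the mounted host directory should also clear the PermissionError. A hedged sketch, assuming the host "source" path from the tao_mounts.json shown earlier:

```shell
# Host path mounted as /workspace/tao-experiments (assumed; adjust to your setup).
EXP_DIR="$HOME/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet"

# Pre-create the results directory and hand the tree to UID/GID 1000,
# the user the container runs as per DockerOptions.
mkdir -p "$EXP_DIR/experiment_dir_unpruned"
sudo chown -R 1000:1000 "$EXP_DIR"
```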

Actually, after installing the nvidia-container-toolkit via the steps below:

# first remove the old ones
sudo apt remove --purge nvidia-container-toolkit
sudo apt update
sudo apt autoremove

# check version availability
apt list -a "*nvidia-container-toolkit*"
# install 1.14.0-1
sudo apt install nvidia-container-toolkit=1.14.0-1 nvidia-container-toolkit-base=1.14.0-1

when I check using:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

it gives me the following error:

docker: Error response from daemon: unknown or invalid runtime name: nvidia.
See 'docker run --help'.

Please install nvidia-docker.

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2
$ sudo pkill -SIGHUP dockerd
$ sudo systemctl restart docker.service
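On recent toolkit versions, the “unknown or invalid runtime name: nvidia” error usually means the runtime was installed but never registered in /etc/docker/daemon.json. The nvidia-ctk utility (shipped with nvidia-container-toolkit-base) can register it; a sketch of that documented step, in case nvidia-docker2 alone does not do it on your instance:

```shell
# Write the "nvidia" runtime entry into /etc/docker/daemon.json...
sudo nvidia-ctk runtime configure --runtime=docker
# ...and restart Docker so the daemon picks it up.
sudo systemctl restart docker

# The runtime list should now include "nvidia".
docker info --format '{{json .Runtimes}}'
```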

I did follow the steps. However, the error still persists.

ubuntu@ip-172-31-7-134:~/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
OK
ubuntu@ip-172-31-7-134:~/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
ubuntu@ip-172-31-7-134:~/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
> sudo tee /etc/apt/sources.list.d/nvidia-docker.list
deb https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/libnvidia-container/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/$(ARCH) /
#deb https://nvidia.github.io/nvidia-container-runtime/experimental/ubuntu18.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-docker/ubuntu18.04/$(ARCH) /
ubuntu@ip-172-31-7-134:~/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet$ sudo apt-get update
Get:1 file:/var/nv-tensorrt-repo-ubuntu2004-cuda11.4-trt8.2.5.1-ga-20220505  InRelease
Ign:1 file:/var/nv-tensorrt-repo-ubuntu2004-cuda11.4-trt8.2.5.1-ga-20220505  InRelease
Get:2 file:/var/nv-tensorrt-repo-ubuntu2004-cuda11.4-trt8.2.5.1-ga-20220505  Release [569 B]
Hit:3 http://ap-south-1.ec2.archive.ubuntu.com/ubuntu focal InRelease
Get:2 file:/var/nv-tensorrt-repo-ubuntu2004-cuda11.4-trt8.2.5.1-ga-20220505  Release [569 B]                                                                         
Get:4 http://ap-south-1.ec2.archive.ubuntu.com/ubuntu focal-updates InRelease [128 kB]                                                                               
Hit:5 http://ap-south-1.ec2.archive.ubuntu.com/ubuntu focal-backports InRelease                                                                                      
Hit:6 https://nvidia.github.io/libnvidia-container/stable/deb/amd64  InRelease                                                                                       
Hit:7 https://nvidia.github.io/libnvidia-container/experimental/deb/amd64  InRelease                                                                                 
Get:8 https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64  InRelease [1484 B]                                                                      
Hit:9 https://nvidia.github.io/nvidia-container-runtime/stable/ubuntu18.04/amd64  InRelease                                                                          
Hit:10 https://nvidia.github.io/nvidia-docker/ubuntu18.04/amd64  InRelease                                                                                           
Hit:11 https://download.docker.com/linux/ubuntu focal InRelease                                                                  
Hit:12 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease                                     
Get:14 http://ap-south-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1190 kB]
Get:15 http://ap-south-1.ec2.archive.ubuntu.com/ubuntu focal-updates/universe Translation-en [286 kB]
Hit:16 http://security.ubuntu.com/ubuntu focal-security InRelease                 
Fetched 1606 kB in 1s (1975 kB/s)               
Reading package lists... Done
ubuntu@ip-172-31-7-134:~/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet$ sudo apt-get install -y nvidia-docker2
Reading package lists... Done
Building dependency tree       
Reading state information... Done
nvidia-docker2 is already the newest version (2.14.0-1).
0 upgraded, 0 newly installed, 0 to remove and 52 not upgraded.
ubuntu@ip-172-31-7-134:~/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet$ sudo pkill -SIGHUP dockerd
ubuntu@ip-172-31-7-134:~/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet$ sudo systemctl restart docker.service
ubuntu@ip-172-31-7-134:~/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
docker: Error response from daemon: unknown or invalid runtime name: nvidia.
See 'docker run --help'.
ubuntu@ip-172-31-7-134:~/getting_started_v5.3.0/notebooks/tao_launcher_starter_kit/lprnet$ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
docker: Error response from daemon: unknown or invalid runtime name: nvidia.
See 'docker run --help'.

I shared my previous steps in Run TAO training probelm - #30 by Morganh. You can refer to it to narrow down.

This works!!! However, when I run:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

The error is still there. How am I able to train, then?

I shared my previous steps in Run TAO training probelm - #30 by Morganh. You can refer to it to narrow down.

OK, I will surely check. Thanks!

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.