LPRNet training error on the OpenALPR dataset

Hi,

Toolkit: 3.0
Driver: 460
GPU: RTX 2070

I’m trying to train LPRNet on the default OpenALPR dataset, but “tlt lprnet train” fails with the following error:

For multi-GPU, change --gpus based on your machine.
2021-06-09 21:56:40,102 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

2021-06-09 16:27:38,405 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:117: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2021-06-09 16:27:38,406 [WARNING] tensorflow: From /usr/local/lib/python3.6/dist-packages/horovod/tensorflow/__init__.py:143: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:56: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

2021-06-09 16:27:39,011 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:56: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:59: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-06-09 16:27:39,013 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:59: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

2021-06-09 16:28:03,912 [WARNING] tensorflow: From /home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py:60: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

2021-06-09 16:28:03,913 [INFO] /usr/local/lib/python3.6/dist-packages/iva/lprnet/utils/spec_loader.pyc: Merging specification from /workspace/tlt-experiments/lprnet/specs/tutorial_spec.txt
2021-06-09 16:28:03,925 [INFO] __main__: Loading pretrained weights. This may take a while...
Traceback (most recent call last):
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 274, in <module>
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 270, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 105, in run_experiment
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/utils/model_io.py", line 78, in load_model_as_pretrain
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/utils/model_io.py", line 41, in load_model
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/utils/model_io.py", line 29, in load_model
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/saving/save.py", line 146, in load_model
    loader_impl.parse_saved_model(filepath)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/saved_model/loader_impl.py", line 83, in parse_saved_model
    constants.SAVED_MODEL_FILENAME_PB))
OSError: SavedModel file does not exist at: /tmp/tmpzxyyngdo.hdf5/{saved_model.pbtxt|saved_model.pb}
Traceback (most recent call last):
  File "/usr/local/bin/lprnet", line 8, in <module>
    sys.exit(main())
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/entrypoint/lprnet.py", line 12, in main
  File "/home/vpraveen/.cache/dazel/_dazel_vpraveen/216c8b41e526c3295d3b802489ac2034/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/common/entrypoint/entrypoint.py", line 296, in launch_job
AssertionError: Process run failed.
2021-06-09 21:58:28,931 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Refer to the thread “LPRnet example fails to run - TLT”.

Yes, I saw that thread and ran the command from it. Based on the output below, the folder does contain the pretrained model inside the docker:

2021-06-09 22:18:27,665 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
/workspace/tlt-experiments/lprnet/pretrained_lprnet_baseline18/tlt_lprnet_vtrainable_v1.0/us_lprnet_baseline18_trainable.tlt
2021-06-09 22:18:32,324 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

To narrow this down, could you try logging into the 3.0-dp docker and running training there?
$ docker run --runtime=nvidia -it nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3 /bin/bash
Then, run training via
# lprnet train xxx xxx

Hey, I’m having the exact same issue and can’t figure out what the problem is.
I also tried to train inside the docker as @Morganh suggested, and got the same error.

@m4x.mona Please check again according to LPRnet example fails to run - TLT - #3 by Morganh

This is what I get when running:

!tlt lprnet run ls $USER_EXPERIMENT_DIR/pretrained_lprnet_baseline18/tlt_lprnet_vtrainable_v1.0/us_lprnet_baseline18_trainable.tlt
2021-06-13 16:31:45,280 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the ~/.tlt_mounts.json file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
/workspace/tlt-experiments/lprnet/pretrained_lprnet_baseline18/tlt_lprnet_vtrainable_v1.0/us_lprnet_baseline18_trainable.tlt
2021-06-13 16:31:46,078 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

@leroren220
There is no error there, and that output is actually expected.
The log shows that /workspace/tlt-experiments/lprnet/pretrained_lprnet_baseline18/tlt_lprnet_vtrainable_v1.0/us_lprnet_baseline18_trainable.tlt is available.

Hey @Morganh, all the files exist in the docker right where they should be. I get the same output as @leroren220 and @priyanshthakore when running !tlt lprnet run ls …
I still get the OSError: SavedModel file does not exist at: …

@m4x.mona
OK, I will check further. Could you share the detailed steps to reproduce?
BTW,

  • which docker?
  • did you run it from a Jupyter notebook?

Hey

The steps to reproduce are:

  1. Following steps 1-4 here:
    Integrating TAO Models into DeepStream — TAO Toolkit 3.22.05 documentation

  2. Following the instructions in the lprnet/lprnet.ipynb notebook

The docker is nvcr.io/nvidia/tlt-streamanalytics:v3.0-dp-py3.
I also reproduced the same error inside the docker without using the tlt launcher.

@m4x.mona
https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/quickstart/deepstream_integration.html
is not available, right?

Hmm, yeah. It was available some time ago.

OK, got it. Yes, the user guide has been updated for the latest release.

On my side, I trained both inside and outside the docker, but I cannot reproduce the issue; the training works well:

$ tlt lprnet train -e /workspace/demo_3.0/lprnet/specs/tutorial_spec.txt -r /workspace/demo_3.0/lprnet/experiment_dir_unpruned -k nvidia_tlt -m /workspace/demo_3.0/lprnet/pretrained_lprnet_baseline18/us_lprnet_baseline18_unpruned.tlt

Could you share the full log with me? If you were running from a Jupyter notebook, you can attach the .ipynb file here.

I found the root cause of your issue. @m4x.mona @leroren220 @priyanshthakore

See the LPRNet model card: https://ngc.nvidia.com/catalog/models/nvidia:tlt_lprnet
The model load key is: nvidia_tlt
So, if you are using the pretrained model from NGC, please set “-k” to nvidia_tlt instead of your own key.
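For anyone wondering why a wrong key surfaces as a SavedModel error: the launcher decrypts the .tlt archive into a temporary .hdf5 file (the tmpzxyyngdo.hdf5 in the traceback) and hands it to Keras. With the wrong key the decrypted bytes are not a valid HDF5 file, so Keras gives up on the HDF5 path and falls back to parsing it as a SavedModel directory, which fails. A minimal sketch of that signature check (the decryption step is simulated with garbage bytes here, this is not the real TLT code):

```python
import tempfile

# Every valid HDF5 file starts with this 8-byte magic signature.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"

def looks_like_hdf5(path):
    """Return True if the file begins with the HDF5 magic bytes."""
    with open(path, "rb") as f:
        return f.read(8) == HDF5_MAGIC

# Simulate what a wrong "-k" key produces: the decrypted temp file is
# garbage rather than a real HDF5 archive, so Keras' load_model() skips
# the HDF5 loader and tries (and fails) to parse it as a SavedModel.
with tempfile.NamedTemporaryFile(suffix=".hdf5", delete=False) as tmp:
    tmp.write(b"not a valid hdf5 payload")
    bad_path = tmp.name

print(looks_like_hdf5(bad_path))  # False
```

So the misleading "SavedModel file does not exist" message is really a symptom of a failed decryption, not a missing file.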

Yes, just changing the key to “nvidia_tlt” instead of our own key worked. Note that “nvidia_tlt” needs to be used throughout the notebook.
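For future readers, a sketch of what the corrected invocation looks like. The paths are the ones pasted earlier in this thread (the -r results directory and the KEY variable name are illustrative assumptions, not taken from the notebook):

```shell
# NGC model load key, as listed on the LPRNet model card.
KEY=nvidia_tlt

# Corrected training command. Commented out here because it needs the
# TLT environment to actually run; only the key assignment executes.
# tlt lprnet train \
#   -e /workspace/tlt-experiments/lprnet/specs/tutorial_spec.txt \
#   -r /workspace/tlt-experiments/lprnet/experiment_dir_unpruned \
#   -k "$KEY" \
#   -m /workspace/tlt-experiments/lprnet/pretrained_lprnet_baseline18/tlt_lprnet_vtrainable_v1.0/us_lprnet_baseline18_trainable.tlt
echo "$KEY"
```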

It works, thank you.

Please try the latest 3.0-py3 docker. It will catch this kind of error and prompt the end user to check the NGC key.