Training with TAO Toolkit results in 0.0000% accuracy

Please provide the following information when requesting support.

• Hardware (NVIDIA GeForce GTX 1650)
• Network Type (Detectnet_v2)
• Training spec file (lpd_train_resnet18_kitti.txt (3.4 KB))
Dataset used: License Plate Recognition Object Detection Dataset and Pre-Trained Model by Roboflow Universe Projects

• How to reproduce the issue ?

I wanted to train my own Licence Plate Detection System in NVIDIA TAO.

  1. I downloaded the dataset from the link above and followed the steps from the sample notebook for detectnet_v2

  2. I managed to create the training data sample (I had to clean it up because some images had no labels) and was then able to create the TFRecords

  3. I installed ngc cli and could download the pretrained model

  4. I created my own training specification file (and already modified quite a few values)
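
The clean-up in step 2 could be done along the lines of the sketch below. It is only an illustration, assuming a KITTI-style layout with one `.txt` label file per image; the function name `find_unlabeled_images` and the directory paths are hypothetical and not part of TAO.

```python
from pathlib import Path


def find_unlabeled_images(image_dir, label_dir, image_exts=(".jpg", ".jpeg", ".png")):
    """Return images that have no matching, non-empty KITTI .txt label file.

    Assumes the label for images/foo.jpg lives at labels/foo.txt.
    """
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    unlabeled = []
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in image_exts:
            continue  # skip anything that is not an image
        label_path = label_dir / (image_path.stem + ".txt")
        # A missing file and an empty file both mean "no annotations".
        if not label_path.is_file() or not label_path.read_text().strip():
            unlabeled.append(image_path)
    return unlabeled
```

Calling `find_unlabeled_images("data/images", "data/labels")` (placeholder paths) returns the files to move aside or delete before generating the TFRecords.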

→ However, I always get 0.0000% accuracy after training…

Validation cost: 0.000010
Mean average_precision (in %): 0.0000

+------------+--------------------------+
| class name | average precision (in %) |
+------------+--------------------------+
|    lpd     |           0.0            |
+------------+--------------------------+

Median Inference Time: 0.025054
2024-01-29 18:32:20,893 [TAO Toolkit] [INFO] root 2102: Evaluation metrics generated.
2024-01-29 18:32:20,893 [TAO Toolkit] [INFO] root 2102: Training loop completed.
2024-01-29 18:32:20,894 [TAO Toolkit] [INFO] root 2102: Saving trained model.
2024-01-29 18:32:21,056 [TAO Toolkit] [INFO] root 2102: Model saved.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:95: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.

But the training loss goes down from epoch to epoch:


INFO:tensorflow:epoch = 0.00043122035360068997, learning_rate = 5.1002854e-07, loss = 0.08813875, step = 2 (329.480 sec)
2024-01-29 18:08:51,000 [TAO Toolkit] [INFO] tensorflow 260: epoch = 0.00043122035360068997, learning_rate = 5.1002854e-07, loss = 0.08813875, step = 2 (329.480 sec)

...

INFO:tensorflow:epoch = 0.9911599827511859, learning_rate = 5.7266834e-07, loss = 0.00061230396, step = 4597 (5.294 sec)
2024-01-29 18:30:12,241 [TAO Toolkit] [INFO] tensorflow 260: epoch = 0.9911599827511859, learning_rate = 5.7266834e-07, loss = 0.00061230396, step = 4597 (5.294 sec)


There is probably an issue with configuration file but I cannot really spot it…

There was already a similar post (Mean average precision of 0.00 for detectnet_v2 using Tao Toolkit), and I tried to follow the hints there, but they did not help.

Best regards

Please refer to the LPD training spec file deepstream_tao_apps/misc/dev_blog/LPDR/lpd/SPECS_train.txt on the release/tlt3.0 branch of NVIDIA-AI-IOT/deepstream_tao_apps on GitHub. It loads the LPD pretrained model; see line 74 of that file.
And please

Update above comment.

Hello, many thanks for the quick response.

New config file:
lpd_train_resnet18_kitti_v3.txt (3.2 KB)

I downloaded the model from here and put it in the respective folder: LPDNet | NVIDIA NGC

Unfortunately, this did not help. The training does not even start:

2024-02-01 13:28:08,504 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2024-02-01 13:28:08,603 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5
2024-02-01 13:28:08,719 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
2024-02-01 12:28:09.472386: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2024-02-01 12:28:09,519 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
2024-02-01 12:28:10,857 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2024-02-01 12:28:10,890 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2024-02-01 12:28:10,894 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2024-02-01 12:28:12,230 [TAO Toolkit] [WARNING] matplotlib 500: Matplotlib created a temporary config/cache directory at /tmp/matplotlib-4bd30_gr because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
2024-02-01 12:28:12,456 [TAO Toolkit] [INFO] matplotlib.font_manager 1633: generated new fontManager
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
Using TensorFlow backend.
WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
2024-02-01 12:28:14,396 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
2024-02-01 12:28:14,426 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable  TF_ALLOW_IOLIBS=1.
WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2024-02-01 12:28:14,430 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable  TF_ALLOW_IOLIBS=1.
2024-02-01 12:28:15,756 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.logging.logging 197: Log file already exists at /workspace/tao-experiments/experiment/experiment_dir_unpruned/status.json
2024-02-01 12:28:15,756 [TAO Toolkit] [INFO] root 2102: Starting DetectNet_v2 Training job
2024-02-01 12:28:15,756 [TAO Toolkit] [INFO] __main__ 817: Loading experiment spec at /workspace/tao-experiments/experiment/specs/lpd_train_resnet18_kitti_v3.txt.
2024-02-01 12:28:15,756 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.detectnet_v2.spec_handler.spec_loader 113: Merging specification from /workspace/tao-experiments/experiment/specs/lpd_train_resnet18_kitti_v3.txt
2024-02-01 12:28:15,760 [TAO Toolkit] [INFO] root 2102: Training gridbox model.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2024-02-01 12:28:15,760 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

2024-02-01 12:28:16,977 [TAO Toolkit] [INFO] root 522: Sampling mode of the dataloader was set to user_defined.
2024-02-01 12:28:16,978 [TAO Toolkit] [INFO] __main__ 99: Cannot iterate over exactly 18551 samples with a batch size of 4; each epoch will therefore take one extra step.
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:122: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

2024-02-01 12:28:16,979 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:122: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:125: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

2024-02-01 12:28:16,979 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:125: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:128: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2024-02-01 12:28:16,982 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/detectnet_v2/cost_function/cost_auto_weight_hook.py:128: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.

2024-02-01 12:28:16,999 [TAO Toolkit] [INFO] root 2102: Building DetectNet V2 model
WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

2024-02-01 12:28:16,999 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

2024-02-01 12:28:17,001 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

WARNING:tensorflow:From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2024-02-01 12:28:17,017 [TAO Toolkit] [WARNING] tensorflow 137: From /usr/local/lib/python3.8/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.

2024-02-01 12:28:17,671 [TAO Toolkit] [INFO] __main__ 1032: Training was interrupted.
2024-02-01 12:28:17,672 [TAO Toolkit] [INFO] root 2102: Training was interrupted
Time taken to run __main__:main: 0:00:02.234440.
Execution status: PASS
2024-02-01 13:28:24,157 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.

Please add -k nvidia_tlt in the command line since the .tlt model is encrypted with key “nvidia_tlt”. See LPDNet | NVIDIA NGC.

Unfortunately, it did not work…

This is the config: lpd_train_resnet18_kitti_v3.txt (3.2 KB)

This is how it was executed:

!tao model detectnet_v2 train -e $SPECS_DIR/lpd_train_resnet18_kitti_v3.txt \
                        -r $USER_EXPERIMENT_DIR/experiment_dir_unpruned \
                        -n resnet18_detector \
                        -k nvidia_tlt \
                        --gpus $NUM_GPUS

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

From your spec file,

    key: "License_Plate"
    value: "lpd"

Could you share one of your label files?
Is the class name “License_Plate”?

If your labels use the class name “License_Plate”, you need to set the following in the config file:

    key: "License_Plate"
    value: "License_Plate"

That means, the value should be the same as the actual class name.
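
One way to check which class names the label files actually contain is a short script like the one below. This is a sketch, not a TAO utility: it assumes KITTI-format labels, where the first whitespace-separated field on each line is the class name, and the function name `count_kitti_classes` is made up for illustration.

```python
from collections import Counter
from pathlib import Path


def count_kitti_classes(label_dir):
    """Count how often each class name appears across KITTI label files.

    Assumes one object per line with the class name as the first field.
    """
    counts = Counter()
    for label_path in Path(label_dir).glob("*.txt"):
        for line in label_path.read_text().splitlines():
            fields = line.split()
            if fields:  # ignore blank lines
                counts[fields[0]] += 1
    return counts
```

Printing `count_kitti_classes("data/labels")` (placeholder path) shows the exact spelling and capitalization of every class in the dataset, which can then be compared against the names used in the spec file.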

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.