Errors encountered when using TAO to train LPRnet

Diluk · November 1, 2021, 9:52am

Please provide the following information when requesting support.

• Hardware (T4/V100/Xavier/Nano/etc)
V100
• Network Type (Detectnet_v2/Faster_rcnn/Yolo_v4/LPRnet/Mask_rcnn/Classification/etc)
LPRnet
• TLT Version (Please run “tlt info --verbose” and share “docker_tag” here)

• Training spec file(If have, please share here)

################################################################################
# The MIT License (MIT)
#
# Copyright (c) 2019-2021 NVIDIA CORPORATION
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
################################################################################

random_seed: 42
lpr_config {
  hidden_units: 512
  max_label_length: 8
  arch: "baseline"
  nlayers: 18 #setting nlayers to be 10 to use baseline10 model
}
training_config {
  batch_size_per_gpu: 32
  num_epochs: 600
  learning_rate {
  soft_start_annealing_schedule {
    min_learning_rate: 1e-6
    max_learning_rate: 1e-5
    soft_start: 0.001
    annealing: 0.5
  }
  }
  regularizer {
    type: L2
    weight: 5e-4
  }
}
eval_config {
  validation_period_during_training: 5
  batch_size: 1
}
augmentation_config {
    output_width: 96
    output_height: 48
    output_channel: 3
    keep_original_prob: 0.3
}
dataset_config {
  data_sources: {
    label_directory_path: "/workspace/openalpr/data/train/label"
    image_directory_path: "/workspace/openalpr/data/train/image"
  }
  characters_list_file: "/workspace/openalpr/model/ch_lp_characters.txt"
  validation_data_sources: {
    label_directory_path: "/workspace/openalpr/data/val/label"
    image_directory_path: "/workspace/openalpr/data/val/image"
  }
  }
#    transform_prob: 0.5
#    rotate_degree: 5

• How to reproduce the issue ? (This is for errors. Please share the command line and the detailed log here.)

2021-11-01 09:20:28,676 [INFO] iva.lprnet.utils.spec_loader: Merging specification from /workspace/openalpr/tutorial_spec.txt
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 277, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 273, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 64, in run_experiment
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/utils/spec_loader.py", line 126, in load_experiment_spec
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/utils/spec_loader.py", line 106, in load_proto
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/utils/spec_loader.py", line 92, in _load_from_file
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 735, in Merge
    allow_unknown_field=allow_unknown_field)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 803, in MergeLines
    return parser.MergeLines(lines, message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 828, in MergeLines
    self._ParseOrMerge(lines, message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 850, in _ParseOrMerge
    self._MergeField(tokenizer, message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 980, in _MergeField
    merger(tokenizer, message, field)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 1055, in _MergeMessageField
    self._MergeField(tokenizer, sub_message)
  File "/usr/local/lib/python3.6/dist-packages/google/protobuf/text_format.py", line 947, in _MergeField
    (message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 57:5 : Message type "AugmentationConfig" has no field named "transform_prob".
2021-11-01 17:20:31,245 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.
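
The ParseError above means the spec file contained a field name inside augmentation_config that the spec parser does not recognise. A minimal sketch of a pre-flight check is shown below. It assumes the set of valid fields is the one that appears in the augmentation_config block of the spec posted later in this thread (not the official schema), and the helper name `unknown_augmentation_fields` is hypothetical.

```python
import re

# Field names copied from the augmentation_config block of the working spec
# posted later in this thread. This is an assumption, not the official schema.
KNOWN_AUGMENTATION_FIELDS = {
    "output_width", "output_height", "output_channel",
    "max_rotate_degree", "rotate_prob", "gaussian_kernel_size",
    "blur_prob", "reverse_color_prob", "keep_original_prob",
}


def unknown_augmentation_fields(spec_text):
    """Return (line_number, field_name) pairs inside augmentation_config
    whose field name is not in KNOWN_AUGMENTATION_FIELDS."""
    unknown = []
    inside = False
    for lineno, raw in enumerate(spec_text.splitlines(), start=1):
        # Drop '#' comments. Simplistic: assumes no '#' inside quoted strings.
        line = raw.split("#", 1)[0].strip()
        if line.startswith("augmentation_config"):
            inside = True
            continue
        if inside and line.startswith("}"):
            inside = False
            continue
        match = re.match(r"([A-Za-z_]\w*)\s*:", line)
        if inside and match and match.group(1) not in KNOWN_AUGMENTATION_FIELDS:
            unknown.append((lineno, match.group(1)))
    return unknown
```

Running this on the text of the spec file before launching training would list the offending field together with its line number, similar to the `57:5` position reported by the parser.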

Morganh · November 1, 2021, 10:12am

Could you refer to the LPRNet page of the TAO Toolkit 3.22.05 documentation?

Morganh · November 1, 2021, 10:14am

By the way, which docker image did you use? Is it the latest TAO 3.21.08?

Diluk · November 2, 2021, 2:01am

Hi Morganh,
When I used TAO to train “ch_lprnet_baseline18_trainable.tlt”, the following new error occurred:

2021-11-02 01:43:09,820 [INFO] __main__: Number of images in the training dataset:	   730
2021-11-02 01:43:09,820 [INFO] __main__: Number of images in the validation dataset:	   100
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 277, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 273, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 198, in run_experiment
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 221, in model_iteration
    batch_data = _get_next_batch(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 363, in _get_next_batch
    generator_output = next(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 789, in get
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 783, in get
    inputs = self.queue.get(block=True).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 571, in get_index
    return _SHARED_SEQUENCES[uid][i]
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/dataloader/data_sequence.py", line 117, in __getitem__
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 0: invalid continuation byte
Epoch 1/300
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 277, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 273, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 198, in run_experiment
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 221, in model_iteration
    batch_data = _get_next_batch(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 363, in _get_next_batch
    generator_output = next(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 789, in get
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 783, in get
    inputs = self.queue.get(block=True).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 571, in get_index
    return _SHARED_SEQUENCES[uid][i]
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/dataloader/data_sequence.py", line 117, in __getitem__
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 0: invalid continuation byte
Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 277, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 273, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 198, in run_experiment
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 221, in model_iteration
    batch_data = _get_next_batch(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 363, in _get_next_batch
    generator_output = next(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 789, in get
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 783, in get
    inputs = self.queue.get(block=True).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 571, in get_index
    return _SHARED_SEQUENCES[uid][i]
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/dataloader/data_sequence.py", line 117, in __getitem__
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 0: invalid continuation byte
Initialize optimizer
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[23232,1],3]
  Exit code:    1
--------------------------------------------------------------------------
2021-11-02 09:43:14,317 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

This may be caused by the Chinese characters in the label. Any ideas?
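
A `UnicodeDecodeError` at position 0 indicates that a file read by the data loader is not valid UTF-8 from its very first byte. This would be expected if, for example, a label were saved in a legacy Chinese encoding such as GBK (an assumption; the thread does not state the actual encoding). The sketch below lists the label files that fail to decode as UTF-8. The function name `find_non_utf8_files` and the `*.txt` pattern are illustrative assumptions.

```python
from pathlib import Path


def find_non_utf8_files(label_dir):
    """Return (path, error_message) for each .txt file under label_dir
    whose bytes are not valid UTF-8."""
    bad = []
    for path in sorted(Path(label_dir).glob("*.txt")):
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as exc:
            bad.append((path, str(exc)))
    return bad
```

Pointing it at the host directory that is mounted as the label directory in the spec should show which labels need to be re-encoded.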

Morganh · November 2, 2021, 2:16am

How did you generate the character file?

Diluk · November 2, 2021, 2:28am

I generated the character file by typing it in manually, following the format used in "tlt-experiments/lprnet/preprocess_openalpr_benchmark.py".

Morganh · November 2, 2021, 2:58am

Which docker image did you run? Please share the output of
tlt info --verbose
or
tao info --verbose

Also, please share your training spec and character file.

Diluk · November 2, 2021, 3:08am

root@IDC_GPU_Server-1:~# tao info --verbose
Configuration of the TAO Toolkit Instance

dockers:
        nvidia/tao/tao-toolkit-tf:
                docker_registry: nvcr.io
                docker_tag: v3.21.08-py3
                tasks:
                        1. augment
                        2. bpnet
                        3. classification
                        4. detectnet_v2
                        5. dssd
                        6. emotionnet
                        7. faster_rcnn
                        8. fpenet
                        9. gazenet
                        10. gesturenet
                        11. heartratenet
                        12. lprnet
                        13. mask_rcnn
                        14. multitask_classification
                        15. retinanet
                        16. ssd
                        17. unet
                        18. yolo_v3
                        19. yolo_v4
                        20. converter
        nvidia/tao/tao-toolkit-pyt:
                docker_registry: nvcr.io
                docker_tag: v3.21.08-py3
                tasks:
                        1. speech_to_text
                        2. speech_to_text_citrinet
                        3. text_classification
                        4. question_answering
                        5. token_classification
                        6. intent_slot_classification
                        7. punctuation_and_capitalization
        nvidia/tao/tao-toolkit-lm:
                docker_registry: nvcr.io
                docker_tag: v3.21.08-py3
                tasks:
                        1. n_gram
format_version: 1.0
toolkit_version: 3.21.08
published_date: 08/17/2021

tutorial_spec.txt

random_seed: 42
lpr_config {
  hidden_units: 512
  max_label_length: 8
  arch: "baseline"
  nlayers: 18 #setting nlayers to be 10 to use baseline10 model
}
training_config {
  batch_size_per_gpu: 32
  num_epochs: 300
  learning_rate {
  soft_start_annealing_schedule {
    min_learning_rate: 1e-6
    max_learning_rate: 1e-5
    soft_start: 0.001
    annealing: 0.5
  }
  }
  regularizer {
    type: L2
    weight: 5e-4
  }
}
eval_config {
  validation_period_during_training: 5
  batch_size: 1
}
augmentation_config {
    output_width: 96
    output_height: 48
    output_channel: 3
    max_rotate_degree: 5
    rotate_prob: 0.5
    gaussian_kernel_size: 5
    gaussian_kernel_size: 7
    gaussian_kernel_size: 15
    blur_prob: 0.5
    reverse_color_prob: 0.5
    keep_original_prob: 0.3
}
dataset_config {
  data_sources: {
    label_directory_path: "/workspace/tao-experiments/data/openalpr/train/label"
    image_directory_path: "/workspace/tao-experiments/data/openalpr/train/image"
  }
  characters_list_file: "/workspace/tao-experiments/lprnet/specs/ch_lp_characters.txt"
  validation_data_sources: {
    label_directory_path: "/workspace/tao-experiments/data/openalpr/val/label"
    image_directory_path: "/workspace/tao-experiments/data/openalpr/val/image"
  }
}

ch_lp_characters.txt

皖
沪
津
渝
冀
晋
蒙
辽
吉
黑
苏
浙
京
闽
赣
鲁
豫
鄂
湘
粤
桂
琼
川
贵
云
藏
陕
甘
青
宁
新
警
学
A
B
C
D
E
F
G
H
J
K
L
M
N
P
Q
R
S
T
U
V
W
X
Y
Z
0
1
2
3
4
5
6
7
8
9

image:

label:
粤BDW9960

Morganh · November 2, 2021, 3:18am

Please open your character file in vim, run the command below to remove the byte-order mark (BOM), save the file, and then retry.

:set nobomb
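
In vim, `:set nobomb` followed by a save writes the file without a byte-order mark. For anyone who prefers to script this, the following is a rough Python equivalent, assuming the character file is UTF-8. The function name `strip_utf8_bom` is hypothetical.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from the file in place.
    Returns True if a BOM was found and removed, False otherwise."""
    path = Path(path)
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        path.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False
```

Calling it a second time on the same file returns False, which is a quick way to confirm that the BOM is gone.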

Diluk · November 2, 2021, 4:27am

I got the following new error:

Traceback (most recent call last):
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 277, in <module>
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 273, in main
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/scripts/train.py", line 198, in run_experiment
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 603, in fit
    steps_name='steps_per_epoch')
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 221, in model_iteration
    batch_data = _get_next_batch(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_generator.py", line 363, in _get_next_batch
    generator_output = next(generator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 789, in get
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 696, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 783, in get
    inputs = self.queue.get(block=True).get()
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/utils/data_utils.py", line 571, in get_index
    return _SHARED_SEQUENCES[uid][i]
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/dataloader/data_sequence.py", line 118, in __getitem__
  File "/root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/lprnet/dataloader/data_sequence.py", line 118, in <listcomp>
KeyError: 'I'
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun.real detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[50162,1],3]
  Exit code:    1
--------------------------------------------------------------------------
2021-11-02 12:12:02,541 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Morganh · November 2, 2021, 6:23am

Can you upload your latest character file here?

Diluk · November 2, 2021, 6:30am

dict.txt (200 Bytes)

Morganh · November 2, 2021, 6:37am

Can you also run the command below and share the result?
$ tao lprnet run cat /workspace/tao-experiments/lprnet/specs/ch_lp_characters.txt

Diluk · November 2, 2021, 6:43am

root@IDC_GPU_Server-1:/home/sutpc/xiukd/zjq/download/tlt-experiments/LPDR/lpr# tao lprnet run cat /workspace/tao-experiments/lprnet/specs/ch_lp_characters.txt
2021-11-02 14:43:00,814 [INFO] root: Registry: ['nvcr.io']
2021-11-02 14:43:01,629 [WARNING] tlt.components.docker_handler.docker_handler: 
Docker will run the commands as root. If you would like to retain your
local host permissions, please add the "user":"UID:GID" in the
DockerOptions portion of the "/root/.tao_mounts.json" file. You can obtain your
users UID and GID by using the "id -u" and "id -g" commands on the
terminal.
皖
沪
津
渝
冀
晋
蒙
辽
吉
黑
苏
浙
京
闽
赣
鲁
豫
鄂
湘
粤
桂
琼
川
贵
云
藏
陕
甘
青
宁
新
警
学
A
B
C
D
E
F
G
H
J
K
L
M
N
P
Q
R
S
T
U
V
W
X
Y
Z
0
1
2
3
4
5
6
7
8
9
2021-11-02 14:43:02,591 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

Morganh · November 2, 2021, 7:07am

Please check your images and labels. I suspect that at least one label contains the character “I”, which is not in your character file. That would cause this error.
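
The character file shown above has no “I” (and no “O”), so any label containing one of them cannot be mapped to a class index. A sketch of a check that reports every label with characters missing from the character file is given below. The function names are hypothetical, and it assumes one character per line in the character file and one label string per `.txt` file.

```python
from pathlib import Path


def load_characters(characters_file):
    """Read the character list, one character per line.
    'utf-8-sig' tolerates an optional leading BOM."""
    text = Path(characters_file).read_text(encoding="utf-8-sig")
    return {line.strip() for line in text.splitlines() if line.strip()}


def find_unknown_label_characters(label_dir, characters_file, encoding="utf-8"):
    """Return {label_file_name: [characters not in the character list]}."""
    allowed = load_characters(characters_file)
    problems = {}
    for path in sorted(Path(label_dir).glob("*.txt")):
        # Raises UnicodeDecodeError if a label is not in the given encoding.
        label = path.read_text(encoding=encoding).strip()
        missing = sorted(set(label) - allowed)
        if missing:
            problems[path.name] = missing
    return problems
```

An empty result means every character in every label is covered by the character file.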

Diluk · November 2, 2021, 11:47am

Hi Morganh,
I finally found the cause of the issue: my label files are not encoded in UTF-8.

However, when I convert the txt files to UTF-8, the Chinese characters are displayed as garbled text.

Any ideas? Thanks.
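
Garbled output after a conversion often means the bytes were decoded with the wrong source encoding before being written back as UTF-8. The sketch below decodes with an explicitly stated source encoding and then rewrites the file as UTF-8. The default of `gbk` is purely an assumption, since the thread does not state which encoding the labels were saved in, and the function name `convert_to_utf8` is hypothetical.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="gbk"):
    """Rewrite the file as UTF-8, interpreting its current bytes as
    source_encoding (an assumption the caller must verify).
    Returns the decoded text so it can be inspected."""
    path = Path(path)
    # Raises UnicodeDecodeError if the bytes are invalid in source_encoding.
    # Some encodings (e.g. latin-1) accept any byte sequence, so always
    # check the returned text by eye as well.
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
    return text
```

Working on a copy of the label directory first is advisable, because running the conversion twice on the same file would corrupt it.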

Morganh · November 3, 2021, 6:52am

Can you upload your DELSP11.txt?

Diluk · November 3, 2021, 7:03am

DELSP11.txt (9 Bytes)

Diluk · November 3, 2021, 7:26am

Hi Morganh,
Thank you for your reply. I have solved this problem. TAO supports the ‘ISO-8859-1’ encoding. The cause of this problem was that my dataset did not contain enough samples to be divided by ‘batch_size_per_gpu’.
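
One way to read the explanation above is that the number of training images should be compared against the batch size before training. The arithmetic can be sketched as follows, using the 730 training images and `batch_size_per_gpu: 32` from this thread. The `num_gpus` parameter and the idea of a global batch are assumptions for illustration; how TAO actually splits data across GPUs is not described here.

```python
def batch_summary(num_images, batch_size_per_gpu, num_gpus=1):
    """Report how a dataset divides into batches.
    num_gpus is an illustrative assumption, not taken from the thread."""
    global_batch = batch_size_per_gpu * num_gpus
    full_steps, leftover = divmod(num_images, global_batch)
    return {
        "global_batch": global_batch,
        "full_steps": full_steps,
        "leftover": leftover,
    }


# 730 training images and batch_size_per_gpu = 32, as reported in the log and spec.
print(batch_summary(730, 32))
```

A non-zero `leftover` means the dataset does not divide evenly, in which case reducing `batch_size_per_gpu` or adjusting the dataset size are the obvious things to try.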

system · November 17, 2021, 7:26am

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
