OK, first I copied experiment-vit.yaml from the tao-experiments ocrnet-vit notebook spec to ~/tao_deploy/nvidia_tao_deploy/cv/ocrnet/specs and adjusted the paths, since the spec file is usually meant to be used inside a container. Here is the current content of ~/tao_deploy/nvidia_tao_deploy/cv/ocrnet/specs/experiment-vit.yaml:
```yaml
results_dir: /home/ubuntu/tao-experiments/ocrnet/experiment_dir_retrain
encryption_key: nvidia_tao
model:
  TPS: True
  backbone: FAN_tiny_2X
  sequence: BiLSTM
  hidden_size: 256
  prediction: Attn
  quantize: False
  input_width: 200
  input_height: 64
  input_channel: 1
dataset:
  train_dataset_dir: []
  val_dataset_dir: /data/test/lmdb
  character_list_file: "/home/ubuntu/tao-experiments/data/ocrnet/character_list"
  max_label_length: 25
  batch_size: 32
  workers: 4
  augmentation:
    keep_aspect_ratio: False
train:
  seed: 1111
  gpu_ids: [0]
  optim:
    name: "adadelta"
    lr: 0.1
  clip_grad_norm: 5.0
  num_epochs: 10
  checkpoint_interval: 2
  validation_interval: 1
evaluate:
  gpu_id: 0
  checkpoint: "??"
  test_dataset_dir: "??"
  results_dir: "${results_dir}/evaluate"
prune:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/prune"
  prune_setting:
    mode: "amount"
    amount: 0.1
    granularity: 8
    raw_prune_score: L1
inference:
  gpu_id: 0
  checkpoint: "/home/ubuntu/tao-experiments/ocrnet/experiment_dir_retrain/best_accuracy.pth"
  inference_dataset_dir: "/home/ubuntu/tao-experiments/data/ocrnet/test_samples"
  results_dir: "/home/ubuntu/tao-experiments/ocrnet/experiment_dir_retrain/inference"
  trt_engine: "/home/ubuntu/tao-experiments/ocrnet/export/trt.engine"
  batch_size: 1
  input_width: 200
  input_height: 64
export:
  gpu_id: 0
  checkpoint: "??"
  results_dir: "${results_dir}/export"
dataset_convert:
  input_img_dir: "??"
  gt_file: "??"
  results_dir: "${results_dir}/convert_dataset"
gen_trt_engine:
  onnx_file: "??"
  results_dir: "${results_dir}/convert_dataset"
```
I only adjusted the properties that are explicitly read by the inference.py script in ~/tao_deploy/nvidia_tao_deploy/cv/ocrnet/scripts.
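Side note: the ${results_dir} values in the spec are OmegaConf interpolations that Hydra resolves against the top-level results_dir when the config is loaded. A minimal sketch of that behavior, using only the omegaconf package that hydra-core already depends on:

```python
# Minimal sketch of the ${results_dir} interpolation used in the spec above.
# Assumes omegaconf is installed (it is a dependency of hydra-core).
from omegaconf import OmegaConf

cfg = OmegaConf.create("""
results_dir: /home/ubuntu/tao-experiments/ocrnet/experiment_dir_retrain
evaluate:
  results_dir: ${results_dir}/evaluate
""")

# Interpolations are resolved on access, relative to the config root:
print(cfg.evaluate.results_dir)
# /home/ubuntu/tao-experiments/ocrnet/experiment_dir_retrain/evaluate
```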
I made some minimal changes to inference.py: first so that the nvidia_tao_deploy imports can be resolved, and then some extra prints to show the parameters. This is my inference.py now:
```python
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""OCRNet TensorRT inference."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import logging
import os
import sys

# Added: make the nvidia_tao_deploy package importable when running from the scripts dir.
sys.path.append("../../../../")

from nvidia_tao_deploy.cv.common.decorators import monitor_status
from nvidia_tao_deploy.cv.ocrnet.dataloader import OCRNetLoader
from nvidia_tao_deploy.cv.ocrnet.inferencer import OCRNetInferencer
from nvidia_tao_deploy.cv.common.hydra.hydra_runner import hydra_runner
from nvidia_tao_deploy.cv.ocrnet.config.default_config import ExperimentConfig
from nvidia_tao_deploy.cv.ocrnet.utils import decode_ctc, decode_attn

logging.basicConfig(format='%(asctime)s [TAO Toolkit] [%(levelname)s] %(name)s %(lineno)d: %(message)s',
                    level="INFO")
logger = logging.getLogger(__name__)

spec_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))


@hydra_runner(
    config_path=os.path.join(spec_root, "specs"),
    config_name="experiment-vit", schema=ExperimentConfig
)
@monitor_status(name="ocrnet", mode="inference")
def main(cfg: ExperimentConfig) -> None:
    """Convert encrypted uff or onnx model to TRT engine."""
    engine_file = cfg.inference.trt_engine
    batch_size = cfg.inference.batch_size
    img_dirs = cfg.inference.inference_dataset_dir
    character_list_file = cfg.dataset.character_list_file
    img_width = cfg.inference.input_width
    img_height = cfg.inference.input_height
    img_channel = cfg.model.input_channel
    prediction_type = cfg.model.prediction
    shape = [img_channel, img_height, img_width]

    # Added: extra prints to show the effective parameters.
    print("engine_file", engine_file)
    print("batch_size", batch_size)
    print("img_dirs", img_dirs)
    print("character_list_file", character_list_file)
    print("img_width", img_width)
    print("img_height", img_height)
    print("img_channel", img_channel)
    print("prediction_type", prediction_type)
    print("shape", shape)

    ocrnet_engine = OCRNetInferencer(engine_path=engine_file,
                                     batch_size=batch_size)

    if prediction_type == "CTC":
        character_list = ["CTCBlank"]
    elif prediction_type == "Attn":
        character_list = ["[GO]", "[s]"]
    else:
        raise ValueError(f"Unsupported prediction type: {prediction_type}")

    with open(character_list_file, "r", encoding="utf-8") as f:
        for ch in f.readlines():
            ch = ch.strip()
            character_list.append(ch)

    inf_dl = OCRNetLoader(shape=shape,
                          image_dirs=[img_dirs],
                          batch_size=batch_size,
                          dtype=ocrnet_engine.inputs[0].host.dtype)

    for idx, (imgs, _) in enumerate(inf_dl):
        y_preds = ocrnet_engine.infer(imgs)
        output_probs, output_ids = y_preds
        img_paths = inf_dl.image_paths[idx * batch_size: (idx + 1) * batch_size]
        assert len(output_ids) == len(output_probs) == len(img_paths)
        for img_path, output_id, output_prob in zip(img_paths, output_ids, output_probs):
            if prediction_type == "CTC":
                text, conf = decode_ctc(output_id, output_prob, character_list=character_list)
            else:
                text, conf = decode_attn(output_id, output_prob, character_list=character_list)
            print(f"{img_path}: {text} {conf}")

    logging.info("TensorRT engine inference finished successfully.")


cfg = ExperimentConfig()

if __name__ == '__main__':
    main()
```
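For reference, since only the call site is visible above, here is a rough illustration of what the Attn decode step does. This is my own sketch, not the actual decode_attn from nvidia_tao_deploy.cv.ocrnet.utils, and it assumes output_id holds greedy per-step character indices and output_prob the matching per-step probabilities:

```python
# Illustration only: a greedy Attn-style decode, NOT the real decode_attn.
import math

def decode_attn_sketch(output_id, output_prob, character_list):
    text, step_probs = "", []
    for idx, prob in zip(output_id, output_prob):
        ch = character_list[int(idx)]
        if ch == "[s]":  # "[s]" marks end of sequence for the Attn head
            break
        text += ch
        step_probs.append(float(prob))
    conf = math.prod(step_probs) if step_probs else 0.0  # cumulative confidence
    return text, conf
```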
I run it with python3 inference.py in the ~/tao_deploy/nvidia_tao_deploy/cv/ocrnet/scripts dir. Even though there are some warnings, I can see my configuration. Also, the /home/ubuntu/tao-experiments/ocrnet/export/trt.engine is the one generated outside the Docker container, using the trtexec command line shown earlier.

The output is:
```text
sys:1: UserWarning:
'experiment-vit' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/home/ubuntu/tao_deploy/nvidia_tao_deploy/cv/ocrnet/scripts/../../../../nvidia_tao_deploy/cv/common/hydra/hydra_runner.py:99: UserWarning:
'experiment-vit' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
  _run_hydra(
/home/ubuntu/.local/lib/python3.10/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Log file already exists at /home/ubuntu/tao-experiments/ocrnet/experiment_dir_retrain/status.json
Starting ocrnet inference.
engine_file /home/ubuntu/tao-experiments/ocrnet/export/trt.engine
batch_size 1
img_dirs /home/ubuntu/tao-experiments/data/ocrnet/test_samples
character_list_file /home/ubuntu/tao-experiments/data/ocrnet/character_list
img_width 200
img_height 64
img_channel 1
prediction_type Attn
shape [1, 64, 200]
engine-path /home/ubuntu/tao-experiments/ocrnet/export/trt.engine
[06/11/2024-16:16:43] [TRT] [E] 1: [stdArchiveReader.cpp::stdArchiveReaderInitCommon::47] Error Code 1: Serialization (Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed.Version tag does not match. Note: Current Version: 237, Serialized Engine Version: 236)
'NoneType' object has no attribute 'create_execution_context'
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/ubuntu/tao_deploy/nvidia_tao_deploy/cv/ocrnet/scripts/../../../../nvidia_tao_deploy/cv/common/decorators.py", line 63, in _func
    raise e
  File "/home/ubuntu/tao_deploy/nvidia_tao_deploy/cv/ocrnet/scripts/../../../../nvidia_tao_deploy/cv/common/decorators.py", line 47, in _func
    runner(cfg, **kwargs)
  File "/home/ubuntu/tao_deploy/nvidia_tao_deploy/cv/ocrnet/scripts/inference.py", line 70, in main
    ocrnet_engine = OCRNetInferencer(engine_path=engine_file,
  File "/home/ubuntu/tao_deploy/nvidia_tao_deploy/cv/ocrnet/scripts/../../../../nvidia_tao_deploy/cv/ocrnet/inferencer.py", line 39, in __init__
    super().__init__(engine_path)
  File "/home/ubuntu/tao_deploy/nvidia_tao_deploy/cv/ocrnet/scripts/../../../../nvidia_tao_deploy/inferencer/trt_inferencer.py", line 50, in __init__
    self.context = self.engine.create_execution_context()
AttributeError: 'NoneType' object has no attribute 'create_execution_context'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Exception ignored in: <function OCRNetInferencer.__del__ at 0x742c8d0c5360>
Traceback (most recent call last):
  File "/home/ubuntu/tao_deploy/nvidia_tao_deploy/cv/ocrnet/scripts/../../../../nvidia_tao_deploy/cv/ocrnet/inferencer.py", line 115, in __del__
    if self.context:
AttributeError: 'OCRNetInferencer' object has no attribute 'context'
```
I believe the main problem is this:

```text
[06/11/2024-16:16:43] [TRT] [E] 1: [stdArchiveReader.cpp::stdArchiveReaderInitCommon::47] Error Code 1: Serialization (Serialization assertion stdVersionRead == kSERIALIZATION_VERSION failed.Version tag does not match. Note: Current Version: 237, Serialized Engine Version: 236)
```

The rest are just follow-up errors for now.
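In other words, the trtexec build that serialized the engine and the tensorrt Python package that tries to deserialize it come from different TensorRT versions (236 vs. 237). A quick way to check, as a sketch (trtexec also prints its version in the [TensorRT vXXXX] banner at the top of every run):

```python
# Sanity check: the TensorRT Python runtime that deserializes the engine must
# match the TensorRT version of the trtexec that serialized it.
import tensorrt as trt

print(trt.__version__)  # compare against the [TensorRT vXXXX] banner in the trtexec log

# Reproduce the failing step in isolation (same path the traceback goes through):
runtime = trt.Runtime(trt.Logger(trt.Logger.INFO))
with open("/home/ubuntu/tao-experiments/ocrnet/export/trt.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
print("deserialized:", engine is not None)  # None on a version mismatch -> the AttributeError above
```

Rebuilding the engine with a trtexec that matches the Python runtime's TensorRT version (or installing the matching tensorrt wheel) should make the deserialization succeed.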