Using OCRNet from Python script

• Hardware: T4
• Network Type: OCRNet
• TLT Version: 8.6.1.6
• Training spec file: from the sample notebook

I’m trying to use a re-trained OCRNet from a Python script with an OpenCV mat as input. The training was done using an unchanged ocrnet-vit.ipynb from the TAO launcher starter kit: tao_tutorials/notebooks/tao_launcher_starter_kit/ocrnet/ocrnet-vit.ipynb at main · NVIDIA/tao_tutorials · GitHub

I was able to run the inference.py script from the tao_deploy sample code: tao_deploy/nvidia_tao_deploy/cv/ocrnet/scripts at main · NVIDIA/tao_deploy · GitHub

However, this sample code deals with files; I need to run it with OpenCV mats from inside my own inference script.

So I tried to condense what little I understood from the entire tao_deploy kit into a new script. This script loads the re-trained model, reads an image from disk, and tries to feed it to the model. But it fails: when running the inference, I get tons of

[06/11/2024-19:25:44] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)

I wasn’t able to find any other sample code on the web, nor any further documentation.

My question is: would somebody be able to help me find the problem with this small sample?

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import cv2

# Load the TensorRT engine
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open("trt.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Context and stream creation
context = engine.create_execution_context()
stream = cuda.Stream()

# Load the image and convert it to grayscale
image_path = "/home/ubuntu/images/BEH6242.png"
image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

# Resize to the expected input size
max_width = 200
max_height = 64
image_resized = cv2.resize(image, (max_width, max_height), interpolation=cv2.INTER_AREA)

# To numpy array
input_data = np.array(image_resized, dtype=np.float32)

# Extract input/output shapes
input_shape = engine.get_binding_shape(0)
output_shape = engine.get_binding_shape(1)

print("input_shape", input_shape)
print("output_shape", output_shape)

# Allocate cuda memory
d_input = cuda.mem_alloc(int(1 * np.prod(input_shape) * np.float32().nbytes))
d_output = cuda.mem_alloc(int(1 * np.prod(output_shape) * np.float32().nbytes))

# Copy host to cuda
cuda.memcpy_htod_async(d_input, input_data, stream)

# Run the inference
context.execute_async_v2(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)

# GPU to host
h_output = np.empty(output_shape, dtype=np.float32)
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()

I have put this code into test.py and am running it on an AWS T4 instance with python3 test.py.

The output is:

input_shape (1, 1, 64, 200)
output_shape (1, 26)
Traceback (most recent call last):
  File "/home/ubuntu/OpenCV-dnn-samples/test.py", line 47, in <module>
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
pycuda._driver.LogicError: cuMemcpyDtoHAsync failed: an illegal memory access was encountered

Then a lot of messages follow:

[06/12/2024-05:02:16] [TRT] [E] 1: [graphContext.h::~MyelinGraphContext::55] Error Code 1: Myelin (Error 700 destroying stream '0x5b3cfb2d7010'.)

If I comment out the context.execute_async_v2 call, there is no problem with the rest of the code.

I know this code might be wrong in many ways, so please forgive me if it is complete nonsense. :)

Since you ran successfully with the entire tao_deploy kit, I suggest you leverage it directly. Delete the code that is not needed, copy the code that can be merged, etc.
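
For example, here is a minimal sketch (not the official tao_deploy code; the input scaling and the assumption that the engine exposes more than two bindings should be cross-checked against nvidia_tao_deploy/cv/ocrnet) of allocating one buffer per engine binding instead of hard-coding two:

import cv2
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open("trt.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
stream = cuda.Stream()

# Allocate one host/device buffer pair per binding. An OCRNet engine can have
# more than one output binding, so do not hard-code bindings=[d_input, d_output].
host_bufs, dev_bufs, bindings = {}, {}, []
for i in range(engine.num_bindings):
    name = engine.get_binding_name(i)
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host_bufs[name] = cuda.pagelocked_empty(int(np.prod(shape)), dtype)
    dev_bufs[name] = cuda.mem_alloc(host_bufs[name].nbytes)
    bindings.append(int(dev_bufs[name]))
    print(i, name, shape, dtype, "input" if engine.binding_is_input(i) else "output")

# Preprocess: grayscale, resize to the engine's input size, add batch/channel dims.
# The division by 255 is an assumption -- use the exact preprocessing from tao_deploy.
img = cv2.imread("/home/ubuntu/images/BEH6242.png", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (200, 64), interpolation=cv2.INTER_AREA)
inp = np.ascontiguousarray(img[None, None, :, :].astype(np.float32) / 255.0)

# Copy the input to the device, run inference, copy all outputs back.
input_name = engine.get_binding_name(0)
np.copyto(host_bufs[input_name], inp.ravel())
cuda.memcpy_htod_async(dev_bufs[input_name], host_bufs[input_name], stream)

context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)

for i in range(engine.num_bindings):
    if not engine.binding_is_input(i):
        name = engine.get_binding_name(i)
        cuda.memcpy_dtoh_async(host_bufs[name], dev_bufs[name], stream)
stream.synchronize()

Printing each binding also shows immediately whether the engine expects more buffers than the two passed in your snippet; passing too few (or too small) device buffers to execute_async_v2 is a common cause of illegal-memory-access errors.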

The original code seems very bloated to me. Also, it deals with files, so it will be hard to separate the useful from the useless code when using it as is… I thought there would be an easier way to achieve this.

But yes, that was the initial idea, before I saw the whole thing…

But the first results are so disappointing that I really struggle to motivate myself to continue:

The tao_deploy inference.py script was run over these license plate images:

TOP17355.png (Poland) [image]
B00288.png (Germany) [image]
BDL1204.png (Germany) [image]
TÜSN429.png (Germany, Tübingen) [image]
RDFG96.png (USA, Florida) [image]
DW3SS28.png (Poland) [image]
BJT340E.png (Germany) [image]
BRD1891.png (Germany) [image]
BEH6242.png (Germany) [image]

And all I got was this…

[06/12/2024-10:35:05] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.

/home/ubuntu/images1/TOP17355.png: stop 0.3440015912055969
/home/ubuntu/images1/B00288.png: 8 0.21826171875
/home/ubuntu/images1/BDL1204.png: b 0.197265625
/home/ubuntu/images1/TÜSN429.png: tusn429 0.4994109570980072
/home/ubuntu/images1/RDFG96.png: r 0.296875
/home/ubuntu/images1/DW3SS28.png: d 0.378662109375
/home/ubuntu/images1/BJT340E.png: bjt 0.36832505464553833
/home/ubuntu/images1/BRD1891.png: brr1991 0.05712202191352844
/home/ubuntu/images1/BEH6242.png: bb.eh6242 0.06055281683802605


2024-06-12 10:35:06,122 [TAO Toolkit] [INFO] root 82: TensorRT engine inference finished successfully.

I don’t know. Is it really just this?

@Morganh In this post you wrote that you were able to recognize the BJT340E.png plate above.

How did you do that? All I got was bjt, and the other recognitions did not look very promising either. I’m aware that I’m still using a US-trained model, but anyway…

We did not run experiments. We just confirmed from the network architecture that the OCRNet_vit version can recognize spaces. It needs to be trained from scratch, without the existing model from Optical Character Recognition | NVIDIA NGC. Also, note that about 800K training images were used when the OCRNet pretrained models were trained.
So you need to add more training images for your case.
You also need to check and improve the training images/labels.
Trim the unexpected areas, e.g., trim to the below.
[image], label is: TOP[Space]17355
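
To help with checking the labels, a small sanity-check sketch like the one below can be used. It assumes one "image_filename label" pair per line in the ground-truth file and one entry per line in character_list (both are assumptions about your files, so adjust the parsing as needed):

import os

GT_FILE = "train/gt.txt"            # hypothetical path, lines like: 0001.png TOP 17355
CHAR_LIST = "specs/character_list"  # hypothetical path, one character per line
IMAGE_DIR = "train"                 # hypothetical image directory

with open(CHAR_LIST, encoding="utf-8") as f:
    charset = {line.rstrip("\n") for line in f if line.rstrip("\n")}

missing, bad = [], []
with open(GT_FILE, encoding="utf-8") as f:
    for lineno, line in enumerate(f, 1):
        line = line.rstrip("\n")
        if not line:
            continue
        filename, _, label = line.partition(" ")
        if not os.path.isfile(os.path.join(IMAGE_DIR, filename)):
            missing.append(filename)
        # Characters in the label that character_list does not cover. How a space
        # is represented in character_list depends on your setup, so it is tolerated here.
        unknown = {c for c in label if c not in charset and c != " "}
        if unknown:
            bad.append((lineno, filename, label, sorted(unknown)))

print(f"{len(missing)} labels point to missing images")
for lineno, filename, label, unknown in bad:
    print(f"line {lineno}: {filename} '{label}' contains characters not in character_list: {unknown}")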

What would be required to train OCRNet from scratch besides the images and labels?

Shooting and labeling all the images seems to be a cumbersome job, given that this could be a perfect task for generative AI… Are you aware of such solutions?

Wait. You can still use the existing ocrnet_vit pretrained model and finetune it on your dataset.
Users can finetune on their own dataset with TAO. Reference: tao_tutorials/notebooks/tao_launcher_starter_kit/ocrnet/ocrnet-vit.ipynb at main · NVIDIA/tao_tutorials · GitHub.
Images and labels are required. More data is better.

This notebook is what I was using all the time, but not with my own training data.

All that back and forth with gt_new.txt and gt.txt is pretty confusing. Given a German plate set, wouldn’t it be OK to have this final character_list (one character per line, of course):

ABCDEFGHIJKLMNOPQRSTUVWÖÜ[minus][space]0123456789

Or do I have to do this gt_new.txt dance too?

My plates also do not have ~*± and the like; could that be a problem for a retrained model?

LPR is really not that bad; the crux is that it doesn’t detect spaces… :(

Please add the characters that can be found and that you want to train on.
For example, lowercase? abcd…
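
As a sketch of where the character_list can come from, it can simply be generated from the characters that actually occur in your labels, one entry per line as in the notebook. The file names and the "filename label" line format are assumptions here, and how a literal space ends up in the list should be cross-checked against the notebook's gt_new.txt step:

# Collect every character that occurs in the ground-truth labels
# and write them out one per line as the character_list.
chars = set()
with open("train/gt.txt", encoding="utf-8") as f:   # hypothetical path
    for line in f:
        line = line.rstrip("\n")
        if not line:
            continue
        _, _, label = line.partition(" ")
        chars.update(label)

with open("character_list", "w", encoding="utf-8") as f:
    for c in sorted(chars):
        f.write(c + "\n")   # e.g. digits, A-Z, Ö, Ü, minus, space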

Yes, you can also add.

“~*±”

We don’t have those on plates, nor lowercase characters.

OK.
For training images, please trim the unexpected areas.

OK, understood. And you think I could use the pre-trained models? Would I still need 800+ images/labels then?

Yes, you can use the pretrained model. For training images, you can start with several hundred or several thousand. Split 90% as training images and 10% as validation images. Add more training images if the model needs to improve.
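
A simple way to do that 90/10 split, assuming a single ground-truth file with one "filename label" line per image (the paths below are placeholders):

import random

with open("all_gt.txt", encoding="utf-8") as f:          # hypothetical combined gt file
    lines = [l for l in f.read().splitlines() if l.strip()]

random.seed(42)                                          # reproducible split
random.shuffle(lines)
split = int(len(lines) * 0.9)

with open("train_gt.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines[:split]) + "\n")
with open("val_gt.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines[split:]) + "\n")

print(f"{split} training labels, {len(lines) - split} validation labels")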

Given this plate:

[image]

What could I do to prevent accidental detection of the registration-authority and TÜV decals (here between the B and the A)?

[image]
If this kind of case always occurs in your dataset, you can define it in the ground truth as [special]. So the label would be B[special]A[space]123E.

Great. Yes, it is always like that, but it can differ in colour and design.

[image]

What about two-line plates? Motorbikes have plates like this one.

Would I have to train the model with grayscale images of a certain size, say 200 × 64, since this is the input at runtime (I guess)?

You need to split it into two lines and label them separately.
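
For example, a rough sketch of splitting an already-cropped two-line plate into two single-line training images with OpenCV; the fixed 50/50 split is an assumption, and in practice the split point should come from your annotation or detector:

import cv2

plate = cv2.imread("two_line_plate.png", cv2.IMREAD_GRAYSCALE)  # hypothetical crop
h = plate.shape[0]

top = plate[: h // 2, :]      # first text line
bottom = plate[h // 2 :, :]   # second text line

# Each half gets its own image + label entry in the ground truth.
cv2.imwrite("plate_line1.png", top)
cv2.imwrite("plate_line2.png", bottom)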