Using OCRNet from Python script

• Hardware: T4
• Network Type: OCRNet
• TLT Version: 8.6.1.6
• Training spec file: from the sample notebook

I’m trying to use a re-trained OCRNet from a Python script with an OpenCV mat as input. The training was done using an unchanged ocrnet-vit.ipynb from the TAO launcher starter kit: tao_tutorials/notebooks/tao_launcher_starter_kit/ocrnet/ocrnet-vit.ipynb at main · NVIDIA/tao_tutorials · GitHub

I was able to run the inference.py script from the tao_deploy sample code: tao_deploy/nvidia_tao_deploy/cv/ocrnet/scripts at main · NVIDIA/tao_deploy · GitHub

However, this sample code deals with files; I need to run it with OpenCV mats from inside my own inference script.

So I tried to condense what little I understood from the entire tao_deploy kit into a new script. This script loads the re-trained model, reads an image from disk, and tries to feed it to the model. But it fails: when running the inference, I get tons of

[06/11/2024-19:25:44] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)

I wasn’t able to find any other sample code on the web, nor any further documentation.

My question is: would somebody be able to help me find the problem with this small sample?

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
import cv2

# Load the TensorRT engine
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open("trt.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Context and stream creation
context = engine.create_execution_context()
stream = cuda.Stream()

# Load the image and convert it to grayscale
image_path = "/home/ubuntu/images/BEH6242.png"
image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)

# Resize to the expected input size
max_width = 200
max_height = 64
image_resized = cv2.resize(image, (max_width, max_height), interpolation=cv2.INTER_AREA)

# To numpy array
input_data = np.array(image_resized, dtype=np.float32)

# Extract input/output shapes
input_shape = engine.get_binding_shape(0)
output_shape = engine.get_binding_shape(1)

print("input_shape", input_shape)
print("output_shape", output_shape)

# Allocate cuda memory
d_input = cuda.mem_alloc(int(1 * np.prod(input_shape) * np.float32().nbytes))
d_output = cuda.mem_alloc(int(1 * np.prod(output_shape) * np.float32().nbytes))

# Copy host to cuda
cuda.memcpy_htod_async(d_input, input_data, stream)

# Run the inference
context.execute_async_v2(bindings=[int(d_input), int(d_output)], stream_handle=stream.handle)

# GPU to host
h_output = np.empty(output_shape, dtype=np.float32)
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()

I have put this code into test.py and am running it on an AWS T4 instance with python3 test.py.

The output is:

input_shape (1, 1, 64, 200)
output_shape (1, 26)
Traceback (most recent call last):
  File "/home/ubuntu/OpenCV-dnn-samples/test.py", line 47, in <module>
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
pycuda._driver.LogicError: cuMemcpyDtoHAsync failed: an illegal memory access was encountered

Then a lot of messages follow:

[06/12/2024-05:02:16] [TRT] [E] 1: [graphContext.h::~MyelinGraphContext::55] Error Code 1: Myelin (Error 700 destroying stream '0x5b3cfb2d7010'.)

If I comment out the context.execute_async_v2 call, there is no problem with the rest of the code.

I know this code might be wrong in many ways, so please forgive me if it is complete nonsense. :)

Since you ran successfully with the entire tao_deploy kit, I suggest you leverage it directly. Delete the code that is not needed, copy the code that can be merged, etc.
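
For example, here is a minimal sketch (not the official tao_deploy code; the input scaling and the assumption that the engine exposes more than two bindings should be cross-checked against nvidia_tao_deploy/cv/ocrnet) of allocating one buffer per engine binding instead of hard-coding two:

import cv2
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open("trt.engine", "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
stream = cuda.Stream()

# Allocate one host/device buffer pair per binding. An OCRNet engine can have
# more than one output binding, so do not hard-code bindings=[d_input, d_output].
host_bufs, dev_bufs, bindings = {}, {}, []
for i in range(engine.num_bindings):
    name = engine.get_binding_name(i)
    shape = engine.get_binding_shape(i)
    dtype = trt.nptype(engine.get_binding_dtype(i))
    host_bufs[name] = cuda.pagelocked_empty(int(np.prod(shape)), dtype)
    dev_bufs[name] = cuda.mem_alloc(host_bufs[name].nbytes)
    bindings.append(int(dev_bufs[name]))
    print(i, name, shape, dtype, "input" if engine.binding_is_input(i) else "output")

# Preprocess: grayscale, resize to the engine's input size, add batch/channel dims.
# The division by 255 is an assumption -- use the exact preprocessing from tao_deploy.
img = cv2.imread("/home/ubuntu/images/BEH6242.png", cv2.IMREAD_GRAYSCALE)
img = cv2.resize(img, (200, 64), interpolation=cv2.INTER_AREA)
inp = np.ascontiguousarray(img[None, None, :, :].astype(np.float32) / 255.0)

# Copy the input to the device, run inference, copy all outputs back.
input_name = engine.get_binding_name(0)
np.copyto(host_bufs[input_name], inp.ravel())
cuda.memcpy_htod_async(dev_bufs[input_name], host_bufs[input_name], stream)

context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)

for i in range(engine.num_bindings):
    if not engine.binding_is_input(i):
        name = engine.get_binding_name(i)
        cuda.memcpy_dtoh_async(host_bufs[name], dev_bufs[name], stream)
stream.synchronize()

Printing each binding also shows immediately whether the engine expects more buffers than the two passed in your snippet; passing too few (or too small) device buffers to execute_async_v2 is a common cause of illegal-memory-access errors.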

The original code seems very bloated to me. Also, it deals with files, so it will be hard to separate the useful from the useless code when using it as is… I thought there would be an easier way to achieve this.

But yes, that was the initial idea, before I saw the whole thing…

But the first results are so disappointing that I really struggle to motivate myself to continue:

The tao_deploy inference.py script was run over these license plate images:

TOP17355.png (Poland) [image]
B00288.png (Germany) [image]
BDL1204.png (Germany) [image]
TÜSN429.png (Germany, Tübingen) [image]
RDFG96.png (USA, Florida) [image]
DW3SS28.png (Poland) [image]
BJT340E.png (Germany) [image]
BRD1891.png (Germany) [image]
BEH6242.png (Germany) [image]

And all I got was this…

[06/12/2024-10:35:05] [TRT] [W] The getMaxBatchSize() function should not be used with an engine built from a network created with NetworkDefinitionCreationFlag::kEXPLICIT_BATCH flag. This function will always return 1.

/home/ubuntu/images1/TOP17355.png: stop 0.3440015912055969
/home/ubuntu/images1/B00288.png: 8 0.21826171875
/home/ubuntu/images1/BDL1204.png: b 0.197265625
/home/ubuntu/images1/TÜSN429.png: tusn429 0.4994109570980072
/home/ubuntu/images1/RDFG96.png: r 0.296875
/home/ubuntu/images1/DW3SS28.png: d 0.378662109375
/home/ubuntu/images1/BJT340E.png: bjt 0.36832505464553833
/home/ubuntu/images1/BRD1891.png: brr1991 0.05712202191352844
/home/ubuntu/images1/BEH6242.png: bb.eh6242 0.06055281683802605


2024-06-12 10:35:06,122 [TAO Toolkit] [INFO] root 82: TensorRT engine inference finished successfully.

I don’t know. Is it really just this?

@Morganh In this post you wrote that you were able to recognize the BJT340E.png plate above.

How did you do that? All I got was bjt, and the other recognitions did not look very promising either. I’m aware that I’m still using a US-trained model, but anyway…

We did not run experiments. We just confirmed from the network architecture that the OCRNet_vit version can recognize spaces. It needs to be trained from scratch, without the existing model from Optical Character Recognition | NVIDIA NGC. Also, note that about 800K training images were used when the OCRNet pretrained models were trained.
So you need to add more training images for your case.
You also need to check and improve the training images/labels.
Trim the unexpected areas, e.g., trim to the below.
[image], label is: TOP[Space]17355
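
To help with checking the labels, a small sanity-check sketch like the one below can be used. It assumes one "image_filename label" pair per line in the ground-truth file and one entry per line in character_list (both are assumptions about your files, so adjust the parsing as needed):

import os

GT_FILE = "train/gt.txt"            # hypothetical path, lines like: 0001.png TOP 17355
CHAR_LIST = "specs/character_list"  # hypothetical path, one character per line
IMAGE_DIR = "train"                 # hypothetical image directory

with open(CHAR_LIST, encoding="utf-8") as f:
    charset = {line.rstrip("\n") for line in f if line.rstrip("\n")}

missing, bad = [], []
with open(GT_FILE, encoding="utf-8") as f:
    for lineno, line in enumerate(f, 1):
        line = line.rstrip("\n")
        if not line:
            continue
        filename, _, label = line.partition(" ")
        if not os.path.isfile(os.path.join(IMAGE_DIR, filename)):
            missing.append(filename)
        # Characters in the label that character_list does not cover. How a space
        # is represented in character_list depends on your setup, so it is tolerated here.
        unknown = {c for c in label if c not in charset and c != " "}
        if unknown:
            bad.append((lineno, filename, label, sorted(unknown)))

print(f"{len(missing)} labels point to missing images")
for lineno, filename, label, unknown in bad:
    print(f"line {lineno}: {filename} '{label}' contains characters not in character_list: {unknown}")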

What would be required to train OCRNet from scratch besides the images and labels?

Shooting and labeling all the images seems to be a cumbersome job, given that this could be a perfect task for generative AI… Are you aware of such solutions?

Wait. You can still use the existing ocrnet_vit pretrained model and finetune it on your dataset.
Users can finetune on their own dataset with TAO. Reference: tao_tutorials/notebooks/tao_launcher_starter_kit/ocrnet/ocrnet-vit.ipynb at main · NVIDIA/tao_tutorials · GitHub.
Images and labels are required. More data is better.

This notebook is what I was using all the time, but not with my own training data.

All that back and forth with gt_new.txt and gt.txt is pretty confusing. Given a German plate set, wouldn’t it be OK to have this final character_list (one character per line, of course):

ABCDEFGHIJKLMNOPQRSTUVWÖÜ[minus][space]0123456789

Or do I have to do this gt_new.txt dance too?

My plates also do not have ~*± and the like; could that be a problem for a retrained model?

LPR is really not that bad; the crux is that it doesn’t detect spaces… :(

Please add the characters that can be found and that you want to train on.
For example, lowercase? abcd…
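
As a sketch of where the character_list can come from, it can simply be generated from the characters that actually occur in your labels, one entry per line as in the notebook. The file names and the "filename label" line format are assumptions here, and how a literal space ends up in the list should be cross-checked against the notebook's gt_new.txt step:

# Collect every character that occurs in the ground-truth labels
# and write them out one per line as the character_list.
chars = set()
with open("train/gt.txt", encoding="utf-8") as f:   # hypothetical path
    for line in f:
        line = line.rstrip("\n")
        if not line:
            continue
        _, _, label = line.partition(" ")
        chars.update(label)

with open("character_list", "w", encoding="utf-8") as f:
    for c in sorted(chars):
        f.write(c + "\n")   # e.g. digits, A-Z, Ö, Ü, minus, space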

Yes, you can also add.

“~*±”

We don’t have those on plates, nor lowercase characters.

OK.
For training images, please trim the unexpected areas.

OK, understood. And you think I could use the pre-trained models? Would I still need 800+ images/labels then?

Yes, you can use the pretrained model. For training images, you can start with several hundred or several thousand. Split 90% as training images and 10% as validation images. Add more training images if the model needs to improve.
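
A simple way to do that 90/10 split, assuming a single ground-truth file with one "filename label" line per image (the paths below are placeholders):

import random

with open("all_gt.txt", encoding="utf-8") as f:          # hypothetical combined gt file
    lines = [l for l in f.read().splitlines() if l.strip()]

random.seed(42)                                          # reproducible split
random.shuffle(lines)
split = int(len(lines) * 0.9)

with open("train_gt.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines[:split]) + "\n")
with open("val_gt.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines[split:]) + "\n")

print(f"{split} training labels, {len(lines) - split} validation labels")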

Given this plate:

[image]

What could I do to prevent accidental detection of the registration-authority and TÜV decals (here between the B and the A)?

[image]
If this kind of case always occurs in your dataset, you can define it in the ground truth as [special]. So the label would be B[special]A[space]123E.

Great. Yes, it is always like that, but it can differ in colour and design.

[image]

What about two-line plates? Motorbikes have plates like this one.

Would I have to train the model with grayscale images of a certain size, say 200 × 64, since this is the input at runtime (I guess)?

You need to split it into two lines and label them separately.
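
For example, a rough sketch of splitting an already-cropped two-line plate into two single-line training images with OpenCV; the fixed 50/50 split is an assumption, and in practice the split point should come from your annotation or detector:

import cv2

plate = cv2.imread("two_line_plate.png", cv2.IMREAD_GRAYSCALE)  # hypothetical crop
h = plate.shape[0]

top = plate[: h // 2, :]      # first text line
bottom = plate[h // 2 :, :]   # second text line

# Each half gets its own image + label entry in the ground truth.
cv2.imwrite("plate_line1.png", top)
cv2.imwrite("plate_line2.png", bottom)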