TAO Toolkit observations

This config is working now, even though it still produces an error:

ERROR: [TRT]: 3: Cannot find binding of given name: output_cov/Sigmoid
[property]
gpu-id=0
net-scale-factor=0.00392156862745098
offsets=0;0;0
infer-dims=3;384;1248
tlt-model-key=tlt_encode
network-type=0
network-mode=2
labelfile-path=models/primary-detector/resnet18-detector/labels.txt
onnx-file=models/primary-detector/resnet18-detector/resnet18_detector.onnx
#model-engine-file=models/primary-detector/resnet18-detector/resnet18_detector.onnx.b1_gpu0_int8.engine
int8-calib-file=models/primary-detector/resnet18-detector/calibration.bin
batch-size=1
num-detected-classes=3
model-color-format=0
maintain-aspect-ratio=0
output-tensor-meta=0
cluster-mode=2
gie-unique-id=1
uff-input-order=0
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
uff-input-blob-name=input_1


[class-attrs-all]
pre-cluster-threshold=0.2
eps=0.4
group-threshold=1

Any suggestions for output-blob-names?
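
For reference, a quick way to verify what the exported ONNX actually exposes, since output-blob-names must match the real tensor names exactly (a minimal sketch using the onnx Python package; the path is taken from the config above):

import onnx

# Load the exported detector and print its true input/output tensor names.
model = onnx.load("models/primary-detector/resnet18-detector/resnet18_detector.onnx")
print("inputs: ", [i.name for i in model.graph.input])
print("outputs:", [o.name for o in model.graph.output])

TAO's detectnet_v2 ONNX export typically appends ":0" to the old UFF-era names, so output-blob-names=output_cov/Sigmoid:0;output_bbox/BiasAdd:0 may be what nvinfer expects here, but verify against the printout. As far as I can tell, the uff-* keys in the config are leftovers from the TLT/UFF path and are ignored for an ONNX model.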

And the results are way worse than with resnet18_trafficcamnet… for whatever reason…

Even yolov7-tiny is better

If you would like to judge and help me improve the results, I could upload 3 videos showing one and the same Berlin street scene processed with three different models…

I suggest you create a new topic, since the original issues are gone on your side.
In your new topic, please describe:

  • Which models have you trained? Which dataset? Which spec file? Pruned or unpruned?
  • What is the new issue?

OK, here or in the DeepStream forum?

If you are running into an error with the DeepStream application, you can create a topic in the DeepStream forum. If you believe it is an issue with the model itself, you can create a topic here.

Having no issue with DS.

Here’s the post:

These links

https://registry.ngc.nvidia.com/orgs/nvidia/models/tao_lpdnet
https://registry.ngc.nvidia.com/orgs/nvidia/models/tao_lprnet

from the blog https://developer.nvidia.com/blog/creating-a-real-time-license-plate-detection-and-recognition-app/ end up in a 404.

Thanks for this info. Would you have ONE sample for this?

The dataset for LPRNet contains cropped license plate images and corresponding label files.

And what does “characters_list.txt” contain? Let’s be more specific with a sample.

Say my image is this:

[image]

It is 640×385 at 72 pixels/inch.

Would the “cropped license plate image” then be the part in the magenta box?

[image]

What would label_0000.txt then have to contain? Left, top, width, and height within the cropped image? And the text as “BJT304E”? That would be rubbish.

Please note: your current LPR is able to detect that as “BJT304E”, but this is WRONG in Germany, since the number is “B JT 304 E”. Spaces have a BIG meaning over here :)

So what would I have to put into characters_list.txt in order to be able to recognize this as “B JT 304 E” instead of just “BJT304E”?

I’m sorry if my questions are stupid, but it is really not easy to start from zero with all this…

Sorry for the inconvenience. Please use the links below.
https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/lpdnet
https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/lprnet

The current LPRNet does not support recognizing spaces. You can use another network (OCRNet, see OCRNet - NVIDIA Docs) and retrain it on your German dataset.
Please generate ground truth in the training dataset accordingly; for example, a “B JT 304 E” image → B[space]JT[space]304[space]E, then use the attention decoding method to retrain the OCRNet model. There is a characters_list.txt file that contains all the characters found in the dataset, for example A,B,C,D,E,...,a,b,c,d,...,[space].
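
To make that concrete, here is a sketch of what a characters_list.txt for German plates could look like, assuming the common one-character-per-line layout (the “…” lines are abbreviations; verify the exact file format against the OCRNet docs):

0
1
…
9
A
B
…
Z

plus Ä, Ö, Ü if they occur in your plates, and, crucially for your case, one line containing just a single space character.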

The current LPRNet does not support recognizing spaces.

Knew that already. Thanks for confirmation.

You can use another network (OCRNet, see OCRNet - NVIDIA Docs) and retrain it on your German dataset.

That’s interesting. In the case of the sample above, what would, for instance, “Dataset/images/0000.jpg” be? Would it be the magenta-framed part of the entire image? The number plate in different angles, resolutions, colours, light conditions?

Yes.
[image]

Yes, they can be.
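
Spelled out as a directory sketch, then (the gt-list file name and its separator are my assumptions; check the OCRNet dataset docs, which also describe converting such a list to LMDB before training):

Dataset/
├── images/
│   ├── 0000.jpg      ← cropped tightly around the plate, e.g. “B JT 304 E”
│   ├── 0001.jpg      ← same kind of crop, other angle/resolution/lighting
│   └── ...
└── gt_list.txt       ← one line per image: <image path> <transcription>

Since the German transcriptions themselves contain spaces, the separator between path and label would need to be unambiguous (e.g. a tab), which is exactly the kind of detail to confirm in the docs.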

Thanks again. Have a nice day

I would like to know about the training requirements for this model.

Last night I was fighting like hell to run the very first steps of the OCRNet training, to no avail. The attempt to pull nvcr.io/nvidia/tao/tao-toolkit 5.3.0-pyt always failed for obscure reasons. I finally figured out that the reason was a shortage of disk space. So I removed all old containers (e.g. all that were loaded for the detectnet_v2 training) and it worked.

The nvcr.io/nvidia/tao/tao-toolkit 5.3.0-pyt image alone occupied 25 GB.
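
For anyone hitting the same wall, these standard Docker commands show and reclaim that space (assuming the TAO launcher runs on plain Docker):

docker system df                  # how much space images, containers and volumes occupy
docker image ls                   # list images with their sizes
docker image prune -a             # remove all images not referenced by a container
docker system prune -a --volumes  # more radical: also stopped containers, networks, volumes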

Now, in the course of the OCRNet training, I’m seemingly in that trap again, because I’m trying to export the model in step 10.

With horror I notice that the notebook tries to pull yet another container, now nvcr.io/nvidia/tao/tao-toolkit 5.3.0-deploy. Presumably again > 20 GB, presumably again pushing my machine to the edge.

My current training device is an AWS g4dn.xlarge fitted with a T4, 16 GB RAM and a 125 GB SSD. That seems to be insufficient just to follow a tutorial for this model.

Quick question: Are you kidding me?

Oh, phew… Just 10 GB now. That went through…

OK, finally the ocrnet/ocrnet-vit.ipynb notebook produced these results:

.
├── best_accuracy.onnx
├── status.json
└── trt.engine

Now I’m unsure how to make use of this in my app, especially because my current LPR config looks like the following and would, just a guess, surely not work with the output of this training, would it?


[property]
gpu-id=0
model-engine-file=models/LP/LPR/us_lprnet_baseline18_deployable.etlt_b16_gpu0_fp16.engine
labelfile-path=models/LP/LPR/labels_us.txt
tlt-encoded-model=models/LP/LPR/us_lprnet_baseline18_deployable.etlt
tlt-model-key=nvidia_tlt
batch-size=16
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=3
gie-unique-id=3
output-blob-names=tf_op_layer_ArgMax;tf_op_layer_Max
#0=Detection 1=Classifier 2=Segmentation
network-type=1
parse-classifier-func-name=NvDsInferParseCustomNVPlate
custom-lib-path=nvinfer/libnvdsinfer_custom_impl_lpr.so
process-mode=2
operate-on-gie-id=2
net-scale-factor=0.00392156862745098
#net-scale-factor=1.0
#0=RGB 1=BGR 2=GRAY
model-color-format=0

What sample am I supposed to study now, or what 30 GB container needs to be pulled, to turn this into something usable?

Going to test the lprnet/lprnet.ipynb notebook now. I have my dataset and the LMDB metadata for it; I hope it will not be as big an ordeal as all the TAO experiments have been up to now.

For running OCDNet and OCRNet with deepstream, please take a look at deepstream_tao_apps/apps/tao_others/deepstream-nvocdr-app at b300bd1c9c6134b178f4ed67ab1e365422c15e4f · NVIDIA-AI-IOT/deepstream_tao_apps · GitHub.
Since you are currently using LPDNet + OCRNet, you can get help in the DeepStream forum for this case.
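
A minimal sketch of getting to that sample, assuming the usual clone-and-follow-the-README flow (the link above pins a specific commit, which may matter):

git clone https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps.git
cd deepstream_tao_apps/apps/tao_others/deepstream-nvocdr-app
# model download and build steps are described in this app's README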

I have now finished retraining LPRNet, my first and only positive experience with TAO so far.

What I got was this:

Exported engine:
------------
total 158M
-rw-r--r-- 1 ubuntu ubuntu 29M May 29 07:04 lprnet_epoch-024.fp16.engine
-rw-r--r-- 1 ubuntu ubuntu 74M May 29 07:01 lprnet_epoch-024.fp32.engine
-rw-r--r-- 1 ubuntu ubuntu 56M May 29 06:57 lprnet_epoch-024.onnx
-rw-r--r-- 1 ubuntu ubuntu 524 May 29 07:04 status.json

Could you please elaborate on how this maps to my current model directory, which contains this?

.
├── labels_us.txt
├── us_lprnet_baseline18_deployable.etlt
└── us_lprnet_baseline18_deployable.etlt_b16_gpu0_fp16.engine

I’m missing a *.etlt file in the training results, not least because the configuration requires one:

[property]
gpu-id=0
model-engine-file=models/LP/LPR/us_lprnet_baseline18_deployable.etlt_b16_gpu0_fp16.engine
labelfile-path=models/LP/LPR/labels_us.txt
tlt-encoded-model=models/LP/LPR/us_lprnet_baseline18_deployable.etlt
tlt-model-key=nvidia_tlt
batch-size=16
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
num-detected-classes=3
gie-unique-id=3
output-blob-names=tf_op_layer_ArgMax;tf_op_layer_Max
#0=Detection 1=Classifier 2=Segmentation
network-type=1
parse-classifier-func-name=NvDsInferParseCustomNVPlate
custom-lib-path=nvinfer/libnvdsinfer_custom_impl_lpr.so
process-mode=2
operate-on-gie-id=2
net-scale-factor=0.00392156862745098
#net-scale-factor=1.0
#0=RGB 1=BGR 2=GRAY
model-color-format=0

[class-attrs-all]
threshold=0.5
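
For what it’s worth, a hedged guess at the minimal mapping: nvinfer can build an engine directly from an ONNX file (exactly as the detector config at the top of this thread does with onnx-file=...), so no .etlt should be needed for the retrained model. The engine file name below is just what nvinfer would typically generate, and the label file name is a hypothetical placeholder:

[property]
# (all other keys as in the existing LPR config above)
onnx-file=models/LP/LPR/lprnet_epoch-024.onnx
# the engine is (re)built from the ONNX on first run and then cached:
#model-engine-file=models/LP/LPR/lprnet_epoch-024.onnx_b16_gpu0_fp16.engine
labelfile-path=models/LP/LPR/labels_de.txt
# verify the output tensor names of the new ONNX (e.g. with the onnx check
# shown earlier in this thread); the export may append ":0" to the old names:
#output-blob-names=tf_op_layer_ArgMax:0;tf_op_layer_Max:0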