How to parse CRNN model / OCR Text Recognition

rmpt · January 29, 2021, 1:54am

• Hardware Platform: amd64
• DeepStream Version: 5.0.0
• TensorRT Version: 7.2.2-1+cuda10.2
• NVIDIA GPU Driver Version (valid for GPU only): 450.51

Hello,

I’m new to DeepStream and I’m trying to use one of the ONNX pre-trained models shared in the OpenCV tutorial TextDetectionModel and TextRecognitionModel.

ONNX Models folder: trained_model_for_text_recognition - Google Drive

I’m able to run the DeepStream app using the ONNX model as a classifier; it is converted to a TensorRT engine with the following layers:

0 INPUT kFLOAT input 1x32x100
1 OUTPUT kHALF output 1x37

But according to OpenCV documentation:

The output of the text recognition model should be a probability matrix. The shape should be (T, B, Dim), where

T is the sequence length,

B is the batch size (only B=1 is supported in inference),

and Dim is the length of the vocabulary + 1 (the CTC blank is at index 0 of Dim).

Netron visualization of the ONNX network output is float32(26,1,27) : https://imgur.com/2mWcHR9

Apparently DeepStream gets a different output shape that is missing the sequence length T.
Am I doing something wrong, or did the conversion to TensorRT not work correctly?

I also tried adding a custom function to parse the output (std::vector<NvDsInferLayerInfo> const &outputLayersInfo), but when I run the pipeline I get this for every frame:

numAttributes = 1
numClasses = 1
layerHeight = 37
layerWidth = 0

Any help solving this issue, or an example of parsing the output of similar networks, would be much appreciated.

Thank you

AastaLLL · January 29, 2021, 4:13am

Hi,

Please run your model with trtexec and check the output dimensions first.
For example, with mnist.onnx:

$ /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx --dumpOutput

Output:

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx --dumpOutput
[01/29/2021-12:08:17] [I] === Model Options ===
[01/29/2021-12:08:17] [I] Format: ONNX
[01/29/2021-12:08:17] [I] Model: /usr/src/tensorrt/data/mnist/mnist.onnx
[01/29/2021-12:08:17] [I] Output:
...
[01/29/2021-12:08:43] [I] Output Tensors:
[01/29/2021-12:08:43] [I] Plus214_Output_0: (1x10)
[01/29/2021-12:08:43] [I] -1.60912 -0.901603 1.55434 1.57656 -0.0528269 -0.897766 0.831163 0.671341 -0.281258 -0.289106
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=/usr/src/tensorrt/data/mnist/mnist.onnx --dumpOutput

The output dimension is (1x10).

Thanks.

rmpt · January 29, 2021, 10:46am

Hi @AastaLLL

For both ONNX models I tried, the output dimensions reported by trtexec seem to be correct:

[01/29/2021-10:26:33] [I] Output Tensors:
[01/29/2021-10:26:33] [I] output: (26x1x37)
[01/29/2021-10:26:33] [I] -5.94975 -12.1903 -9.81156 -10.347 -11.3837 -11.4158 -12.1832 -12.044 -12.0649 (…)
(…)
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=ResNet_CTC.onnx --dumpOutput

[01/29/2021-10:28:13] [I] Output Tensors:
[01/29/2021-10:28:13] [I] output: (24x1x37)
[01/29/2021-10:28:13] [I] -2.05273 -8.35156 -6.10156 -6.12109 -6.78906 -7.73828 -8.1875 -8.17188 -8.11719 (…)
(…)
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=CRNN_VGG_BiLSTM_CTC_float16.onnx --dumpOutput

Any idea why DeepStream shows only a 1x37 output for both models?

Thank you

PhongNT · January 30, 2021, 4:35am

Maybe this post can help you: How to get result_label from custom classification parser

PhongNT · January 31, 2021, 6:12am

The output shape of your model must be 1x24x37.
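
One reading that fits the numbers in this thread (an assumption, not something confirmed by the posters here) is that DeepStream treats the first axis of each tensor as the batch axis and reports only the remaining dimensions. The sketch below just applies that rule to the shapes quoted above; `per_sample_dims` and `fmt` are hypothetical helpers, not DeepStream APIs.

```python
def per_sample_dims(full_dims):
    """Drop the leading axis, which is ASSUMED here to be the batch axis."""
    return tuple(full_dims[1:])


def fmt(dims):
    """Format dimensions the way the logs in this thread print them."""
    return "x".join(str(d) for d in dims)


# Shapes reported by trtexec in this thread, plus the batch-first layout
# suggested in the reply above.
for full in [(26, 1, 37), (24, 1, 37), (1, 24, 37)]:
    print(fmt(full), "->", fmt(per_sample_dims(full)))
```

Under this assumption, both time-major outputs collapse to 1x37, which matches what the DeepStream log shows, while a batch-first output keeps the sequence length.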

rmpt · February 4, 2021, 11:39am

Hi @PhongNT,

Thanks a lot for your help.
Do you mean I need to apply a permute/transpose to the axes of the original model to make it work with DeepStream?

I can’t figure out why the output shape is correct in TensorRT but not in DeepStream.

@AastaLLL any thoughts on this?

PhongNT · February 4, 2021, 12:07pm

Yes, in my experience.
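
The permutation discussed above can be illustrated in plain Python on a made-up tensor. This is only a sketch of what swapping the first two axes does, not the export code for these models; `transpose_tb` and `flatten` are hypothetical helpers, and the values are invented.

```python
def transpose_tb(x):
    """Swap the first two axes of a nested list: (T, B, V) -> (B, T, V)."""
    num_steps = len(x)
    batch = len(x[0])
    return [[x[t][b] for t in range(num_steps)] for b in range(batch)]


def flatten(x):
    """Row-major flattening of a 3-level nested list."""
    return [v for plane in x for row in plane for v in row]


# Tiny stand-in tensor: T=3 time steps, B=1, V=2 classes (made-up values).
tbv = [[[0.1, 0.9]], [[0.8, 0.2]], [[0.3, 0.7]]]
btv = transpose_tb(tbv)

print(len(btv), len(btv[0]), len(btv[0][0]))  # 1 3 2
print(flatten(tbv) == flatten(btv))           # True
```

With B=1 the flattened values come out in the same order either way, so the transpose changes only the declared shape, not the data a parser would read.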

AastaLLL · February 9, 2021, 5:30am

Hi,

You will need a customized parser, as suggested by phongnguyen0812.
The DeepStream workflow looks like this:

Input → Preprocessing (e.g. format conversion) → TensorRT → Output parsing (e.g. tensor to bbox)

Based on the experiment above, the tensor output from TensorRT is correct.
The issue arises when parsing the tensor into the final DeepStream output.

Please note that DeepStream doesn’t have a parser that supports text output.
You will need to implement it on your own:
https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_using_custom_model.html#custom-output-parsing
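
As a rough idea of what such a parser has to compute, here is a minimal sketch of greedy CTC decoding in Python. It assumes a batch size of 1, a flat row-major score buffer of shape (T, Dim), the blank at index 0 as stated in the OpenCV description quoted earlier, and a 36-character alphabet (an assumption based on Dim = 37; check the alphabet file shipped with the model). `ctc_greedy_decode` and `one_hot` are hypothetical names; a real DeepStream parser would run the same loop in C++ over the output layer's buffer.

```python
# ASSUMED alphabet: 36 characters, so that 36 + 1 blank = 37 classes.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
BLANK = 0  # blank index, per the OpenCV description quoted in this thread


def ctc_greedy_decode(scores, num_steps, num_classes, alphabet=ALPHABET):
    """Greedy CTC decode of a flat, row-major (T, Dim) score buffer (B=1)."""
    if len(scores) != num_steps * num_classes:
        raise ValueError("buffer size does not match num_steps * num_classes")
    chars = []
    prev = BLANK
    for t in range(num_steps):
        row = scores[t * num_classes:(t + 1) * num_classes]
        best = max(range(num_classes), key=row.__getitem__)  # argmax
        # Emit a character only if it is not blank and not a repeat.
        if best != BLANK and best != prev:
            chars.append(alphabet[best - 1])  # shift by one to skip the blank
        prev = best
    return "".join(chars)


def one_hot(index, size):
    """Build a made-up score row whose argmax is `index`."""
    return [1.0 if k == index else 0.0 for k in range(size)]


# Toy example with a 2-character alphabet "ab" (3 classes including blank).
path = [1, 1, 0, 1, 2, 2, 0]
toy_scores = [v for i in path for v in one_hot(i, 3)]
print(ctc_greedy_decode(toy_scores, len(path), 3, alphabet="ab"))  # aab
```

Argmax works on raw scores as well as probabilities, so no softmax is needed for greedy decoding. For the models in this thread the call would use num_steps=26 or 24 and num_classes=37, provided the parser actually receives the full tensor.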

Thanks.
