• Hardware Platform: amd64
• DeepStream Version: 5.0.0
• TensorRT Version: 7.2.2-1+cuda10.2
• NVIDIA GPU Driver Version (valid for GPU only): 450.51
I’m new to DeepStream and I’m trying to use one of the pre-trained ONNX models shared in the OpenCV TextDetectionModel and TextRecognitionModel tutorial.
ONNX Models folder: trained_model_for_text_recognition - Google Drive
I’m able to run the deepstream-app using the ONNX model as a classifier; it is converted to a TensorRT engine with the following layers:
0 INPUT kFLOAT input 1x32x100
1 OUTPUT kHALF output 1x37
But according to OpenCV documentation:
the output of the text recognition model should be a probability matrix. The shape should be (T, B, Dim), where
- T is the sequence length
- B is the batch size (only B=1 is supported in inference)
- Dim is the vocabulary length + 1 (the CTC blank is at index 0 of Dim).
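For reference, greedy CTC decoding of such a (T, B=1, Dim) probability matrix can be sketched as follows. This is a standalone sketch, not DeepStream code; the 36-character alphabet is an assumption matching a 37-way output (blank + digits + lowercase letters):

```cpp
#include <string>
#include <vector>

// Assumed vocabulary: 36 characters, so Dim = 37 with the CTC blank at index 0.
static const std::string kVocab = "0123456789abcdefghijklmnopqrstuvwxyz";

// Greedy CTC decode: argmax at each timestep, collapse consecutive
// repeats, and drop the blank symbol (index 0).
// `probs` holds T * dim floats laid out row-major as (T, 1, dim).
std::string ctcGreedyDecode(const std::vector<float> &probs, int T, int dim)
{
    std::string text;
    int prev = 0; // index 0 is the CTC blank
    for (int t = 0; t < T; ++t) {
        const float *row = probs.data() + t * dim;
        int best = 0;
        for (int c = 1; c < dim; ++c)
            if (row[c] > row[best])
                best = c;
        if (best != 0 && best != prev)
            text += kVocab[best - 1]; // shift by 1: index 0 is the blank
        prev = best;
    }
    return text;
}
```

This is why losing the T dimension matters: without all 26 timesteps there is nothing to collapse, and the decoder can only ever see one character’s worth of probabilities.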
Netron visualization of the ONNX network output is float32(26,1,37): Imgur: The magic of the Internet
Apparently DeepStream gets a different output shape, with the T (sequence length) dimension missing.
Am I doing something wrong, or did the conversion to TensorRT not perform well?
I also tried to add a custom function to parse the output
(std::vector<NvDsInferLayerInfo> const &outputLayersInfo), but when running the pipeline I get this for every frame:
numAttributes = 1
numClasses = 1
layerHeight = 37
layerWidth = 0
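To see what DeepStream actually receives, I tried dumping the layer dimensions directly from inferDims instead of interpreting them as height/width. A simplified standalone sketch (the field names inferDims, numDims, d, and layerName mirror NvDsInferLayerInfo / NvDsInferDims from nvdsinfer.h in DeepStream 5.0; the structs themselves are cut-down stand-ins so this compiles on its own):

```cpp
#include <cstdio>

// Cut-down stand-ins for NvDsInferDims / NvDsInferLayerInfo
// (field names match the real nvdsinfer.h API; everything else
// is simplified for illustration).
struct InferDims {
    unsigned int numDims;
    unsigned int d[8];
};

struct LayerInfo {
    InferDims inferDims;
    const char *layerName;
};

// Prints every dimension of the layer and returns the total element
// count, so you can check whether the T axis survived the conversion.
unsigned int dumpLayerDims(const LayerInfo &layer)
{
    unsigned int total = 1;
    std::printf("layer %s:", layer.layerName);
    for (unsigned int i = 0; i < layer.inferDims.numDims; ++i) {
        std::printf(" d[%u]=%u", i, layer.inferDims.d[i]);
        total *= layer.inferDims.d[i];
    }
    std::printf("\n");
    return total;
}
```

With a (26, 1, 37) output I would expect numDims = 3 and 26 × 1 × 37 = 962 elements, whereas the engine above seems to report only (1, 37).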
Any help solving this issue, or an example of parsing the output of similar networks, would be much appreciated.