OCR Model as PGIE

Hardware Platform: GPU
DeepStream: 7.1
Docker Image: 7.1-triton-multiarch
GPU Type: A4000

I have created a custom object detection pipeline based on deepstream-test5. Now I want to add OCR capability to this pipeline. I need the nvmultiurisrcbin and message broker features for the OCR model, so it looks like I need to use the OCR model as a PGIE. My streams have fixed text areas. Is there any example?

Can you tell us what your model's input and output are?

For example, the TAO model BodyPoseNet | NVIDIA NGC takes an image with one or more persons as input and outputs the body key points for the persons in the image. We can generate body bboxes from the key points, so we can use this model as a PGIE.
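
As a rough sketch of that mechanism (not from this thread; the function and library names are hypothetical placeholders): with nvinfer, a model whose raw output is not bounding boxes is typically hooked in as a detector through a custom output parser declared in its config, e.g.:

#Hypothetical nvinfer config fragment: a keypoint model can serve as a PGIE
#once a custom parser converts its raw output into bboxes.
#The function and library names below are placeholders.
[property]
#0: detector
network-type=0
parse-bbox-func-name=NvDsInferParseCustomBodyPose
custom-lib-path=/path/to/libcustom_bodypose_parser.so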

I am taking an HLS stream as input. My stream has multiple regions with text, and the text locations are fixed. I want to detect all the text and send it to RabbitMQ as a single payload per frame. I also need REST support to add/remove streams to/from this OCR pipeline.
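
(For reference, the runtime add/remove requirement maps to the REST server that nvmultiurisrcbin starts on the http-ip/http-port configured below. A minimal sketch, assuming the default DeepStream endpoints and the documented payload schema; the sensor ID and HLS URL are placeholders, and the field names should be checked against the REST API docs for your release:

#Add a stream at runtime
curl -X POST http://localhost:9000/api/v1/stream/add -d '{
  "key": "sensor",
  "value": {
    "camera_id": "sensor-1",
    "camera_name": "hls-text-cam",
    "camera_url": "https://example.com/stream/playlist.m3u8",
    "change": "camera_add"
  }
}'

#Remove the same stream
curl -X POST http://localhost:9000/api/v1/stream/remove -d '{
  "key": "sensor",
  "value": {
    "camera_id": "sensor-1",
    "camera_url": "https://example.com/stream/playlist.m3u8",
    "change": "camera_remove"
  }
}')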

The License Plate Recognition example uses an OCR model as an SGIE, but it doesn't fit my case.

This is my object detection pipeline config. I want to create an OCR pipeline with the same features. The documentation says the OCR model uses nvdsvideotemplate, but I'm using nvinferserver. Will it work if I replace config-file with an OCR model config file?

[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=0
rows=2
columns=5
width=1280
height=720
gpu-id=0
nvbuf-memory-type=0
#Set to 1 to automatically tile in Square Grid
square-seq-grid=1

[source-list]
use-nvmultiurisrcbin=1
#To display stream name in FPS log, set stream-name-display=1
stream-name-display=1
#Maximum number of streams that can be added to the pipeline
max-batch-size=10
http-ip=localhost
http-port=9000
#Set low latency mode for bitstreams having I and IPPP frames on decoder
low-latency-mode=1
#sgie batch size is number of sources * fair fraction of number of objects detected per frame per source
#the fair fraction of number of object detected is assumed to be 4
sgie-batch-size=40
#Set the below key to keep the application running at all times

[source-attr-all]
enable=1
#Type - 1: Camera (V4L2) 2: URI 3: MultiURI 4: RTSP 5: Camera (CSI) (Jetson only)
type=3
gpu-id=0
cudadec-memtype=0
#drop-frame-interval=5
#latency=100
#rtsp-reconnect-interval-sec=10
#Limit the rtsp reconnection attempts
#rtsp-reconnect-attempts=4

[streammux]
gpu-id=0
live-source=1
batch-size=6
batched-push-timeout=40000
width=1920
height=1080
#Set to 1 to maintain aspect ratio
enable-padding=1
nvbuf-memory-type=0
drop-pipeline-eos=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=UDPSink 5=nvdrmvideosink 6=MsgConvBroker
type=6
msg-conv-payload-type=1
msg-conv-msg2p-new-api=0
msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream/lib/libnvds_amqp_proto.so
#Provide your msg-broker-conn-str here
msg-broker-conn-str=rabbit.app_network;5672;guest;guest
topic=deepstream1
msg-broker-comp-id=1
msg-conv-comp-id=1
#Optional:
msg-broker-config=/opt/nvidia/deepstream/deepstream/sources/libs/amqp_protocol_adaptor/cfg_amqp.txt
msg-conv-msg2p-lib=/opt/nvidia/deepstream/deepstream/lib/libnvds_msgconv.so

[sink1]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=UDPSink 5=nvdrmvideosink 6=MsgConvBroker
type=6
msg-conv-payload-type=1
msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream/lib/libnvds_amqp_proto.so
#Provide your msg-broker-conn-str here
msg-broker-conn-str=rabbit.app_network;5672;guest;guest
topic=deepstream2
msg-broker-comp-id=2
msg-conv-comp-id=2
#Optional:
msg-broker-config=/opt/nvidia/deepstream/deepstream/sources/libs/amqp_protocol_adaptor/cfg_amqp.txt
msg-conv-msg2p-lib=/opt/nvidia/deepstream/deepstream/lib/libnvds_msgconv.so

[sink2]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=UDPSink 5=nvdrmvideosink 6=MsgConvBroker
type=6
msg-conv-payload-type=1
msg-broker-proto-lib=/opt/nvidia/deepstream/deepstream/lib/libnvds_amqp_proto.so
#Provide your msg-broker-conn-str here
msg-broker-conn-str=rabbit.app_network;5672;guest;guest
topic=deepstream3
msg-broker-comp-id=3
msg-conv-comp-id=3
#Optional:
msg-broker-config=/opt/nvidia/deepstream/deepstream/sources/libs/amqp_protocol_adaptor/cfg_amqp.txt
msg-conv-msg2p-lib=/opt/nvidia/deepstream/deepstream/lib/libnvds_msgconv.so

[primary-gie]
enable=1
#interval=5
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=/opt/nvidia/deepstream/deepstream/DeepStream-Yolo/config_infer_vehicle.txt
#infer-raw-output-dir=../../../../../samples/primary_detector_raw_output/

Are you talking about the Optical Character Recognition | NVIDIA NGC model provided by the TAO Toolkit?

Are there multiple text areas in a single frame?

Yes

Yes. OCRNet and OCDNet

We have provided a sample for the TAO OCD+OCR models: deepstream_tao_apps/apps/tao_others/deepstream-nvocdr-app at master · NVIDIA-AI-IOT/deepstream_tao_apps

The OCD+OCR models can't be used directly in a deepstream-app configuration.
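
For orientation, that sample wires OCDNet+OCRNet in through nvdsvideotemplate rather than nvinfer/nvinferserver, roughly along these lines. This is a sketch only: the customlib-props keys and library name are illustrative assumptions, and the engine paths are placeholders; see the sample's README for the exact values.

#Sketch of an nvOCDR-style pipeline; property names are assumptions,
#check the deepstream-nvocdr-app README for the exact keys
gst-launch-1.0 \
  uridecodebin uri=file:///path/to/video.mp4 ! mux.sink_0 \
  nvstreammux name=mux batch-size=1 width=1920 height=1080 ! \
  nvdsvideotemplate customlib-name=libnvocdr_impl.so \
    customlib-props="ocdnet-engine-path:/path/to/ocdnet.engine" \
    customlib-props="ocrnet-engine-path:/path/to/ocrnet.engine" ! \
  nvvideoconvert ! nvdsosd ! fakesink

Note that to still publish per-frame OCR results to RabbitMQ as in the config above, the text metadata attached by the custom library would most likely need a custom msgconv payload, since the stock libnvds_msgconv serializes the standard object metadata.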

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks
