Inference time and Inference on multiple images with nvOCDR

• Hardware Platform (Jetson / GPU)
NVIDIA Jetson Orin NX (16 GB RAM)
• DeepStream Version
DeepStream 7.0 (in a Docker container)
• JetPack Version (valid for Jetson only)
JetPack 6.0 (L4T 36.3.0)
• TensorRT Version
8.6.2.3
• Issue Type( questions, new requirements, bugs)
Question

I am using the nvOCDR sample with the gst-launch-1.0 command. I can run inference on a JPG or an MP4 one at a time, but the command for running inference on two images at once, or with a batch size larger than 1, does not work for me.
Is there a way to create the pipeline, run inference on a number of images, and only then destroy the pipeline?
Or maybe there is a way to modify the pipeline so it runs inference on several images, or on a batch of images?
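For reference, the nvOCDR sample is driven by a YAML config whose `source-list` appears to take semicolon-separated URIs (one source per file), so a two-image run would presumably be configured along these lines (the paths and streammux dimensions below are illustrative placeholders, not a verified working setup):

```yaml
source-list:
  # Semicolon-separated file URIs, one source per image
  list: file:///workspace/nvocdr/img_0.jpg;file:///workspace/nvocdr/img_1.jpg

streammux:
  # These dimensions should match (or fit) the input images
  width: 1280
  height: 720
  batched-push-timeout: 40000
```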

Also, is there a way to display the per-image inference time with the nvOCDR sample? Thank you in advance.
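(As a sketch of what such timing could look like if added in custom code: the `LatencyTracker` helper below is my own illustration, not part of the sample. The idea would be to feed it timestamps from GStreamer buffer pad probes placed immediately upstream and downstream of the inference element.)

```python
import time


class LatencyTracker:
    """Collects per-frame start/end timestamps and reports mean latency.

    Hypothetical helper: intended to be driven from GStreamer pad probes,
    calling record_start() on the probe upstream of the inference element
    and record_end() on the probe downstream of it, keyed by frame number.
    """

    def __init__(self):
        self._start = {}          # frame_id -> start timestamp (seconds)
        self._latencies_ms = []   # completed per-frame latencies

    def record_start(self, frame_id, ts=None):
        self._start[frame_id] = time.monotonic() if ts is None else ts

    def record_end(self, frame_id, ts=None):
        end = time.monotonic() if ts is None else ts
        begin = self._start.pop(frame_id, None)
        if begin is not None:
            self._latencies_ms.append((end - begin) * 1000.0)

    def mean_ms(self):
        """Mean per-frame latency in milliseconds (0.0 if no frames yet)."""
        if not self._latencies_ms:
            return 0.0
        return sum(self._latencies_ms) / len(self._latencies_ms)
```

The probe wiring itself is app-specific and omitted here; only the bookkeeping is shown.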

Please refer to deepstream_tao_apps/apps/tao_others/deepstream-nvocdr-app at master · NVIDIA-AI-IOT/deepstream_tao_apps (github.com)


Hello @Fiona.Chen
Thank you for your quick answer. I had already tried the mentioned nvOCDR sample, following all the steps. I ran into a problem with the ViT versions of the models on my Orin NX and tried to resolve it ( Process Killed when Generating a TensorRT Engine for the ViT models - #8 by junshengy and Process killed when generating a TensorRT Engine for a ViT model ).

Meanwhile, I tried to use the sample with the v1.0 versions of the OCD and OCR models from Optical Character Detection | NVIDIA NGC and Optical Character Recognition | NVIDIA NGC .

I then generated the engines with their respective commands (inspired by GitHub - NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution: This repository provides optical character detection and recognition solution optimized on Nvidia devices. )

Here is how I modified the nvocdr_app_config.yml file:

source-list:
   #list: file:///workspace/nvocdr/img_0.jpg;file:///workspace/nvocdr/img_1.jpg
   list: file:///~/input/test2-crop.mp4

output:
  ## 1:file output  2:fake output 3:eglsink output
  type: 1
  ## 0: H264 encoder  1:H265 encoder
  codec: 0
  #encoder type 0=Hardware 1=Software
  enc-type: 0
  bitrate: 2000000
  ##The file name without suffix
  filename: test

streammux:
  width: 1280
  height: 720
  batched-push-timeout: 40000

video-template:
  customlib-name: nvocdr_libs/aarch64/libnvocdr_impl.so
  customlib-props:
    - ocdnet-engine-path:../../../../../models/nvocdr/ocdnet.fp16.engine
    - ocdnet-input-shape:3,736,1280
    - ocdnet-binarize-threshold:0.1
    - ocdnet-polygon-threshold:0.3
    - ocdnet-max-candidate:200
    - ocrnet-engine-path:../../../../../models/nvocdr/ocrnet.fp16.engine
    - ocrnet-dict-path:../../../../../models/nvocdr/character_list
    - ocrnet-input-shape:1,32,100
   # - ocrnet-decode:Attention

With this config, when I run the sample at debug level 3, I get the following errors:

root@************:/~/input/deepstream-app/deepstream_tao_apps/apps/tao_others/deepstream-nvocdr-app# ./deepstream-nvocdr-app nvocdr_app_config.yml
in create_video_encoder, isH264:1, enc_type:0
/bin/bash: line 1: lsmod: command not found
/bin/bash: line 1: modprobe: command not found
!! [WARNING] Unknown param found : type
!! [WARNING] Unknown param found : codec
!! [WARNING] Unknown param found : enc-type
!! [WARNING] Unknown param found : filename
 Now playing! 
Opening in BLOCKING MODE 
0:00:00.182777563 70800 0xaaaaef409980 WARN                    v4l2 gstv4l2object.c:4671:gst_v4l2_object_probe_caps:<nvvideo-h264enc:src> Failed to probe pixel aspect ratio with VIDIOC_CROPCAP: Unknown error -1
0:00:00.228694223 70800 0xffff64000d80 WARN              aggregator gstaggregator.c:2099:gst_aggregator_query_latency_unlocked:<mp4-mux> Latency query failed
Inside Custom Lib : Setting Prop Key=ocdnet-engine-path Value=../../../../../models/nvocdr/ocdnet.fp16.engine
Inside Custom Lib : Setting Prop Key=ocdnet-input-shape Value=3,736,1280
Inside Custom Lib : Setting Prop Key=ocdnet-binarize-threshold Value=0.1
Inside Custom Lib : Setting Prop Key=ocdnet-polygon-threshold Value=0.3
Inside Custom Lib : Setting Prop Key=ocdnet-max-candidate Value=200
Inside Custom Lib : Setting Prop Key=ocrnet-engine-path Value=../../../../../models/nvocdr/ocrnet.fp16.engine
Inside Custom Lib : Setting Prop Key=ocrnet-dict-path Value=../../../../../models/nvocdr/character_list
Inside Custom Lib : Setting Prop Key=ocrnet-input-shape Value=1,32,100
Inside Custom Lib : Setting Prop Key=ocrnet-decode Value=Attention
0:00:00.263201560 70800 0xaaaaef409980 WARN                 basesrc gstbasesrc.c:3688:gst_base_src_start_complete:<source> pad not activated yet
Decodebin child added: source
Decodebin child added: decodebin0
0:00:00.263678467 70800 0xaaaaef409980 WARN                 basesrc gstbasesrc.c:3688:gst_base_src_start_complete:<source> pad not activated yet
Running...
Decodebin child added: qtdemux0
0:00:00.268566134 70800 0xffff64001440 WARN                 qtdemux qtdemux_types.c:249:qtdemux_type_get: unknown QuickTime node type sgpd
0:00:00.268586486 70800 0xffff64001440 WARN                 qtdemux qtdemux_types.c:249:qtdemux_type_get: unknown QuickTime node type sbgp
0:00:00.268600342 70800 0xffff64001440 WARN                 qtdemux qtdemux_types.c:249:qtdemux_type_get: unknown QuickTime node type ldes
0:00:00.268637079 70800 0xffff64001440 WARN                 qtdemux qtdemux.c:3121:qtdemux_parse_trex:<qtdemux0> failed to find fragment defaults for stream 1
0:00:00.268760602 70800 0xffff64001440 WARN                 qtdemux qtdemux.c:3121:qtdemux_parse_trex:<qtdemux0> failed to find fragment defaults for stream 2
Decodebin child added: multiqueue0
Decodebin child added: h264parse0
Decodebin child added: capsfilter0
0:00:00.270216988 70800 0xffff64001440 WARN            uridecodebin gsturidecodebin.c:960:unknown_type_cb:<uri-decode-bin> warning: No decoder available for type 'audio/mpeg, mpegversion=(int)4, framed=(boolean)true, stream-format=(string)raw, level=(string)2, base-profile=(string)lc, profile=(string)lc, codec_data=(buffer)119056e500, rate=(int)48000, channels=(int)2'.
Decodebin child added: nvv4l2decoder0
Opening in BLOCKING MODE 
0:00:00.296631207 70800 0xffff640017c0 WARN                    v4l2 gstv4l2object.c:4671:gst_v4l2_object_probe_caps:<nvv4l2decoder0:src> Failed to probe pixel aspect ratio with VIDIOC_CROPCAP: Unknown error -1
NvMMLiteOpen : Block : BlockType = 261 
NvMMLiteBlockCreate : Block : BlockType = 261 
0:00:00.399059400 70800 0xffff640017c0 WARN                    v4l2 gstv4l2object.c:4671:gst_v4l2_object_probe_caps:<nvv4l2decoder0:src> Failed to probe pixel aspect ratio with VIDIOC_CROPCAP: Unknown error -1
In cb_newpad
###Decodebin pick nvidia decoder plugin.
Error reading serialized TensorRT engine: ../../../../../models/nvocdr/ocdnet.fp16.engine
terminate called after throwing an instance of 'std::length_error'
  what():  cannot create std::vector larger than max_size()
Aborted (core dumped)

Do you think the sample works with the v1.0 versions of the models? If so, is there anything else I need to modify to make it work? Otherwise, is there an alternative way to run inference on multiple sources with the v1.0 models?

Thank you in advance for your help!

The v1.0 models work well with JetPack 6.0 DP + DeepStream 6.4.

Please refer to NVIDIA-AI-IOT/deepstream_tao_apps at release/tao5.1_ds6.4ga (github.com)

Hi,

Thank you for the suggestion. I tried using DS 6.4, but it did not work with my JetPack version.

But I managed to generate the engines for the ViT models on my Orin NX. I tried them and got some good results.
Now, to print the inference performance, I suppose I have to make custom modifications to the code?

And I have another question. The results I get are far worse than the ones from the API, especially with very high-resolution images. Are these the same models? Is there a way to improve my results?
The “is-high-resolution” parameter doesn’t seem to help much.

Thank you in advance for your help!

Do you only need the model inference time (i.e., with preprocessing and postprocessing excluded)?

The model in ocdrnet | NVIDIA NIM is the latest version. There are some parameters for its usage; you may refer to NVIDIA-Optical-Character-Detection-and-Recognition-Solution/doc/nvOCDR.md at main · NVIDIA-AI-IOT/NVIDIA-Optical-Character-Detection-and-Recognition-Solution (github.com). Or you can consult the Latest Intelligent Video Analytics/TAO Toolkit topics - NVIDIA Developer Forums forum for more details.


Hello,

I would like preprocessing and postprocessing included as well, to evaluate the time needed to run inference on one frame in a real-world scenario. But I have a problem right now:

When I run inference on one .jpg or one .mp4, it works fine, but if I specify several files, the pipeline stalls in the middle of inference, and all I can do then is press CTRL+\.

Here is how I modified the config file to specify several files:

source-list:
   #list: file:///workspace/nvocdr/img_0.jpg;file:///workspace/nvocdr/img_1.jpg
   #list: file:///~/input/test2-crop.mp4
   list: file:///~/input/test_set/image_2.jpg;file:///~/input/test_set/image_6.jpg

output:
  ## 1:file output  2:fake output 3:eglsink output
  type: 1
  ## 0: H264 encoder  1:H265 encoder
  codec: 0
  #encoder type 0=Hardware 1=Software
  enc-type: 0
  bitrate: 2000000
  ##The file name without suffix
  filename: test

streammux:
  width: 4032
  height: 3024
  batched-push-timeout: 40000

video-template:
  customlib-name: nvocdr_libs/aarch64/libnvocdr_impl.so
  customlib-props:
    - ocdnet-engine-path:../../../../models/nvocdr/ocdnet.vit.fp16.engine
    - ocdnet-input-shape:3,736,1280
    - ocdnet-binarize-threshold:0.1
    - ocdnet-polygon-threshold:0.3
    - ocdnet-max-candidate:200
    - ocrnet-engine-path:../../../../models/nvocdr/ocrnet.vit.fp16.engine
    - ocrnet-dict-path:../../../../models/nvocdr/character_list
    - ocrnet-input-shape:1,64,200
    - ocrnet-decode:Attention
    #- is-high-resolution:1

Even if I set output type 2 (fake output), the pipeline stalls in the middle of inference. Can you please explain the correct way to specify inference on multiple files at once and get multiple output files? Thank you in advance! :)

If I can get this working, I can then use the total time it took to run inference on all the files.
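One crude way to get that number, assuming the multi-file run works: time the whole process from the outside and divide by the number of inputs. The helper below is my own illustration (not part of the sample); note that it measures end-to-end wall-clock time, including engine loading and pipeline setup/teardown, so it overestimates steady-state per-frame latency.

```python
import subprocess
import time


def time_per_image(cmd, num_images):
    """Run a command once and return a rough per-image wall-clock time in ms.

    Includes process startup, TensorRT engine loading, and pipeline
    teardown, so this is an upper bound on per-frame inference time.
    """
    start = time.monotonic()
    subprocess.run(cmd, check=True)  # raises if the app exits non-zero
    elapsed = time.monotonic() - start
    return elapsed * 1000.0 / num_images


# Placeholder invocation mirroring the app used in this thread:
# avg_ms = time_per_image(["./deepstream-nvocdr-app", "nvocdr_app_config.yml"],
#                         num_images=2)
```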

  1. Sorry, I just realised there is another problem. With the config above, I get the following error when using 2 images of size 4032x3024:
nvstreammux: Successfully handled EOS for source_id=0
nvstreammux: Successfully handled EOS for source_id=1
CUDA runtime error invalid argument at line 763 in file nvocdrlib_impl.cpp
  2. The inference also stalls when I use a JPG with small dimensions (1280x720 or 900x598; I adapt the streammux to the matching sizes), even if it is the only image in the source-list.
    The last lines of the log:
0:00:01.118278167 73453 0xffff28000b70 DEBUG           videodecoder gstvideodecoder.c:3867:gst_video_decoder_have_frame:<nvjpegdec0> Marking as sync point
0:00:01.118286903 73453 0xffff28000b70 DEBUG                default gstsegment.c:744:gst_segment_to_running_time_full: invalid position (-1)
0:00:01.118295895 73453 0xffff28000b70 LOG             videodecoder gstvideodecoder.c:3985:gst_video_decoder_decode_frame:<nvjpegdec0> frame 0xffff2005dc00 PTS 99:99:99.999999999, DTS 0:00:00.000000000, dist 0
0:00:01.118307256 73453 0xffff28000b70 LOG               GST_BUFFER gstbuffer.c:1837:gst_buffer_map_range: buffer 0xffff2007a290, idx 0, length -1, flags 0001
0:00:01.118313976 73453 0xffff28000b70 LOG               GST_BUFFER gstbuffer.c:305:_get_merged_memory: buffer 0xffff2007a290, idx 0, length 1
0:00:01.118324056 73453 0xffff28000b70 LOG                  jpegdec gstjpegdec.c:281:gst_jpeg_dec_init_source:<nvjpegdec0> init_source
0:00:01.118548701 73453 0xffff28000b70 DEBUG                jpegdec gstjpegdec.c:290:gst_jpeg_dec_skip_input_data:<nvjpegdec0> skip 17 bytes
NvMMLiteOpen : Block : BlockType = 277 
NvMMLiteBlockCreate : Block : BlockType = 277 
0:00:01.119519764 73453 0xffff28000b70 LOG                  jpegdec gstjpegdec.c:1238:gst_jpeg_dec_handle_frame:<nvjpegdec0> num_components=3
0:00:01.119536245 73453 0xffff28000b70 LOG                  jpegdec gstjpegdec.c:1239:gst_jpeg_dec_handle_frame:<nvjpegdec0> jpeg_color_space=3
0:00:01.119544853 73453 0xffff28000b70 LOG                  jpegdec gstjpegdec.c:1247:gst_jpeg_dec_handle_frame:<nvjpegdec0> r_h = 1, r_v = 1
0:00:01.119553525 73453 0xffff28000b70 LOG                  jpegdec gstjpegdec.c:1263:gst_jpeg_dec_handle_frame:<nvjpegdec0> [0] h_samp_factor=1, v_samp_factor=1, cid=1
0:00:01.119560245 73453 0xffff28000b70 LOG                  jpegdec gstjpegdec.c:1263:gst_jpeg_dec_handle_frame:<nvjpegdec0> [1] h_samp_factor=1, v_samp_factor=1, cid=2
0:00:01.119567893 73453 0xffff28000b70 LOG                  jpegdec gstjpegdec.c:1263:gst_jpeg_dec_handle_frame:<nvjpegdec0> [2] h_samp_factor=1, v_samp_factor=1, cid=3
0:00:01.119573781 73453 0xffff28000b70 LOG                  jpegdec gstjpegdec.c:1285:gst_jpeg_dec_handle_frame:<nvjpegdec0> starting decompress


And it works fine if I use the gst-launch command:

gst-launch-1.0 filesrc location=/~/input/wagon1.jpeg ! jpegparse ! jpegdec ! nvvideoconvert ! "video/x-raw(memory:NVMM), format=NV12" !
m.sink_0 nvstreammux name=m batch-size=1 width=900 height=598 !
nvdsvideotemplate customlib-name=/opt/nvidia/deepstream/deepstream/NVIDIA-Optical-Character-Detection-and-Recognition-Solution/deepstream/libnvocdr_impl.so
customlib-props="ocdnet-engine-path:/~/input/deepstream-app/models/nvocdr/ocdnet.vit.fp16.engine"
customlib-props="ocdnet-input-shape:1,736,1280"
customlib-props="ocdnet-binarize-threshold:0.1"
customlib-props="ocdnet-polygon-threshold:0.3"
customlib-props="ocdnet-max-candidate:200"
customlib-props="is-high-resolution:0"
customlib-props="ocrnet-engine-path:/~/input/deepstream-app/models/nvocdr/ocrnet.vit.fp16.engine"
customlib-props="ocrnet-dict-path:/~/input/deepstream-app/models/nvocdr/character_list"
customlib-props="ocrnet-decode:Attention"
customlib-props="ocrnet-input-shape:1,64,200" !
nvmultistreamtiler rows=1 columns=1 width=900 height=598 ! nvvideoconvert ! nvdsosd !
nvvideoconvert ! 'video/x-raw,format=I420' ! jpegenc ! jpegparse ! filesink location=/~/input/wagon1_output.jpeg

I am also interested in performance metrics for ocdrnet. On a 1080 Ti, for a similar pipeline at 736x1280 resolution, I am getting around 25 fps. Could inference metrics be posted for other GPUs like the A6000, A100, or 4090?

@Fiona.Chen

Hello again. When I put several images in the source-list of the config file, I now get the following error:

0:00:02.485399843  1791 0xffff4c0015c0 INFO           basetransform gstbasetransform.c:1325:gst_base_transform_setcaps:<source_capset> reuse caps
0:00:02.485831788  1791 0xffff4c0015c0 INFO               GST_EVENT gstevent.c:892:gst_event_new_caps: creating caps event video/x-raw(memory:NVMM), width=(int)4032, height=(int)3024, interlace-mode=(string)progressive, multiview-mode=(string)mono, multiview-flags=(GstVideoMultiviewFlagsSet)0:ffffffff:/right-view-first/left-flipped/left-flopped/right-flipped/right-flopped/half-aspect/mixed-mono, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)0/1, format=(string)NV12, block-linear=(boolean)false, num-surfaces-per-frame=(int)1, nvbuf-memory-type=(string)nvbuf-mem-surface-array, gpu-id=(int)0, colorimetry=(string)1:4:0:0
0:00:02.487452654  1791 0xffff4c0015c0 INFO             nvstreammux gstnvstreammux.cpp:1533:gst_nvstreammux_sink_event:<stream-muxer> mux got segment from src 3 time segment start=0:00:00.000000000, offset=0:00:00.000000000, stop=99:99:99.999999999, rate=1.000000, applied_rate=1.000000, flags=0x00, time=0:00:00.000000000, base=0:00:00.000000000, position 0:00:00.000000000, duration 99:99:99.999999999
0:00:02.489389464  1791 0xffff4c0015c0 INFO            videodecoder gstvideodecoder.c:3720:gst_video_decoder_clip_and_push_buf:<nvjpegdec0> First buffer since flush took 0:00:02.296170261 to produce
0:00:02.533934000  1791 0xffff4c000b70 INFO          GST_SCHEDULING gstpad.c:5030:gst_pad_get_range_unchecked:<source:src> getrange failed, flow: eos
0:00:02.533983121  1791 0xffff4c000b70 INFO          GST_SCHEDULING gstpad.c:5245:gst_pad_pull_range:<decodebin0:sink> pullrange failed, flow: eos
0:00:02.534012370  1791 0xffff4c000b70 INFO          GST_SCHEDULING gstpad.c:5030:gst_pad_get_range_unchecked:<sink:proxypad4> getrange failed, flow: eos
0:00:02.534022130  1791 0xffff4c000b70 INFO          GST_SCHEDULING gstpad.c:5245:gst_pad_pull_range:<typefind:sink> pullrange failed, flow: eos
nvstreammux: Successfully handled EOS for source_id=0
0:00:02.534224406  1791 0xffff4c000b70 INFO                    task gsttask.c:368:gst_task_func:<typefind:sink> Task going to paused
0:00:02.534263159  1791 0xffff4c001930 INFO         nvstreammux_ntp gstnvstreammux_ntp.cpp:352:gst_nvds_ntp_calculator_reset:<stream-muxer> Reset NTP calculations for source 0
0:00:02.538536690  1791 0xffff4c001930 INFO               GST_EVENT gstevent.c:1689:gst_event_new_sink_message: creating sink-message event
0:00:02.538551379  1791 0xffff4c001250 INFO          GST_SCHEDULING gstpad.c:5030:gst_pad_get_range_unchecked:<source:src> getrange failed, flow: eos
0:00:02.538634292  1791 0xffff4c001250 INFO          GST_SCHEDULING gstpad.c:5245:gst_pad_pull_range:<decodebin2:sink> pullrange failed, flow: eos
0:00:02.538655573  1791 0xffff4c001250 INFO          GST_SCHEDULING gstpad.c:5030:gst_pad_get_range_unchecked:<sink:proxypad6> getrange failed, flow: eos
0:00:02.538671893  1791 0xffff4c001250 INFO          GST_SCHEDULING gstpad.c:5245:gst_pad_pull_range:<typefind:sink> pullrange failed, flow: eos
nvstreammux: Successfully handled EOS for source_id=2
0:00:02.538817272  1791 0xffff4c001250 INFO                    task gsttask.c:368:gst_task_func:<typefind:sink> Task going to paused
0:00:02.538843737  1791 0xffff4c001930 INFO         nvstreammux_ntp gstnvstreammux_ntp.cpp:352:gst_nvds_ntp_calculator_reset:<stream-muxer> Reset NTP calculations for source 2
0:00:02.543141493  1791 0xffff4c000ee0 INFO          GST_SCHEDULING gstpad.c:5030:gst_pad_get_range_unchecked:<source:src> getrange failed, flow: eos
0:00:02.543180341  1791 0xffff4c000ee0 INFO          GST_SCHEDULING gstpad.c:5245:gst_pad_pull_range:<decodebin1:sink> pullrange failed, flow: eos
0:00:02.543189974  1791 0xffff4c000ee0 INFO          GST_SCHEDULING gstpad.c:5030:gst_pad_get_range_unchecked:<sink:proxypad5> getrange failed, flow: eos
0:00:02.543191350  1791 0xffff4c001930 INFO               GST_EVENT gstevent.c:1689:gst_event_new_sink_message: creating sink-message event
0:00:02.543198518  1791 0xffff4c000ee0 INFO          GST_SCHEDULING gstpad.c:5245:gst_pad_pull_range:<typefind:sink> pullrange failed, flow: eos
nvstreammux: Successfully handled EOS for source_id=1
0:00:02.543333913  1791 0xffff4c000ee0 INFO                    task gsttask.c:368:gst_task_func:<typefind:sink> Task going to paused
0:00:02.543341081  1791 0xffff4c001930 INFO         nvstreammux_ntp gstnvstreammux_ntp.cpp:352:gst_nvds_ntp_calculator_reset:<stream-muxer> Reset NTP calculations for source 1
0:00:02.547402640  1791 0xffff4c0015c0 INFO          GST_SCHEDULING gstpad.c:5030:gst_pad_get_range_unchecked:<source:src> getrange failed, flow: eos
0:00:02.547411056  1791 0xffff4c001930 INFO               GST_EVENT gstevent.c:1689:gst_event_new_sink_message: creating sink-message event
0:00:02.547429616  1791 0xffff4c0015c0 INFO          GST_SCHEDULING gstpad.c:5245:gst_pad_pull_range:<decodebin3:sink> pullrange failed, flow: eos
0:00:02.547465329  1791 0xffff4c0015c0 INFO          GST_SCHEDULING gstpad.c:5030:gst_pad_get_range_unchecked:<sink:proxypad7> getrange failed, flow: eos
0:00:02.547475377  1791 0xffff4c0015c0 INFO          GST_SCHEDULING gstpad.c:5245:gst_pad_pull_range:<typefind:sink> pullrange failed, flow: eos
nvstreammux: Successfully handled EOS for source_id=3
0:00:02.568217357  1791 0xffff4c0015c0 INFO                    task gsttask.c:368:gst_task_func:<typefind:sink> Task going to paused
0:00:02.568272590  1791 0xffff4c001930 INFO         nvstreammux_ntp gstnvstreammux_ntp.cpp:352:gst_nvds_ntp_calculator_reset:<stream-muxer> Reset NTP calculations for source 3
0:00:02.568375184  1791 0xffff4c001930 INFO               GST_EVENT gstevent.c:1689:gst_event_new_sink_message: creating sink-message event
CUDA runtime error invalid argument at line 763 in file nvocdrlib_impl.cpp

I have not managed to find a way to run inference on small images via the config file, even though it works perfectly fine with the gst-launch command.
