Bottleneck when using color conversion with nvvideoconvert and nvdsosd

Please provide complete information as applicable to your setup.

• Hardware Platform : GPU
• DeepStream Version : 5.1
• TensorRT Version : 7.2
• NVIDIA GPU Driver Version (valid for GPU only) : 460.91.03
• Issue Type (questions, new requirements, bugs) : questions

I have a custom plugin whose input caps are video/x-raw(memory:NVMM), format: { (string)I420 }. This plugin comes after nvdsosd, which requires input caps video/x-raw(memory:NVMM), format: { (string)RGBA }. The input source is an appsrc whose caps output video/x-raw, format NV12, and I use nvvideoconvert to convert it to video/x-raw(memory:NVMM), format NV12.

The problem is that the pipeline runs very slowly, at about 10 FPS, when nvinfer (the detector engine) is producing objects. Without nvdsosd in the pipeline, it runs at 100 FPS.

I have some questions; could someone from the NVIDIA team please reply:

  1. Does nvdsosd only do work when there are objects? When I keep nvdsosd and remove nvinfer from the pipeline, the FPS is also 100. Does nvdsosd simply do nothing when there are no objects?

  2. Could the bottleneck be caused by running nvvideoconvert too many times, especially converting NV12 to RGBA and then RGBA to I420?
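Question 2 can be sized with simple arithmetic. NV12 and I420 are both 4:2:0 formats at 1.5 bytes per pixel, while RGBA uses 4 bytes per pixel. The sketch below estimates how many bytes an NV12 -> RGBA -> I420 chain reads and writes per frame at 1280x720 (the nvstreammux resolution in the gst-launch tests in this thread). It is only an order-of-magnitude estimate: it ignores pitch padding, batching, and kernel-launch or synchronization overhead, and the helper names are hypothetical.

```python
from __future__ import annotations

# Approximate bytes per pixel for the formats used in this pipeline.
# NV12 and I420 are 4:2:0: a full-resolution luma plane plus chroma at
# quarter resolution, i.e. 1.5 bytes per pixel. RGBA is 4 bytes per pixel.
BYTES_PER_PIXEL = {"NV12": 1.5, "I420": 1.5, "RGBA": 4.0}


def frame_bytes(width: int, height: int, fmt: str) -> int:
    """Size of one frame in bytes, ignoring pitch/alignment padding."""
    return int(width * height * BYTES_PER_PIXEL[fmt])


def conversion_traffic(width: int, height: int, chain: list[str]) -> int:
    """Bytes read plus bytes written for a chain of format conversions.

    Each hop reads one source-format frame and writes one destination-format frame.
    """
    return sum(
        frame_bytes(width, height, src) + frame_bytes(width, height, dst)
        for src, dst in zip(chain, chain[1:])
    )


if __name__ == "__main__":
    w, h = 1280, 720  # nvstreammux width/height in the gst-launch tests
    for fmt in ("NV12", "RGBA", "I420"):
        print(f"{fmt}: {frame_bytes(w, h, fmt):,} bytes per frame")
    per_frame = conversion_traffic(w, h, ["NV12", "RGBA", "I420"])
    print(f"NV12 -> RGBA -> I420: {per_frame / 1e6:.1f} MB per frame, "
          f"{per_frame * 100 / 1e9:.2f} GB/s at 100 FPS")
```

This comes to roughly 10 MB per frame, or about 1 GB/s at 100 FPS, which is far below the memory bandwidth of a desktop GPU. That suggests the raw copy volume of the two conversions is unlikely, on its own, to explain a drop to 10 FPS.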

Thank you!

See inline.

Thanks for your reply.

Below are latency measurements for three pipeline variants, with and without nvinfer (nvinfer produces the objects that give nvdsosd something to draw).

Adding nvdsosd to the pipeline slows it down noticeably, especially when there are objects!

  • nvinfer + osd + myplugin: 10 FPS

************BATCH-NUM = 73**************
Comp name = src_bin_muxer source_id = 1 pad_index = 1 frame_num = 0               in_system_timestamp = 1631541221551.955078 out_system_timestamp = 1631541221552.047119               component_latency = 0.092041
Comp name = primary_gie in_system_timestamp = 1631541221552.117920 out_system_timestamp = 1631541221802.144043               component latency= 250.026123
Comp name = demuxer in_system_timestamp = 1631541221802.177979 out_system_timestamp = 1631541221802.210938               component latency= 0.032959
Comp name = osd_conv in_system_timestamp = 1631541221809.687012 out_system_timestamp = 1631541221901.725098               component latency= 92.038086
Comp name = nvosd0 in_system_timestamp = 1631541221901.798096 out_system_timestamp = 1631541221984.606934               component latency= 82.808838
Comp name = myplugin_conv0 in_system_timestamp = 1631541221985.652100 out_system_timestamp = 1631541221987.780029               component latency= 2.127930
Comp name = myplugin0 in_system_timestamp = 1631541221989.258057 out_system_timestamp = 1631541222022.007080               component latency= 32.749023
Source id = 1 Frame_num = 0 Frame latency = 1631541222022.141113 (ms)
  • osd + myplugin: 100 FPS
************BATCH-NUM = 242**************
Comp name = src_bin_muxer source_id = 1 pad_index = 1 frame_num = 0               in_system_timestamp = 1631541329531.735107 out_system_timestamp = 1631541329531.777100               component_latency = 0.041992
base64 process [1] partition 1: size image 31620, in 12 ms
Comp name = demuxer in_system_timestamp = 1631541329531.783936 out_system_timestamp = 1631541329531.819092               component latency= 0.035156
Comp name = osd_conv in_system_timestamp = 1631541329531.864014 out_system_timestamp = 1631541329533.281982               component latency= 1.417969
Comp name = nvosd0 in_system_timestamp = 1631541329533.346924 out_system_timestamp = 1631541329533.355957               component latency= 0.009033
Comp name = myplugin_conv0 in_system_timestamp = 1631541329533.418945 out_system_timestamp = 1631541329533.530029               component latency= 0.111084
Comp name = myplugin0 in_system_timestamp = 1631541329534.218018 out_system_timestamp = 1631541329546.114990               component latency= 11.896973
Source id = 1 Frame_num = 0 Frame latency = 1631541329546.308105 (ms)
  • nvinfer + myplugin: 100 FPS
************BATCH-NUM = 286**************
Comp name = src_bin_muxer source_id = 1 pad_index = 1 frame_num = 0               in_system_timestamp = 1631541281244.716064 out_system_timestamp = 1631541281244.777100               component_latency = 0.061035
Comp name = primary_gie in_system_timestamp = 1631541281244.814941 out_system_timestamp = 1631541281256.207031               component latency= 11.392090
Comp name = demuxer in_system_timestamp = 1631541281256.333008 out_system_timestamp = 1631541281256.425049               component latency= 0.092041
Comp name = myplugin_conv0 in_system_timestamp = 1631541281292.562012 out_system_timestamp = 1631541281292.670898               component latency= 0.108887
Comp name = myplugin0 in_system_timestamp = 1631541281320.782959 out_system_timestamp = 1631541281334.439941               component latency= 13.656982
Source id = 1 Frame_num = 0 Frame latency = 1631541281335.407959 (ms)
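The "Frame latency" value printed at the end of each batch appears to be an absolute epoch timestamp in milliseconds rather than a duration, so it cannot be read directly. A small parser can instead compute, from the per-component lines, the span from the muxer input to the last element output, the total time spent inside elements, and the time buffers spend between elements. This is a sketch: the regex assumes the exact line format shown in these logs (including both the "component_latency =" and "component latency=" spellings), and parse_batch and summarize are hypothetical helper names.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# One regex for both spellings seen in the log:
#   "... component_latency = 0.092041"   (muxer line)
#   "... component latency= 250.026123"  (all other elements)
COMP_RE = re.compile(
    r"Comp name = (?P<name>\S+).*?"
    r"in_system_timestamp = (?P<t_in>[\d.]+)\s+"
    r"out_system_timestamp = (?P<t_out>[\d.]+)\s+"
    r"component[ _]latency\s*=\s*(?P<lat>[\d.]+)"
)


@dataclass
class Component:
    name: str
    t_in: float     # system clock, ms
    t_out: float    # system clock, ms
    latency: float  # ms, as printed


def parse_batch(log: str) -> list[Component]:
    """Extract the per-element entries of one BATCH-NUM block, in log order."""
    return [
        Component(m["name"], float(m["t_in"]), float(m["t_out"]), float(m["lat"]))
        for m in COMP_RE.finditer(log)
    ]


def summarize(components: list[Component]) -> dict:
    """Span from first input to last output, time inside elements, gaps between them.

    Assumes the entries are listed in pipeline order, as in these logs.
    """
    span = components[-1].t_out - components[0].t_in
    busy = sum(c.latency for c in components)
    # Time a buffer spent between one element's output and the next one's input.
    gaps = {
        f"{a.name} -> {b.name}": b.t_in - a.t_out
        for a, b in zip(components, components[1:])
    }
    ranked = sorted(components, key=lambda c: c.latency, reverse=True)
    return {
        "span_ms": span,
        "busy_ms": busy,
        "gaps_ms": gaps,
        "ranked": [(c.name, c.latency) for c in ranked],
    }
```

Pass one BATCH-NUM block (pasted as a string or read from a saved log) to parse_batch and then to summarize. For the first batch this gives a span of about 470 ms, with primary_gie (about 250 ms), osd_conv (about 92 ms) and nvosd0 (about 83 ms) as the largest contributors, whereas in the second batch the same osd_conv and nvosd0 take about 1.4 ms and 0.01 ms.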

In your first case, primary_gie takes around 250 ms, while in the third case it takes around 11 ms. Did you use the same batch size and the same stream? Is there any difference between the two other than the absence of nvosd in the third case?

Hi amycao. All three tests use the same app config file and the same engine configs. The only difference is that I enable or disable the OSD and the primary engine in the main config file using the enable=0/1 field. The test input source is also the same.

Please check on your side; it does not make sense that, under the same conditions, the primary_gie latency differs so much.

We cannot reproduce your issue. In our environments, the FPS with and without nvosd differs by only about 2-3. Can you provide the configuration you used and a stripped-down version of your app that we can run in NVIDIA's environment to reproduce the issue?

Thanks for your feedback.

I tried the gst-launch commands below to check performance, using a plain file source that reads from a video file. The pipelines with nvdsosd, with or without the I420 color conversion (tests 0-2), still take 28-29 s to execute, whereas the pipelines without nvdsosd, with or without my plugin (tests 3-4), take only 2 s. Do you have any idea why?

[test 0] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! myplugin ! fakesink
→ finished in 29s

[test 1] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! fakesink
→ finished in 28s

[test 2] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! fakesink
→ finished in 28s

[test 3] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! myplugin ! fakesink
→ finished in 2s

[test 4] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! fakesink
→ finished in 2s

I also tested the pipeline without nvinfer, with and without myplugin, and it executed in 2 s.

[test 5] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! myplugin ! fakesink
→ finished in 2s

[test 6] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! fakesink
→ finished in 2s
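Because these tests differ only in which elements are present, they are easy to script and repeat. A minimal sketch, assuming gst-launch-1.0 is on PATH and that the DeepStream plugins (and myplugin, for the tests that use it) can be found; run_pipeline and parse_execution_time are hypothetical helper names. The regex targets the "Execution ended after H:MM:SS.nnnnnnnnn" line that gst-launch-1.0 prints, as in the logs in this thread.

```python
from __future__ import annotations

import re
import shlex
import subprocess

# gst-launch-1.0 prints e.g. "Execution ended after 0:00:02.473655020".
EXEC_RE = re.compile(r"Execution ended after (\d+):(\d{2}):(\d{2}\.\d+)")


def parse_execution_time(output: str) -> float | None:
    """Return the run time in seconds reported by gst-launch-1.0, or None."""
    m = EXEC_RE.search(output)
    if m is None:
        return None
    hours, minutes, seconds = int(m.group(1)), int(m.group(2)), float(m.group(3))
    return hours * 3600 + minutes * 60 + seconds


def run_pipeline(description: str, timeout: float = 600.0) -> float | None:
    """Run one pipeline description (everything after "gst-launch-1.0").

    shlex.split strips the shell quoting, so the caps strings reach
    gst-launch-1.0 exactly as they would from an interactive shell.
    """
    proc = subprocess.run(
        ["gst-launch-1.0", *shlex.split(description)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Search both streams in case the summary line is printed to either one.
    return parse_execution_time(proc.stdout + proc.stderr)
```

Running each test description several times and comparing the typical result makes a 2 s vs 28 s difference easier to trust than a single run.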

I ran it on my side, and it only takes around 2.47 s. I used a T4 card with the GPU frequency boosted to the maximum. Which GPU are you using, and did you boost the GPU frequency? In the nvinfer config file, did you use the built-in model or your own model? And what is your CPU model? Ours is an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz with 24 cores.

root@148e1ebe1354:/opt/nvidia/deepstream/deepstream-5.1# gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! fakesink
Setting pipeline to PAUSED …
0:00:12.310205495 1012 0x564a95af3870 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1702> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-5.1/samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
INFO: …/nvdsinfer/nvdsinfer_model_builder.cpp:685 [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_1 3x368x640
1 OUTPUT kFLOAT conv2d_bbox 16x23x40
2 OUTPUT kFLOAT conv2d_cov/Sigmoid 4x23x40

0:00:12.310309306 1012 0x564a95af3870 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1806> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-5.1/samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
0:00:12.311520438 1012 0x564a95af3870 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus: [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt sucessfully
Pipeline is PREROLLING …
Pipeline is PREROLLED …
Setting pipeline to PLAYING …
New clock: GstSystemClock
Got EOS from element “pipeline0”.
Execution ended after 0:00:02.473655020
Setting pipeline to PAUSED …
Setting pipeline to READY …
Setting pipeline to NULL …
Freeing pipeline …

Thank you, amycao! The model config I used is the default one in the samples folder. I am using a GTX 1080 Ti GPU. How can I boost the GPU frequency?

Could you test the pipeline on the same 1080 GPU? I want to check whether the problem is my GPU card or whether nvosd just does not perform well on this card.

I ran it on a server with a GeForce GTX 1080 and got a similar result, even though I did not boost the GPU frequency on this server.
Server CPU: Intel(R) Core™ i7-4770K CPU @ 3.50GHz 8 cores
GPU:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:01:00.0 Off |                  N/A |
| 33%   44C    P8    11W / 180W |      0MiB /  8116MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

root@c7ce51278696:/opt/nvidia/deepstream/deepstream-6.0# gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-6.0/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! fakesink
Setting pipeline to PAUSED …
0:00:00.753062961 75 0x55e75f1ae240 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1804> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
INFO: …/nvdsinfer/nvdsinfer_model_builder.cpp:685 [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_1 3x368x640
1 OUTPUT kFLOAT conv2d_bbox 16x23x40
2 OUTPUT kFLOAT conv2d_cov/Sigmoid 4x23x40

0:00:00.753143039 75 0x55e75f1ae240 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1908> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
0:00:00.755295270 75 0x55e75f1ae240 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus: [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-6.0/samples/configs/deepstream-app/config_infer_primary.txt sucessfully
Pipeline is PREROLLING …
Pipeline is PREROLLED …
Setting pipeline to PLAYING …
New clock: GstSystemClock
Got EOS from element “pipeline0”.
Execution ended after 0:00:02.059874230
Setting pipeline to PAUSED …
Setting pipeline to READY …
Setting pipeline to NULL …
Freeing pipeline …
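The nvidia-smi snapshot above was taken with the GTX 1080 idle (performance state P8, 0% GPU-Util), so it does not show what the clocks look like during a run. A minimal sketch for sampling the SM clock against its maximum while a pipeline is running, assuming nvidia-smi is on PATH and supports the name, pstate, clocks.sm and clocks.max.sm fields of --query-gpu; query_clocks and parse_clock_csv are hypothetical helper names. It only reports clocks; it does not change them.

```python
from __future__ import annotations

import subprocess

# Fields listed by `nvidia-smi --help-query-gpu`.
QUERY = "name,pstate,clocks.sm,clocks.max.sm"


def _mhz(field: str) -> int | None:
    """Unsupported fields come back as text such as "[N/A]"; map them to None."""
    return int(field) if field.isdigit() else None


def parse_clock_csv(csv_text: str) -> list[dict]:
    """Parse `--format=csv,noheader,nounits` output, one line per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, pstate, sm, sm_max = (field.strip() for field in line.split(","))
        sm_mhz, sm_max_mhz = _mhz(sm), _mhz(sm_max)
        ratio = sm_mhz / sm_max_mhz if sm_mhz is not None and sm_max_mhz else None
        gpus.append({
            "name": name,
            "pstate": pstate,
            "sm_mhz": sm_mhz,
            "sm_max_mhz": sm_max_mhz,
            "sm_ratio": ratio,  # fraction of the maximum SM clock
        })
    return gpus


def query_clocks() -> list[dict]:
    """Take one sample from nvidia-smi; call this while the pipeline is running."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_clock_csv(out)
```

A GPU that stays in a low performance state or well below its maximum SM clock during the slow nvdsosd runs would point toward a clock or power setting rather than the pipeline itself.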