Bottleneck when using color conversion with nvvideoconvert and nvdsosd

trild-vietnam · September 12, 2021, 2:37pm

Please provide complete information as applicable to your setup.

• Hardware Platform: GPU (1080 Ti)
• DeepStream Version: 5.1
• TensorRT Version: 7.2
• NVIDIA GPU Driver Version (valid for GPU only): 460.91.03
• Issue Type (questions, new requirements, bugs): questions

I have a custom plugin with input caps video/x-raw(memory:NVMM), format: { (string)I420 }. This plugin comes after nvdsosd, whose input caps must be video/x-raw(memory:NVMM), format: { (string)RGBA }. The input source is an appsrc with output caps video/x-raw, format NV12, and I use nvvideoconvert to convert it to video/x-raw(memory:NVMM), format NV12.
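For anyone trying to reproduce this, the chain described above can be written out as a gst-launch description. This is a hypothetical sketch, not the poster's actual pipeline: videotestsrc stands in for appsrc, fakesink stands in for myplugin, the nvstreammux/nvinfer stage is borrowed from the tests later in this thread, and `describe_pipeline`/`INFER_CFG` are made-up names. Its main purpose is to make the three nvvideoconvert hops visible.

```shell
#!/bin/sh
# Hypothetical reproduction of the caps chain described above.
# Assumptions: videotestsrc replaces appsrc, fakesink replaces myplugin,
# and INFER_CFG points at the sample detector config used later in the thread.
INFER_CFG=${INFER_CFG:-/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt}

describe_pipeline() {
  # System-memory NV12, which is what the appsrc produces
  printf '%s' "videotestsrc num-buffers=300 ! video/x-raw,format=NV12,width=1280,height=720"
  # Conversion 1: system memory -> NVMM, still NV12
  printf '%s' " ! nvvideoconvert ! video/x-raw(memory:NVMM),format=NV12"
  printf '%s' " ! .sink_0 nvstreammux batch-size=1 width=1280 height=720"
  printf '%s' " ! nvinfer config-file-path=$INFER_CFG"
  # Conversion 2: NV12 -> RGBA, the format nvdsosd's sink caps require
  printf '%s' " ! nvvideoconvert ! video/x-raw(memory:NVMM),format=RGBA ! nvdsosd"
  # Conversion 3: RGBA -> I420 for the custom plugin's sink caps
  printf '%s\n' " ! nvvideoconvert ! video/x-raw(memory:NVMM),format=I420 ! fakesink"
}

# Usage (unquoted on purpose so each element becomes its own argument):
#   gst-launch-1.0 $(describe_pipeline)
```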

The problem is that the FPS is really slow, about 10 FPS, when nvinfer (the detector engine) provides objects. Without nvdsosd in the pipeline, the FPS is 100.

I have some questions; could someone on the NVIDIA team please reply:

Does nvdsosd only do work when there are objects? When I keep nvdsosd and remove nvinfer from the pipeline, the FPS is also 100. Does nvdsosd simply do nothing when there are no objects?
Could the bottleneck be caused by running nvvideoconvert too many times, especially converting NV12 to RGBA and then RGBA to I420?

Thank you!

Amycao · September 13, 2021, 10:55am

trild-vietnam:

Does nvdsosd only do work when there are objects? When I keep nvdsosd and remove nvinfer from the pipeline, the FPS is also 100. Does nvdsosd simply do nothing when there are no objects?

[amyc] The osd plugin receives the RGBA buffer with metadata attached by upstream components and draws based on that metadata, such as bounding boxes and text. Whenever the metadata contains objects, it draws them on the frame. I do not think osd accounts for a large share of the performance cost. You can measure per-component latency to see which component causes the low performance; see Troubleshooting (DeepStream 6.3 Release documentation), section "The DeepStream application is running slowly".
If you do not use deepstream-app, you can refer to this FAQ to add per-component performance measurement: DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums, [DS 5.0GA_All_App] Enable Perf measurement(FPS) for deepstream sample apps
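A minimal shell sketch of switching on that measurement for a single run, assuming the two environment variables named in the DeepStream troubleshooting guide (`NVDS_ENABLE_LATENCY_MEASUREMENT`, `NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT`). `with_latency` and `my_config.txt` are hypothetical names.

```shell
#!/bin/sh
# Sketch: run one command with DeepStream's latency logging switched on.
# Assumption: these are the variable names from the DeepStream troubleshooting
# guide. deepstream-app prints the results itself; a custom app still has to
# read them in code (the FAQ linked above covers that part).
with_latency() {
  NVDS_ENABLE_LATENCY_MEASUREMENT=1 \
  NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1 \
    "$@"
}

# Usage (config file name is a placeholder):
#   with_latency deepstream-app -c my_config.txt 2>&1 | tee latency.log
```

Because the variables are set only for that one command, other runs in the same shell are unaffected.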

Could the bottleneck be caused by running nvvideoconvert too many times, especially converting NV12 to RGBA and then RGBA to I420?

See inline.

trild-vietnam · September 13, 2021, 1:59pm

Thanks for your reply.

Below are latency measurements for three pipelines, with and without nvinfer (nvinfer provides the objects that give osd something to draw).

Adding osd to the pipeline slows it down, especially when there are objects!

- nvinfer + osd + myplugin: 10 FPS

************BATCH-NUM = 73**************
Comp name = src_bin_muxer source_id = 1 pad_index = 1 frame_num = 0               in_system_timestamp = 1631541221551.955078 out_system_timestamp = 1631541221552.047119               component_latency = 0.092041
Comp name = primary_gie in_system_timestamp = 1631541221552.117920 out_system_timestamp = 1631541221802.144043               component latency= 250.026123
Comp name = demuxer in_system_timestamp = 1631541221802.177979 out_system_timestamp = 1631541221802.210938               component latency= 0.032959
Comp name = osd_conv in_system_timestamp = 1631541221809.687012 out_system_timestamp = 1631541221901.725098               component latency= 92.038086
Comp name = nvosd0 in_system_timestamp = 1631541221901.798096 out_system_timestamp = 1631541221984.606934               component latency= 82.808838
Comp name = myplugin_conv0 in_system_timestamp = 1631541221985.652100 out_system_timestamp = 1631541221987.780029               component latency= 2.127930
Comp name = myplugin0 in_system_timestamp = 1631541221989.258057 out_system_timestamp = 1631541222022.007080               component latency= 32.749023
Source id = 1 Frame_num = 0 Frame latency = 1631541222022.141113 (ms)

- osd + myplugin: 100 FPS

************BATCH-NUM = 242**************
Comp name = src_bin_muxer source_id = 1 pad_index = 1 frame_num = 0               in_system_timestamp = 1631541329531.735107 out_system_timestamp = 1631541329531.777100               component_latency = 0.041992
base64 process [1] partition 1: size image 31620, in 12 ms
Comp name = demuxer in_system_timestamp = 1631541329531.783936 out_system_timestamp = 1631541329531.819092               component latency= 0.035156
Comp name = osd_conv in_system_timestamp = 1631541329531.864014 out_system_timestamp = 1631541329533.281982               component latency= 1.417969
Comp name = nvosd0 in_system_timestamp = 1631541329533.346924 out_system_timestamp = 1631541329533.355957               component latency= 0.009033
Comp name = myplugin_conv0 in_system_timestamp = 1631541329533.418945 out_system_timestamp = 1631541329533.530029               component latency= 0.111084
Comp name = myplugin0 in_system_timestamp = 1631541329534.218018 out_system_timestamp = 1631541329546.114990               component latency= 11.896973
Source id = 1 Frame_num = 0 Frame latency = 1631541329546.308105 (ms)

- nvinfer + myplugin: 100 FPS

************BATCH-NUM = 286**************
Comp name = src_bin_muxer source_id = 1 pad_index = 1 frame_num = 0               in_system_timestamp = 1631541281244.716064 out_system_timestamp = 1631541281244.777100               component_latency = 0.061035
Comp name = primary_gie in_system_timestamp = 1631541281244.814941 out_system_timestamp = 1631541281256.207031               component latency= 11.392090
Comp name = demuxer in_system_timestamp = 1631541281256.333008 out_system_timestamp = 1631541281256.425049               component latency= 0.092041
Comp name = myplugin_conv0 in_system_timestamp = 1631541281292.562012 out_system_timestamp = 1631541281292.670898               component latency= 0.108887
Comp name = myplugin0 in_system_timestamp = 1631541281320.782959 out_system_timestamp = 1631541281334.439941               component latency= 13.656982
Source id = 1 Frame_num = 0 Frame latency = 1631541281335.407959 (ms)
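To compare the three runs, it can help to rank the components per batch. The awk sketch below assumes the exact line layout printed above (component name as the 4th field, latency as the last field) and is meant to be fed one BATCH-NUM block at a time; `rank_components` and `end_to_end_ms` are hypothetical helpers. The printed "Frame latency" value (about 1.63e12) appears to be a wall-clock timestamp rather than a duration, so the second helper measures from the first in_system_timestamp to the last out_system_timestamp instead. On the blocks above that works out to roughly 470 ms for BATCH-NUM = 73, 14 ms for BATCH-NUM = 242 and 90 ms for BATCH-NUM = 286, with primary_gie, osd_conv and nvosd0 the three slowest components in the first case.

```shell
#!/bin/sh
# Summarise one BATCH-NUM block of DeepStream component-latency output.
# Assumes the line layout shown above ("Comp name = <name> ... <latency>"):
# the component name is the 4th field and the latency is the last field.

# Print "<component> <latency_ms>", slowest first.
rank_components() {
  awk '/^Comp name =/ { print $4, $NF }' | LC_ALL=C sort -k2,2nr
}

# Print the span from the first in_system_timestamp to the last
# out_system_timestamp, in ms.
end_to_end_ms() {
  awk '
    /^Comp name =/ {
      for (i = 1; i + 2 <= NF; i++) {
        if ($i == "in_system_timestamp" && first == "") first = $(i + 2)
        if ($i == "out_system_timestamp") last = $(i + 2)
      }
    }
    END { if (first != "") printf "%.2f\n", last - first }
  '
}

# Usage: rank_components < batch73.log; end_to_end_ms < batch73.log
```

Lines that do not start with "Comp name =" (such as the plugin's base64 message) are ignored by both helpers.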

Amycao · September 14, 2021, 9:35am

In your first case, primary_gie takes around 250 ms, while in the third case it takes around 11 ms. Did you use the same batch size and the same stream? Is there any difference between the two cases other than the missing nvdsosd in the third?

trild-vietnam · September 14, 2021, 9:38am

Hi amycao. All three tests use the same app config file and the same engine configs. The only difference is that I disable the osd and the primary engine in the main config file with the field enable=0/1. The test input source is also the same.

Amycao · September 15, 2021, 5:41am

Please check on your side. Under the same conditions, it does not make sense for the primary_gie latency to differ this much.

Amycao · September 23, 2021, 1:24am

We cannot reproduce your issue. In our environment, the FPS with and without nvdsosd differs by only about 2-3. Can you provide the configuration you used and a stripped-down version of your app that we can run in an NVIDIA environment to reproduce the issue?

trild-vietnam · September 23, 2021, 1:50am

Thanks for your feedback.

I tried the gst-launch commands below to check performance. The source is a filesrc reading a video file, with no additional information. The pipelines with osd, with or without the I420 conversion (tests 0-2), still take 28-29 s to execute, while the pipelines without osd, with or without my plugin (tests 3-4), take just 2 s. Do you have any idea why?

[test 0] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! myplugin ! fakesink
→ finished in 29s

[test 1] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! fakesink
→ finished in 28s

[test 2] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! fakesink
→ finished in 28s

[test 3] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! myplugin ! fakesink
→ finished in 2s

[test 4] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! fakesink
→ finished in 2s

trild-vietnam · September 23, 2021, 2:10am

I also tested the pipeline without nvinfer, with and without myplugin, and both runs took 2 s.

[test 5] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! myplugin ! fakesink
→ finished in 2s

[test 6] gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=I420' ! fakesink
→ finished in 2s
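Since tests 0-6 differ only in what follows nvstreammux, a small wrapper can rerun them side by side. This is a hypothetical sketch: `PIPE_HEAD`, `INFER`, `build_pipeline`, `run_variant` and `DRY_RUN` are illustrative names, paths default to the DeepStream 5.1 layout used in the tests, and timing is whole seconds from `date +%s`, the same resolution as the results above.

```shell
#!/bin/sh
# Hypothetical A/B harness for the gst-launch tests above: every variant
# shares the same source/mux head, so only the tail differs between runs.
DS_ROOT=${DS_ROOT:-/opt/nvidia/deepstream/deepstream-5.1}
PIPE_HEAD="filesrc location=$DS_ROOT/samples/streams/sample_1080p_h264.mp4 ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720"
INFER="nvinfer config-file-path=$DS_ROOT/samples/configs/deepstream-app/config_infer_primary.txt"

# Join the shared head with a variant-specific tail.
build_pipeline() {
  printf '%s ! %s\n' "$PIPE_HEAD" "$*"
}

# run_variant NAME TAIL...: time one variant in whole seconds,
# or only print the command when DRY_RUN is set.
run_variant() {
  name=$1
  shift
  pipe=$(build_pipeline "$@")
  if [ -n "${DRY_RUN:-}" ]; then
    printf '[%s] gst-launch-1.0 %s\n' "$name" "$pipe"
    return 0
  fi
  start=$(date +%s)
  # $pipe is unquoted on purpose: gst-launch-1.0 re-joins its arguments,
  # and the caps strings contain no glob characters.
  gst-launch-1.0 -q $pipe > /dev/null || return 1
  end=$(date +%s)
  printf '[%s] finished in %ss\n' "$name" "$((end - start))"
}

# Example (quote each tail so it stays one argument):
#   run_variant test2 "$INFER ! nvvideoconvert ! video/x-raw(memory:NVMM),format=RGBA ! nvdsosd ! fakesink"
#   run_variant test4 "$INFER ! nvvideoconvert ! video/x-raw(memory:NVMM),format=I420 ! fakesink"
```

With `DRY_RUN=1` set, `run_variant` only prints the assembled command, which is a cheap way to check the string before launching it.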

Amycao · September 23, 2021, 2:12pm

I ran it on my side, and it only takes around 2.47 s. I used a T4 card with the GPU frequency boosted to max. Which GPU are you using, and did you boost the GPU frequency? In the nvinfer config file, did you use the built-in model or your own model? And what is your CPU model? Ours is Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 24 cores.

root@148e1ebe1354:/opt/nvidia/deepstream/deepstream-5.1# gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-5.1/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! fakesink
Setting pipeline to PAUSED …
0:00:12.310205495 1012 0x564a95af3870 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1702> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-5.1/samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
INFO: …/nvdsinfer/nvdsinfer_model_builder.cpp:685 [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_1 3x368x640
1 OUTPUT kFLOAT conv2d_bbox 16x23x40
2 OUTPUT kFLOAT conv2d_cov/Sigmoid 4x23x40

0:00:12.310309306 1012 0x564a95af3870 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1806> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-5.1/samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
0:00:12.311520438 1012 0x564a95af3870 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus: [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app/config_infer_primary.txt sucessfully
Pipeline is PREROLLING …
Pipeline is PREROLLED …
Setting pipeline to PLAYING …
New clock: GstSystemClock
Got EOS from element “pipeline0”.
Execution ended after 0:00:02.473655020
Setting pipeline to PAUSED …
Setting pipeline to READY …
Setting pipeline to NULL …
Freeing pipeline …

trild-vietnam · September 23, 2021, 4:24pm

Thank you amycao! The model config I used is the default one in the samples folder, and I used a 1080 Ti GPU. How can I boost the GPU frequency?

Could you run the test pipeline on a 1080 GPU as well? I want to check whether the problem is my GPU card or whether nvosd just does not perform well on this card.
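On the clock question: before changing anything, it may help to see how close the SM clock gets to its maximum while the pipeline is running. A minimal sketch, assuming nvidia-smi's `clocks.sm` and `clocks.max.sm` query fields; `query_clocks` and `clock_report` are hypothetical names, and whether clocks can be raised at all (for example via application clocks) depends on the card and driver.

```shell
#!/bin/sh
# Sketch: report how close each GPU's SM clock is to its maximum.
# Run it while the pipeline is playing; an idle GPU typically lowers its clocks.
query_clocks() {
  nvidia-smi --query-gpu=name,clocks.sm,clocks.max.sm --format=csv,noheader,nounits
}

# Read "name, current_mhz, max_mhz" lines and print current/max as a percentage.
clock_report() {
  awk -F', ' '{
    if ($3 + 0 > 0)
      printf "%s: SM %d/%d MHz (%.0f%%)\n", $1, $2, $3, 100 * $2 / $3
    else
      printf "%s: SM %s MHz (max unavailable)\n", $1, $2
  }'
}

# Usage: query_clocks | clock_report
# Raising clocks (e.g. nvidia-smi -pm 1, nvidia-smi -ac MEM,GR) usually needs
# root, and -ac is not accepted on every GPU; nvidia-smi -q -d SUPPORTED_CLOCKS
# lists what the card allows.
```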

Amycao · September 28, 2021, 1:48am

root@c7ce51278696:/opt/nvidia/deepstream/deepstream-6.0# gst-launch-1.0 filesrc location="/opt/nvidia/deepstream/deepstream-6.0/samples/streams/sample_1080p_h264.mp4" ! decodebin ! .sink_1 nvstreammux batch-size=1 width=1280 height=720 ! nvinfer config-file-path="/opt/nvidia/deepstream/deepstream-6.0/samples/configs/deepstream-app/config_infer_primary.txt" ! nvvideoconvert ! 'video/x-raw(memory:NVMM),format=RGBA' ! nvdsosd ! fakesink
Setting pipeline to PAUSED …
0:00:00.753062961 75 0x55e75f1ae240 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() <nvdsinfer_context_impl.cpp:1804> [UID = 1]: deserialized trt engine from :/opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
INFO: …/nvdsinfer/nvdsinfer_model_builder.cpp:685 [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_1 3x368x640
1 OUTPUT kFLOAT conv2d_bbox 16x23x40
2 OUTPUT kFLOAT conv2d_cov/Sigmoid 4x23x40

0:00:00.753143039 75 0x55e75f1ae240 INFO nvinfer gstnvinfer.cpp:619:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() <nvdsinfer_context_impl.cpp:1908> [UID = 1]: Use deserialized engine model: /opt/nvidia/deepstream/deepstream-6.0/samples/models/Primary_Detector/resnet10.caffemodel_b1_gpu0_int8.engine
0:00:00.755295270 75 0x55e75f1ae240 INFO nvinfer gstnvinfer_impl.cpp:313:notifyLoadModelStatus: [UID 1]: Load new model:/opt/nvidia/deepstream/deepstream-6.0/samples/configs/deepstream-app/config_infer_primary.txt sucessfully
Pipeline is PREROLLING …
Pipeline is PREROLLED …
Setting pipeline to PLAYING …
New clock: GstSystemClock
Got EOS from element “pipeline0”.
Execution ended after 0:00:02.059874230
Setting pipeline to PAUSED …
Setting pipeline to READY …
Setting pipeline to NULL …
Freeing pipeline …
