Bottleneck when converting color formats with nvvideoconvert and nvdsosd

Please provide complete information as applicable to your setup.

• Hardware Platform (GPU)
• DeepStream Version : 5.1
• TensorRT Version : 7.2
• NVIDIA GPU Driver Version (valid for GPU only) : 460.91.03
• Issue Type( questions, new requirements, bugs) : questions

I have a custom plugin whose input caps are video/x-raw(memory:NVMM), format: { (string)I420 }. This plugin sits after nvdsosd, which requires its input to be video/x-raw(memory:NVMM), format: { (string)RGBA }. The input comes from an appsrc whose output caps are video/x-raw, format NV12, and I use nvvideoconvert to convert it to video/x-raw(memory:NVMM), format NV12.

The problem is that the pipeline is very slow, about 10 FPS, when nvinfer (the detector engine) provides objects. Without nvdsosd in the pipeline, it runs at about 100 FPS.

I have some questions that I hope someone on the NVIDIA team can answer:

  1. Does nvdsosd only do work when there are objects? When I keep nvdsosd and remove nvinfer from the pipeline, the FPS is also 100. Is nvdsosd doing nothing in that case because there are no objects?

  2. Could the bottleneck come from running nvvideoconvert too many times, especially converting NV12 to RGBA and then RGBA to I420?
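To make the second question concrete, the format chain described in this post can be written down and its transitions counted. This is only a sketch built from the caps listed above: the stage labels are informal descriptions rather than GStreamer API calls, and it counts conversions without measuring what any of them costs.

```python
# Caps at each point in the pipeline, as described in this post.
# Each entry is (stage label, memory type, pixel format); labels are informal.
stages = [
    ("appsrc output", "system", "NV12"),
    ("after first nvvideoconvert", "NVMM", "NV12"),
    ("nvdsosd input", "NVMM", "RGBA"),
    ("custom plugin input", "NVMM", "I420"),
]


def count_transitions(stages):
    """Count memory-type and pixel-format changes between consecutive stages."""
    memory_changes = 0
    format_changes = 0
    for (_, prev_mem, prev_fmt), (_, mem, fmt) in zip(stages, stages[1:]):
        memory_changes += prev_mem != mem
        format_changes += prev_fmt != fmt
    return memory_changes, format_changes


memory_changes, format_changes = count_transitions(stages)
print(f"memory-type changes: {memory_changes}, pixel-format changes: {format_changes}")
```

Whether those conversions are actually the bottleneck can only be settled by measuring them, for example with the per-component latency logs.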

Thank you!

See inline.

Thanks for your reply.

Below are latency measurements for three pipeline variants, with and without nvinfer (that is, with and without objects for the OSD to draw).

The OSD seems to slow the pipeline down when I add it, especially when there are objects.

  • nvinfer + osd + myplugin 10 FPS

************BATCH-NUM = 73**************
Comp name = src_bin_muxer source_id = 1 pad_index = 1 frame_num = 0               in_system_timestamp = 1631541221551.955078 out_system_timestamp = 1631541221552.047119               component_latency = 0.092041
Comp name = primary_gie in_system_timestamp = 1631541221552.117920 out_system_timestamp = 1631541221802.144043               component latency= 250.026123
Comp name = demuxer in_system_timestamp = 1631541221802.177979 out_system_timestamp = 1631541221802.210938               component latency= 0.032959
Comp name = osd_conv in_system_timestamp = 1631541221809.687012 out_system_timestamp = 1631541221901.725098               component latency= 92.038086
Comp name = nvosd0 in_system_timestamp = 1631541221901.798096 out_system_timestamp = 1631541221984.606934               component latency= 82.808838
Comp name = myplugin_conv0 in_system_timestamp = 1631541221985.652100 out_system_timestamp = 1631541221987.780029               component latency= 2.127930
Comp name = myplugin0 in_system_timestamp = 1631541221989.258057 out_system_timestamp = 1631541222022.007080               component latency= 32.749023
Source id = 1 Frame_num = 0 Frame latency = 1631541222022.141113 (ms)
  • osd + myplugin 100 FPS
************BATCH-NUM = 242**************
Comp name = src_bin_muxer source_id = 1 pad_index = 1 frame_num = 0               in_system_timestamp = 1631541329531.735107 out_system_timestamp = 1631541329531.777100               component_latency = 0.041992
base64 process [1] partition 1: size image 31620, in 12 ms
Comp name = demuxer in_system_timestamp = 1631541329531.783936 out_system_timestamp = 1631541329531.819092               component latency= 0.035156
Comp name = osd_conv in_system_timestamp = 1631541329531.864014 out_system_timestamp = 1631541329533.281982               component latency= 1.417969
Comp name = nvosd0 in_system_timestamp = 1631541329533.346924 out_system_timestamp = 1631541329533.355957               component latency= 0.009033
Comp name = myplugin_conv0 in_system_timestamp = 1631541329533.418945 out_system_timestamp = 1631541329533.530029               component latency= 0.111084
Comp name = myplugin0 in_system_timestamp = 1631541329534.218018 out_system_timestamp = 1631541329546.114990               component latency= 11.896973
Source id = 1 Frame_num = 0 Frame latency = 1631541329546.308105 (ms)
  • nvinfer + myplugin 100 FPS
************BATCH-NUM = 286**************
Comp name = src_bin_muxer source_id = 1 pad_index = 1 frame_num = 0               in_system_timestamp = 1631541281244.716064 out_system_timestamp = 1631541281244.777100               component_latency = 0.061035
Comp name = primary_gie in_system_timestamp = 1631541281244.814941 out_system_timestamp = 1631541281256.207031               component latency= 11.392090
Comp name = demuxer in_system_timestamp = 1631541281256.333008 out_system_timestamp = 1631541281256.425049               component latency= 0.092041
Comp name = myplugin_conv0 in_system_timestamp = 1631541281292.562012 out_system_timestamp = 1631541281292.670898               component latency= 0.108887
Comp name = myplugin0 in_system_timestamp = 1631541281320.782959 out_system_timestamp = 1631541281334.439941               component latency= 13.656982
Source id = 1 Frame_num = 0 Frame latency = 1631541281335.407959 (ms)
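Logs like the three above are easier to compare once the per-component numbers are extracted. The following is a minimal sketch, assuming the log text has been saved as a string; the function name `parse_latencies` and the regular expression are illustrative, not part of DeepStream. It accepts both spellings that appear in the logs (`component_latency = ` and `component latency= `).

```python
import re

# Matches both spellings seen in the logs:
# "component_latency = 0.092041" and "component latency= 250.026123".
LINE_RE = re.compile(
    r"Comp name = (?P<name>\S+).*?component[ _]latency\s*=\s*(?P<ms>\d+(?:\.\d+)?)"
)


def parse_latencies(log_text):
    """Return a list of (component name, latency in ms) in log order."""
    return [
        (m.group("name"), float(m.group("ms")))
        for m in LINE_RE.finditer(log_text)
    ]


# A few lines copied from the first log (nvinfer + osd + myplugin).
sample = """\
Comp name = src_bin_muxer source_id = 1 pad_index = 1 frame_num = 0               in_system_timestamp = 1631541221551.955078 out_system_timestamp = 1631541221552.047119               component_latency = 0.092041
Comp name = primary_gie in_system_timestamp = 1631541221552.117920 out_system_timestamp = 1631541221802.144043               component latency= 250.026123
Comp name = osd_conv in_system_timestamp = 1631541221809.687012 out_system_timestamp = 1631541221901.725098               component latency= 92.038086
Comp name = nvosd0 in_system_timestamp = 1631541221901.798096 out_system_timestamp = 1631541221984.606934               component latency= 82.808838
"""

# Print components from slowest to fastest.
for name, ms in sorted(parse_latencies(sample), key=lambda item: item[1], reverse=True):
    print(f"{name:15s} {ms:10.3f} ms")
```

Each log here covers a single batch, so averaging over many batches would give a more reliable picture than one sample.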

In your first case, primary_gie takes around 250 ms, while in the third case it takes around 11 ms. Did you use the same batch size and the same stream? Is there any difference between the two cases other than there being no nvosd in the third?

Hi amycao. All three tests use the same app config file and the same engine configs. The only difference is that I disable the OSD and the primary engine in the main config file with the enable=0/1 field. The test input source is also the same.

Please check on your side. It does not make sense for the primary gie latency to differ this much under the same conditions.
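The discrepancy can be put side by side with a few lines of arithmetic. The sketch below uses only the latency values copied from the first log (nvinfer + osd + myplugin) and the third log (nvinfer + myplugin); the dictionary and function names are illustrative. Each value is a single-batch sample, so the ratios are indicative at best.

```python
# Component latencies in ms, copied from the logs in this thread.
with_osd = {
    "src_bin_muxer": 0.092041,
    "primary_gie": 250.026123,
    "demuxer": 0.032959,
    "myplugin_conv0": 2.127930,
    "myplugin0": 32.749023,
}
without_osd = {
    "src_bin_muxer": 0.061035,
    "primary_gie": 11.392090,
    "demuxer": 0.092041,
    "myplugin_conv0": 0.108887,
    "myplugin0": 13.656982,
}


def slowdown(slow_run, fast_run):
    """Ratio slow/fast for every component present in both runs."""
    return {
        name: slow_run[name] / fast_run[name]
        for name in slow_run
        if name in fast_run
    }


ratios = slowdown(with_osd, without_osd)
for name, ratio in ratios.items():
    print(f"{name:15s} {with_osd[name]:10.3f} ms vs {without_osd[name]:8.3f} ms  ({ratio:.1f}x)")
```

Note that primary_gie itself is roughly 20 times slower in the run that includes the OSD, even though it sits upstream of it, which is the inconsistency being questioned here.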