Delay when using an RTSP camera

I am using the deepstream_multistream_test app. I need to do the post-processing for my SGIE model myself. Frames arrive late from the RTSP camera when I set network-type=100, but when I set is-classifier=1 in the config file instead, the problem goes away.

I tried all the RTSP troubleshooting methods in the docs, but none of them solved my problem.

SGIE config file:

[property]
gpu-id=0
process-mode=2

#net-scale-factor=0.00329215686274
net-scale-factor=0.0189601459307
offsets=112.86182266638355;112.86182266638355;112.86182266638355

onnx-file=/home/jetson-nx/codes/models/facenet/agx_facenet_dynamic_model.onnx
model-engine-file=/home/jetson-nx/codes/models/facenet/agx_facenet_dynamic_model.onnx_b32_gpu0_fp32.engine
force-implicit-batch-dim=1
batch-size=32
# 0=FP32 and 1=INT8 2=FP16 mode 
network-mode=0

gie-unique-id=3
operate-on-gie-id=2
operate-on-class-ids=0


#output-blob-names=Bottleneck_BatchNorm/batchnorm_1/add_1:0
#is-classifier=1
#classifier-async-mode=0

network-type=100
model-color-format=1

output-tensor-meta=1
#scaling-filter=1
#scaling-compute-hw=0

#maintain-aspect-ratio=1
#secondary-reinfer-interval=16

Main function:

def main(args):
	global start_time 

	# start_time = time()


	# Check input arguments
	# if len(args) < 2:
	# 	sys.stderr.write("usage: %s <uri1> [uri2] ... [uriN] <folder to save frames>\n" % args[0])
	# 	sys.exit(1)

	for i in range(0,len(args)-1):
		fps_streams["stream{0}".format(i)]=GETFPS(i)
	number_sources=len(args) -1

	print("number_sources ",number_sources)
	# global folder_name
	# folder_name=args[-1]
	# if path.exists(folder_name):
	# 	sys.stderr.write("The output folder %s already exists. Please remove it first.\n" % folder_name)
	# 	sys.exit(1)

	# os.mkdir(folder_name)
	# print("Frames will be saved in ",folder_name)
	# Standard GStreamer initialization
	GObject.threads_init()
	Gst.init(None)

	# Create GStreamer elements
	# Create Pipeline element that will form a connection of other elements
	print("Creating Pipeline \n ")
	pipeline = Gst.Pipeline()
	is_live = False

	if not pipeline:
		sys.stderr.write(" Unable to create Pipeline \n")
	print("Creating streamux \n ")

	# Create nvstreammux instance to form batches from one or more sources.
	streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
	if not streammux:
		sys.stderr.write(" Unable to create NvStreamMux \n")

	pipeline.add(streammux)
	for i in range(number_sources):
		# os.mkdir(folder_name+"/stream_"+str(i))
		# frame_count["stream_"+str(i)]=0
		# saved_count["stream_"+str(i)]=0
		print("Creating source_bin ",i," \n ")
		uri_name=args[i+1]
		if uri_name.find("rtsp://") == 0 :
			is_live = True
		source_bin=create_source_bin(i, uri_name)

		if not source_bin:
			sys.stderr.write("Unable to create source bin \n")
		
		pipeline.add(source_bin)
		padname="sink_%u" %i
		sinkpad= streammux.get_request_pad(padname) 
		if not sinkpad:
			sys.stderr.write("Unable to create sink pad bin \n")
		srcpad=source_bin.get_static_pad("src")
		if not srcpad:
			sys.stderr.write("Unable to create src pad bin \n")
		srcpad.link(sinkpad)
	print("Creating Pgie \n ")

	tracker = Gst.ElementFactory.make("nvtracker", "tracker")
	if not tracker:
		sys.stderr.write(" Unable to create tracker \n")

	# Use nvinfer to run inferencing on decoder's output,
	# behaviour of inferencing is set through config file
	face_detector = Gst.ElementFactory.make("nvinfer", "face-detector-inference")
	if not face_detector:
		sys.stderr.write(" Unable to create face_detector \n")

	face_recogniser = Gst.ElementFactory.make("nvinfer", "face-recogniser-inference")
	if not face_recogniser:
		sys.stderr.write(" Unable to create face_recogniser \n")

	# Add nvvidconv1 and filter1 to convert the frames to RGBA
	# which is easier to work with in Python.
	print("Creating nvvidconv1 \n ")
	nvvidconv1 = Gst.ElementFactory.make("nvvideoconvert", "convertor1")
	if not nvvidconv1:
		sys.stderr.write(" Unable to create nvvidconv1 \n")
	print("Creating filter1 \n ")
	caps1 = Gst.Caps.from_string("video/x-raw(memory:NVMM), format=RGBA")
	filter1 = Gst.ElementFactory.make("capsfilter", "filter1")
	if not filter1:
		sys.stderr.write(" Unable to get the caps filter1 \n")
	filter1.set_property("caps", caps1)
	print("Creating tiler \n ")
	tiler=Gst.ElementFactory.make("nvmultistreamtiler", "nvtiler")
	if not tiler:
		sys.stderr.write(" Unable to create tiler \n")
	print("Creating nvvidconv \n ")
	nvvidconv = Gst.ElementFactory.make("nvvideoconvert", "convertor")
	if not nvvidconv:
		sys.stderr.write(" Unable to create nvvidconv \n")
	print("Creating nvosd \n ")
	nvosd = Gst.ElementFactory.make("nvdsosd", "onscreendisplay")
	if not nvosd:
		sys.stderr.write(" Unable to create nvosd \n")
	if(is_aarch64()):
		print("Creating transform \n ")
		# transform=Gst.ElementFactory.make("nvegltransform", "nvegl-transform")
		transform = Gst.ElementFactory.make("queue", "queue")
		if not transform:
			sys.stderr.write(" Unable to create transform \n")

	print("Creating EGLSink \n")
	# sink = Gst.ElementFactory.make("nveglglessink", "nvvideo-renderer")
	sink = Gst.ElementFactory.make("nvoverlaysink", "nvvideo-renderer")
	# sink = Gst.ElementFactory.make("fakesink", "fakesink")
	if not sink:
		sys.stderr.write(" Unable to create egl sink \n")

	# queue1=Gst.ElementFactory.make("queue","queue1")
	# queue2=Gst.ElementFactory.make("queue","queue2")
	# queue3=Gst.ElementFactory.make("queue","queue3")
	
	if is_live:
		print("Atleast one of the sources is live")
		streammux.set_property('live-source', 1)

	streammux.set_property('width', 1920)
	streammux.set_property('height', 1080)
	streammux.set_property('batch-size', number_sources)
	streammux.set_property('batched-push-timeout', 1066666)
	
	face_recogniser.set_property('config-file-path', "face_recogniser_config.txt")
	face_detector.set_property('config-file-path', "face_detector_config.txt")
	pgie_batch_size=face_detector.get_property("batch-size")
	
	if(pgie_batch_size != number_sources):
		print("WARNING: Overriding infer-config batch-size",pgie_batch_size," with number of sources ", number_sources," \n")
		face_detector.set_property("batch-size",number_sources)
	tiler_rows=int(math.sqrt(number_sources))
	tiler_columns=int(math.ceil((1.0*number_sources)/tiler_rows))
	tiler.set_property("rows",tiler_rows)
	tiler.set_property("columns",tiler_columns)
	tiler.set_property("width", TILED_OUTPUT_WIDTH)
	tiler.set_property("height", TILED_OUTPUT_HEIGHT)

	sink.set_property("sync", 0)
	# sink.set_property("async", 0)
	

	if not is_aarch64():
		# Use CUDA unified memory in the pipeline so frames
		# can be easily accessed on CPU in Python.
		mem_type = int(pyds.NVBUF_MEM_CUDA_UNIFIED)
		streammux.set_property("nvbuf-memory-type", mem_type)
		nvvidconv.set_property("nvbuf-memory-type", mem_type)
		nvvidconv1.set_property("nvbuf-memory-type", mem_type)
		tiler.set_property("nvbuf-memory-type", mem_type)
	
	#Set properties of tracker
	config = configparser.ConfigParser()
	config.read('dstest2_tracker_config.txt')
	config.sections()

	for key in config['tracker']:
		if key == 'tracker-width' :
			tracker_width = config.getint('tracker', key)
			tracker.set_property('tracker-width', tracker_width)
		if key == 'tracker-height' :
			tracker_height = config.getint('tracker', key)
			tracker.set_property('tracker-height', tracker_height)
		if key == 'gpu-id' :
			tracker_gpu_id = config.getint('tracker', key)
			tracker.set_property('gpu_id', tracker_gpu_id)
		if key == 'll-lib-file' :
			tracker_ll_lib_file = config.get('tracker', key)
			tracker.set_property('ll-lib-file', tracker_ll_lib_file)
		if key == 'll-config-file' :
			tracker_ll_config_file = config.get('tracker', key)
			tracker.set_property('ll-config-file', tracker_ll_config_file)
		if key == 'enable-batch-process' :
			tracker_enable_batch_process = config.getint('tracker', key)
			tracker.set_property('enable_batch_process', tracker_enable_batch_process)

			

	print("Adding elements to Pipeline \n")
	pipeline.add(tracker)
	pipeline.add(face_detector)
	pipeline.add(face_recogniser)
	pipeline.add(tiler)
	pipeline.add(nvvidconv)
	pipeline.add(filter1)
	pipeline.add(nvvidconv1)
	pipeline.add(nvosd)
	if is_aarch64():
		pipeline.add(transform)
	pipeline.add(sink)

	print("Linking elements in the Pipeline \n")
	streammux.link(face_detector) 
	face_detector.link(tracker)
	tracker.link(nvvidconv1)
	nvvidconv1.link(filter1)
	filter1.link(face_recogniser)
	# queue3.link(face_recogniser)
	face_recogniser.link(tiler)
	tiler.link(nvvidconv)
	nvvidconv.link(nvosd)
	# queue2.link(nvosd)
	if is_aarch64():
		nvosd.link(transform)
		# queue2.link(transform)
		transform.link(sink)
		# queue1.link(sink)
	else:
		nvosd.link(sink)
		# queue1.link(sink)

	# create an event loop and feed GStreamer bus messages to it
	loop = GObject.MainLoop()
	bus = pipeline.get_bus()
	bus.add_signal_watch()
	bus.connect ("message", bus_call, loop)

	# tiler_sink_pad=nvvidconv1.get_static_pad("sink")
	# if not tiler_sink_pad:
	# 	sys.stderr.write(" Unable to get src pad \n")
	# else:
	# 	tiler_sink_pad.add_probe(Gst.PadProbeType.BUFFER, tiler_sink_pad_buffer_probe, 0)

	sgie_src_pad = face_recogniser.get_static_pad("src")
	if not sgie_src_pad:
		sys.stderr.write(" Unable to get src pad of face_recogniser \n")

	sgie_src_pad.add_probe(Gst.PadProbeType.BUFFER, sgie_sink_pad_buffer_probe, 0)


	# List the sources
	print("Now playing...")
	for i, source in enumerate(args[1:], start=1):
		print(i, ": ", source)

	print("Starting pipeline \n")
	# start playback and listen to events
	pipeline.set_state(Gst.State.PLAYING)
	try:
		loop.run()
	except:
		pass
	# cleanup
	print("Exiting app\n")
	pipeline.set_state(Gst.State.NULL)

• Hardware Platform: Jetson NX
• DeepStream Version: 5.0.1
• JetPack Version: 4.5

What do you mean by “The frames come late”?

When the model detects more than three faces, I get huge delays, and the delay gradually increases as more faces are detected.

When you set network-type=100, are you processing the model output yourself?

Yes.

Have you measured your post-processing time? The probe function blocks the pipeline until the processing finishes.
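A minimal way to measure it is to time the probe body itself; a sketch, reusing the sgie_sink_pad_buffer_probe name from the code above (in the app, Gst is already imported):

import time

from gi.repository import Gst

def sgie_sink_pad_buffer_probe(pad, info, u_data):
	start = time.perf_counter()

	# ... your existing tensor-meta post-processing goes here ...

	elapsed_ms = (time.perf_counter() - start) * 1000.0
	print("post-processing took %.1f ms" % elapsed_ms)
	return Gst.PadProbeReturn.OK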

If there are 7 people in the frame, it takes about 0.07 seconds.

0.07s = 70ms.

1s/(70ms/frame) = 14.28 fps

So?

Is 14.28 fps slow or fast for you?

That is good, but I still need to solve the delay problem.

If the source fps is higher than 14.28 fps, there will be a delay. What is your source? A local file, an RTSP stream, or a camera?

An RTSP stream. Even if I lower the fps of the RTSP stream, the problem stays the same.
Also, when I disable the probe, the delay is gone.

What do you mean? What is the minimum fps of your RTSP stream?

The probe function is a blocking callback; it holds up the whole pipeline, so the processing inside the probe function should be as fast as possible.

I set the RTSP fps to 6.

What is the latency of the display?

I tried NVDS_ENABLE_LATENCY_MEASUREMENT=1, but nothing was measured.

Please refer to DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums

The delay is caused by the post-processing. You need to optimize the implementation to make it faster.
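One common Python-side optimization, if you are reading the output tensor element by element, is to view the layer buffer as a NumPy array instead; a sketch, assuming the pyds tensor-meta API as used in the deepstream-ssd-parser sample and a facenet embedding of 128 floats (the layer index and size are assumptions):

import ctypes

import numpy as np
import pyds

EMBEDDING_SIZE = 128  # assumption: facenet emits a 128-float embedding

def embedding_from_user_meta(user_meta):
	# Cast the user meta attached by nvinfer (output-tensor-meta=1) and view
	# the first output layer as a NumPy array, avoiding a per-element loop.
	tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
	layer = pyds.get_nvds_LayerInfo(tensor_meta, 0)
	ptr = ctypes.cast(pyds.get_ptr(layer.buffer), ctypes.POINTER(ctypes.c_float))
	return np.ctypeslib.as_array(ptr, shape=(EMBEDDING_SIZE,)).copy()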

Is there Python code to measure latency?
Could I run the post-processing on a separate thread? If yes, how can I do it with DeepStream?

No.

Yes. But if the processing is not fast enough, it will still delay the video.
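A minimal sketch of that pattern (extract_tensors and heavy_post_processing are hypothetical placeholders for your own code; the probe only hands off the data and returns immediately):

import queue
import threading

from gi.repository import Gst

work_queue = queue.Queue(maxsize=64)

def worker():
	while True:
		item = work_queue.get()
		if item is None:  # sentinel: stop the worker
			break
		heavy_post_processing(item)  # hypothetical: your face matching etc.

threading.Thread(target=worker, daemon=True).start()

def sgie_sink_pad_buffer_probe(pad, info, u_data):
	# Copy out only what the worker needs (e.g. the output tensors as
	# NumPy arrays) so the GstBuffer is not referenced after returning.
	item = extract_tensors(info.get_buffer())  # hypothetical helper
	try:
		work_queue.put_nowait(item)
	except queue.Full:
		pass  # drop this batch rather than block the pipeline
	return Gst.PadProbeReturn.OK

If the on-screen display has to wait for those results (e.g. to draw names), slow processing will still delay the video, which is the caveat above.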