DeepStream RTP decoding issues

• Hardware Platform: GPU (RTX 2080)
• DeepStream SDK: 5.1
• JetPack Version: N/A (dGPU platform)
• TensorRT Version: 7.2.2 + CUDA 11.2
• NVIDIA GPU Driver Version: 460.32.03
• Issue: cannot decode and display the Bosch camera RTP source

Hi everyone, I am trying to use DeepStream to accelerate the decoding of an RTP video source. I've made a simple pipeline that decodes the stream and shows the video on the display:

gst-launch-1.0 -e udpsrc address=239.33.36.13 port=37004 caps="application/x-rtp, media=(string)video, payload-type=(int)96, clock-rate=(int)90000, encoding-name=(string)H264" ! rtpjitterbuffer mode=4 ! rtph264depay ! h264parse ! nvv4l2decoder drop-frame-interval=1 ! nvvideoconvert ! video/x-raw, width=1080,height=720,format=NV12 ! nveglglessink window-x=0 window-y=0 window-width=1080 window-height=720

The pipeline shown above works well with the source 239.33.36.13:37004: the stream is decoded and I can immediately see it on the display. Then I changed the RTP source to another one, taken from a Bosch camera:

gst-launch-1.0 -e udpsrc address=239.38.164.18 port=60160 caps="application/x-rtp, media=(string)video, payload-type=(int)96, clock-rate=(int)90000, encoding-name=(string)H264" ! rtpjitterbuffer mode=4 ! rtph264depay ! h264parse ! nvv4l2decoder drop-frame-interval=1 ! nvvideoconvert ! video/x-raw, width=1080,height=720,format=NV12 ! nveglglessink window-x=0 window-y=0 window-width=1080 window-height=720

This time the pipeline stopped at "New clock: GstSystemClock". I also added GST_DEBUG=3 at the head of my command and found messages like these:

0:00:02.961768115 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 1 Slice, Size: 199 will be dropped
0:00:03.002743235 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 6 SEI, Size: 88 will be dropped
0:00:03.002784101 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 1 Slice, Size: 1017 will be dropped
0:00:03.043630858 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 6 SEI, Size: 88 will be dropped
0:00:03.043671308 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 1 Slice, Size: 197 will be dropped
0:00:03.084440340 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 6 SEI, Size: 88 will be dropped
0:00:03.084482243 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 1 Slice, Size: 20 will be dropped
0:00:03.084502638 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 0 Unknown, Size: 70 will be dropped
0:00:03.084527020 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1310:gst_h264_parse_handle_frame: input stream is corrupt; it contains a NAL unit of length 0
0:00:03.084584268 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 5 Slice IDR, Size: 15 will be dropped
0:00:03.084634725 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 1 Slice, Size: 210 will be dropped
0:00:03.125247653 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 6 SEI, Size: 88 will be dropped
0:00:03.125289558 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 1 Slice, Size: 1023 will be dropped
0:00:03.166151911 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 6 SEI, Size: 88 will be dropped
0:00:03.166192843 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 1 Slice, Size: 179 will be dropped
0:00:03.207166903 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 6 SEI, Size: 88 will be dropped
0:00:03.207205099 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 1 Slice, Size: 231 will be dropped
0:00:03.248049068 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1349:gst_h264_parse_handle_frame: broken/invalid nal Type: 1 Slice, Size: 20 will be dropped
0:00:03.248106363 6538 0x55abd1a811e0 WARN h264parse gsth264parse.c:1310:gst_h264_parse_handle_frame: input stream is corrupt; it contains a NAL unit of length 0

(This was the final warning, after which the pipeline stopped.)

As you can see, I changed nothing in the pipeline for the second RTP source, but strangely I could not get a positive result like the first one. I also tried changing the decoded width and height, but that didn't work either, so I suspect there is something different between the two RTP sources. The details of the two RTP sources are as follows:

For the first source (which is correctly decoded and displayed):

Input #0, sdp, from '239.33.36.13_37004.sdp': Duration: N/A, start: 0.700000, bitrate: N/A
Stream #0:0: Video: h264 (High), yuv420p(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 15 fps, 30 tbr, 90k tbn, 30 tbc
[STREAM]
index=0
codec_name=h264
codec_long_name=H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
profile=High
codec_type=video
codec_time_base=1/30
codec_tag_string=[0][0][0][0]
codec_tag=0x0000
width=1920
height=1080
coded_width=1920
coded_height=1080
has_b_frames=0
sample_aspect_ratio=1:1
display_aspect_ratio=16:9
pix_fmt=yuv420p
level=40
color_range=tv
color_space=unknown
color_transfer=unknown
color_primaries=unknown
chroma_location=left
field_order=progressive
timecode=N/A
refs=1
is_avc=false
nal_length_size=0
id=N/A
r_frame_rate=30/1
avg_frame_rate=15/1
time_base=1/90000
start_pts=63000
start_time=0.700000
duration_ts=N/A
duration=N/A
bit_rate=N/A
max_bit_rate=N/A
bits_per_raw_sample=8
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=0
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
DISPOSITION:timed_thumbnails=0
[/STREAM]
[FORMAT]
filename=239.33.36.13_37004.sdp
nb_streams=1
nb_programs=0
format_name=sdp
format_long_name=SDP
start_time=0.700000
duration=N/A
size=125
bit_rate=N/A
probe_score=50
[/FORMAT]

For the second RTP source (which comes from a Bosch camera and could not be decoded in DeepStream):

Input #0, sdp, from '239.38.164.18_60160.sdp': Duration: N/A, start: 0.160000, bitrate: N/A
Stream #0:0: Video: h264 (Main), yuv420p(progressive), 512x288 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 180k tbc
[STREAM]
index=0
codec_name=h264
codec_long_name=H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
profile=Main
codec_type=video
codec_time_base=1/50
codec_tag_string=[0][0][0][0]
codec_tag=0x0000
width=512
height=288
coded_width=512
coded_height=288
has_b_frames=1
sample_aspect_ratio=1:1
display_aspect_ratio=16:9
pix_fmt=yuv420p
level=30
color_range=unknown
color_space=unknown
color_transfer=unknown
color_primaries=unknown
chroma_location=left
field_order=progressive
timecode=N/A
refs=1
is_avc=false
nal_length_size=0
id=N/A
r_frame_rate=25/1
avg_frame_rate=25/1
time_base=1/90000
start_pts=14400
start_time=0.160000
duration_ts=N/A
duration=N/A
bit_rate=N/A
max_bit_rate=N/A
bits_per_raw_sample=8
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=0
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
DISPOSITION:timed_thumbnails=0
[/STREAM]
[FORMAT]
filename=239.38.164.18_60160.sdp
nb_streams=1
nb_programs=0
format_name=sdp
format_long_name=SDP
start_time=0.160000
duration=N/A
size=125
bit_rate=N/A
probe_score=50
[/FORMAT]

As you can see, there are some differences between the parameters of these two sources. I don't know video formats very well, so could someone tell me whether this decoding problem is caused by some unusual parameters in the second source, or whether I should modify my pipeline to adapt to it? Thanks in advance!

This is an RTP session problem, not a DeepStream problem. You can try with a basic GStreamer pipeline instead of using DeepStream.

I'm curious about your camera. It seems to use RTP as the transport protocol, so there should be RTCP to initialize and maintain the transport session; the correct payload type and format information can be obtained through the RTCP session.
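A minimal sketch of such a receive pipeline, built around rtpbin so that RTCP is also handled, might look like the one below; the RTCP port (60161, assumed here to be RTP port + 1) is an assumption and may not match the camera's actual configuration:

gst-launch-1.0 -e rtpbin name=rtpbin udpsrc address=239.38.164.18 port=60160 caps="application/x-rtp, media=(string)video, payload=(int)96, clock-rate=(int)90000, encoding-name=(string)H264" ! rtpbin.recv_rtp_sink_0 udpsrc address=239.38.164.18 port=60161 caps="application/x-rtcp" ! rtpbin.recv_rtcp_sink_0 rtpbin. ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvideoconvert ! nveglglessink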

Hello Fiona, thanks for your response. Actually, when using a plain GStreamer pipeline the second RTP source can be successfully decoded and displayed, while in DeepStream it cannot. I just wanted to use this camera source to do something interesting in DeepStream like vehicle tracking, but now I am stuck at the decoding step. I think I have all the correct format information for this source, but it seems that DeepStream cannot decode it.

Can you try removing h264parse before nvv4l2decoder for the second camera?

Thanks for your advice, Fiona. This time I tried the pipeline without h264parse for the second camera:

GST_DEBUG=3 gst-launch-1.0 -e udpsrc address=239.38.164.18 port=60160 caps = "application/x-rtp, media=(string)video,payload-type=(int)96, clock-rate=(int)90000, encoding-name=(string)H264" ! rtpjitterbuffer mode=4 ! rtph264depay ! nvv4l2decoder drop-frame-interval=1 ! nvvideoconvert ! video/x-raw, width=1080,height=720,format=NV12 ! nveglglessink window-x=0 window-y=0 window-width=1080 window-height=720

However, it still got stuck at "New clock: GstSystemClock"…

Actually, when I removed the h264parse element, the pipeline stopped working for both camera RTP sources. It seems that h264parse dropped all the data when I was trying to decode the second source, while for the first source h264parse showed the same warnings but still worked and successfully passed the data to nvv4l2decoder.

h264parse is an open-source GStreamer plugin. The frames are dropped by h264parse, so there is likely some problem with the payload from the second camera.

Can you show us the GStreamer pipeline that runs successfully with your second camera?

gst-launch-1.0 -e udpsrc address=239.38.164.18 port=60160 caps="application/x-rtp, media=(string)video, payload-type=(int)96, clock-rate=(int)90000, encoding-name=(string)H264" ! queue max-size-buffers=20000 ! rtpjitterbuffer mode=0 ! rtph264depay ! avdec_h264 ! videoconvert ! xvimagesink

This GStreamer pipeline can decode the second source and show the video on the display.

Hi Fiona, any observations?

First, the hardware decoder is not as flexible as a software decoder in handling erroneous data. The UDP protocol is not reliable, so video packets may be lost during UDP transfer. From your log, it seems no SPS packet was received, so h264parse dropped all the following packets, because the stream must start from SPS and PPS NALs. Some SEI NALs were received, so I think your stream may have a very long interval between stream start points (the SPS/PPS interval). I don't think your camera is a pure UDP camera (there is SDP information in your camera configuration), so please use the correct protocol to set up the transmission session and reduce the packet loss rate. You can also consult your camera vendor on how to adjust the camera codec settings to reduce the SPS/PPS interval together with the IDR interval.
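As a rough sketch, assuming the camera also exposes an RTSP endpoint (the URL below is only a placeholder, not an address from this setup), rtspsrc would negotiate the RTP/RTCP session for you, for example:

gst-launch-1.0 -e rtspsrc location=rtsp://<camera-ip>/<stream-path> latency=200 ! rtph264depay ! h264parse ! nvv4l2decoder ! nvvideoconvert ! nveglglessink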