TX2 H264 RTSP Stream decoding issues

My team is utilizing the Jetson TX2 hardware for a computer vision project. We are running a gstreamer pipeline setup similar to the Jetson inference demo.

The issue we face is that we are using an h264 rtsp stream. Something about the h264 encoding gives the Jetson omxh264dec hardware decoder some trouble, and after some time the stream gets delayed. We only have the option of h264 or mjpeg, and both have had the same issues with TX2 hardware decoders.

Does anyone have experience with RTSP pipelines on the Jetson? Do you have any idea why prolonged usage of the decoder would deteriorate the output/cause delay? After 20-30 minutes, performance of the decoder is degraded and we either get blocky output or a massive delay (or both). Eventually it becomes totally unusable.

Here is a link to a saved raw rtsp stream clip that produces the same decoder issues as the rtsp stream. If you run it through filesrc with gstreamer, it will give some SEI type 5 errors.

https://drive.google.com/file/d/0ByFwk3VGcEUlZWItSk1hSzFZUm8/view?usp=sharing

For reference our gstreamer pipeline is similar to this:

This software decoder pipeline works at 15fps 1080p over long periods of time (tested 10 hrs with no delay) but fails at 30fps

std::string pipelinestr = "rtspsrc location=" + RTSP_URL + " ! queue ! rtph264depay ! queue ! h264parse ! queue ! avdec_h264 ! queue ! videoconvert ! video/x-raw, format=RGB ! queue ! appsink name=mysink sync=false";

This hardware decoder pipeline is what we would imagine should work at 30 fps 1080p h264 over long periods of time. It starts out running fine, but eventually develops a delay after 10-20 minutes, and you can see the gstreamer memory usage increase while this happens.

std::string pipelinestr = "rtspsrc location=" + RTSP_URL + " ! queue ! rtph264depay ! queue ! h264parse ! queue ! omxh264dec ! nvvidconv ! video/x-raw, format=NV12 ! queue ! appsink name=mysink sync=false";

The goal is to have an rtsp feed being decoded in real-time and operate for long periods of time without lagging behind or the output becoming unusable.

For reference, a similar pipeline but with omxh264dec replaced with avdec_h264 on GTX 1080 Ubuntu desktop produces no issues, even when run 24 hours straight, whereas the performance of the Jetson degrades quickly.

Any input you have would be appreciated!

Hi nicholas.sieb,
You can also run avdec_h264 on TX2. Can you try it?

HW decoder may not have same error handling as SW decoder. When the bitstream is corrupted, it probably fails to decode.

Thanks,

I’ll be trying this, but there are some concerns: even with a simple test pipeline using avdec_h264, gstreamer warns that decoding is too slow (“There may be a timestamping problem, or this computer is too slow.”).

If there is a memory leak or error-handling issue in the HW decoder, any tips on reporting this issue?

Any tips on mitigating the slowness issues or dealing with live RTSP sources on the Jetson?

The pipeline needs to remain live and not fall behind. If for some reason the h264 depay/parse or decode cannot keep up, how can this be handled so the stream is preserved and the feed stays live? The emphasis is on long-term operation, since omxh264dec functions for 20-30 minutes and then begins deteriorating.

Hi nicholas sieb,
Can you try the case to generate the source with ‘omxh264enc insert-sps-pps=true’?

The source is an IP camera RTSP feed, so there is no gstreamer pipeline generating the source. I also gave avdec_h264 a quick test, and while it fared better in long-term usage (no delay or corruption), the decoding rate was extremely slow, 1-2 FPS.

Any other ideas to solve the problem while still using omxh264dec? Or ideas to increase speed of avdec_h264?

Do you think it could be this specific h264 encoding that causes issues with the hardware decoder? I have given a sample in original post of the h264 feed.

Hi nicholas, we will check playback of the stream in #1.

avdec_h264 is a SW decoder. You can get max performance by executing ‘sudo ./jetson_clocks.sh’

Hi nicholas.sieb,

We tried the command below, and it can’t play the video:

gst-launch-1.0 filesrc location=videotest.mp4 ! queue max-size-bytes=42000 max-size-buffers=0 max-size-time=0 leaky=downstream ! qtdemux ! h264parse ! omxh264dec ! nvoverlaysink

But it works with the command below:

gst-launch-1.0  filesrc location=videotest.mp4 ! qtdemux ! h264parse ! omxh264dec ! nvoverlaysink

Please check the ‘queue’ settings; then it should work.

Hi, I did some more testing, and what seemed to help was adding more queue elements, which splits the workload across more threads.

A pipeline such as this: std::string pipelinestr = "rtspsrc location=rtsp://43.148.80.81 ! queue ! rtph264depay ! queue ! h264parse ! queue ! avdec_h264 ! queue ! videoconvert ! video/x-raw, format=RGB ! queue ! appsink name=mysink sync=false";

is able to handle a 15fps 1080p source without ever getting behind; however, with a 30fps 1080p source it falls behind quickly. For my immediate purpose that’s okay and an improvement. I would like to be able to slot in the hardware decoder and support 30fps 1080p, but it still presents issues with this source, such as falling behind the live stream.

The reason I gave the file is that it seems to throw errors during hardware decoding if you enable some gstreamer logging.

Hi nicholas,
If you have bitstream which cannot be decoded well by omxh264dec, please share it.

If it is about the configuration of the ‘queue’ element, we don’t have much experience with it; perhaps other users can share theirs.

The included example is a bitstream that cannot be decoded well. It decodes (while outputting some errors) for a short time, but after running such a bitstream for longer periods, 30+ minutes, the decoder begins to fail.

Hi nicholas, by removing ‘leaky=downstream’, the video playback runs fine:

$ gst-launch-1.0 filesrc location=videotest.mp4 ! queue max-size-bytes=42000 max-size-buffers=0 max-size-time=0 ! qtdemux ! h264parse ! omxh264dec ! nvoverlaysink

Please remove it and try again.

You won’t be able to repro this with a filesrc such as above.

It needs to be an RTSP stream, that is at least 1080p 30fps and h264. We have set up a raspberry pi w/ a camera to act as an RTSP feed w/ 1080p 30fps h264, and the Jetson did not have the same issues as with our other IP camera. The video sample above is from that IP camera feed, and we think the specific encoding is what gives us issues.

This software decoder pipeline works at 15fps 1080p over long periods of time (tested 10 hrs with no delay) but fails at 30fps

std::string pipelinestr = "rtspsrc location=" + RTSP_URL + " ! queue ! rtph264depay ! queue ! h264parse ! queue ! avdec_h264 ! queue ! videoconvert ! video/x-raw, format=RGB ! queue ! appsink name=mysink sync=false";

This hardware decoder pipeline is what we would imagine should work at 30 fps 1080p h264 over long periods of time. It starts out running fine, but eventually develops a delay after 10-20 minutes, and you can see the gstreamer memory usage increase while this happens.

std::string pipelinestr = "rtspsrc location=" + RTSP_URL + " ! queue ! rtph264depay ! queue ! h264parse ! queue ! omxh264dec ! nvvidconv ! video/x-raw, format=NV12 ! queue ! appsink name=mysink sync=false";

For reference, other users have already posted this issue on the forum and have not received a solution: https://devtalk.nvidia.com/default/topic/1023655/jetson-tx2/gstreamer-pipeline-video-decoding-delay-grows-over-time-and-memory-consumed-also-grows/

For clarification, that topic is about nvvideosink, and the user does not clearly share steps/information about it.

You don’t use nvvideosink in your case. Please do not mix the two issues up.

Hi nicholas,
Your comment implies there is possible memory leakage in rtsp streaming, so we ran a 16-hour test. The result looks good. We do not observe significant memory increase, and playback looks smooth.

[Server] r28.1/TX2

$ ./test-launch "nvcamerasrc ! video/x-raw(memory:NVMM),width=1920,height=1080,format=NV12,framerate=30/1 ! omxh264enc bitrate=10000000 ! rtph264pay name=pay0 pt=96"

[Client] r28.1/TX2

$ gst-launch-1.0 rtspsrc location="rtsp://10.19.106.172:8554/test" ! rtph264depay ! h264parse ! omxh264dec ! nvoverlaysink

Memory at initialization

Tasks: 301 total,   1 running, 300 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.9 us,  2.5 sy,  0.0 ni, 93.3 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem :  8039124 total,  6536540 free,   928592 used,   573992 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  7016176 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1982 nvidia    20   0 1282100  32872  23320 S  21.4  0.4   0:33.92 gst-launch+



nvidia@tegra-ubuntu:~$ cat /proc/meminfo
MemTotal:        8039124 kB
MemFree:         6536804 kB
MemAvailable:    7016460 kB
Buffers:           28520 kB
Cached:           464216 kB
SwapCached:            0 kB
Active:           632152 kB
Inactive:         276572 kB
Active(anon):     417408 kB
Inactive(anon):    13300 kB
Active(file):     214744 kB
Inactive(file):   263272 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        416000 kB
Mapped:           157920 kB
Shmem:             14724 kB
Slab:              81260 kB
SReclaimable:      44144 kB
SUnreclaim:        37116 kB
KernelStack:        8128 kB
PageTables:         8716 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4019560 kB
Committed_AS:    3157288 kB
VmallocTotal:   258998208 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
CmaTotal:          65536 kB
CmaFree:           63480 kB


root@tegra-ubuntu:/sys/kernel/debug/nvmap/iovmm# cat clients
CLIENT                        PROCESS      PID        SIZE
user                   gst-launch-1.0     1982     118652K
user                           compiz     1233      62016K
user                  nvcamera-daemon      856          0K
user                     argus_daemon      855          0K
user                             Xorg      777     164160K
total                                              344828K

Memory after 16 hours

top - 01:01:48 up 16:28,  3 users,  load average: 1.56, 1.44, 1.34
Tasks: 297 total,   1 running, 296 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.8 us,  2.2 sy,  0.0 ni, 91.9 id,  0.0 wa,  0.0 hi,  1.1 si,  0.0 st
KiB Mem :  8039124 total,  6428704 free,  1034308 used,   576112 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  6910088 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1982 nvidia    20   0 1282100  95704  23348 S  23.0  1.2 211:58.93 gst-launch+



nvidia@tegra-ubuntu:~$ cat /proc/meminfo
MemTotal:        8039124 kB
MemFree:         6428540 kB
MemAvailable:    6909928 kB
Buffers:           29608 kB
Cached:           464652 kB
SwapCached:            0 kB
Active:           700232 kB
Inactive:         275872 kB
Active(anon):     483268 kB
Inactive(anon):    13300 kB
Active(file):     216964 kB
Inactive(file):   262572 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        481872 kB
Mapped:           158728 kB
Shmem:             14728 kB
Slab:              82004 kB
SReclaimable:      44356 kB
SUnreclaim:        37648 kB
KernelStack:        8096 kB
PageTables:         8904 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4019560 kB
Committed_AS:    3215236 kB
VmallocTotal:   258998208 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
CmaTotal:          65536 kB
CmaFree:           63480 kB
nvidia@tegra-ubuntu:~$



root@tegra-ubuntu:/sys/kernel/debug/nvmap/iovmm# cat clients
CLIENT                        PROCESS      PID        SIZE
user                   gst-launch-1.0     1982     118652K
user                           compiz     1233      66112K
user                  nvcamera-daemon      856          0K
user                     argus_daemon      855          0K
user                             Xorg      777     164160K
total                                              348924K

Do you see significant memory increase in your case?

Hi Dane and Nicholas,

Having similar issues here to what Nicholas is describing with some RTSP IP cameras.

$ gst-launch-1.0 rtspsrc location="rtsp://my_ip_here/test" ! rtph264depay ! h264parse ! omxh264dec ! nvoverlaysink

Just tried the command above as suggested, and within 15 minutes the TX2 CPU load went from 30% to 110% (as reported by the top command). The latency increased from 2 seconds at the beginning to about 2 minutes at the end of the 15-minute test. We also got the messages below when the CPU load was over 90%:

Additional debug info:
gstbasesink.c(2854): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstNvOverlaySink-nvoverlaysink:nvoverlaysink-nvoverlaysink0:
There may be a timestamping problem, or this computer is too slow.
WARNING: from element /GstPipeline:pipeline0/GstNvOverlaySink-nvoverlaysink:nvoverlaysink-nvoverlaysink0: A lot of buffers are being dropped.
Additional debug info:
gstbasesink.c(2854): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstNvOverlaySink-nvoverlaysink:nvoverlaysink-nvoverlaysink0:
There may be a timestamping problem, or this computer is too slow.
WARNING: from element /GstPipeline:pipeline0/GstNvOverlaySink-nvoverlaysink:nvoverlaysink-nvoverlaysink0: A lot of buffers are being dropped.
Additional debug info:
gstbasesink.c(2854): gst_base_sink_is_too_late (): /GstPipeline:pipeline0/GstNvOverlaySink-nvoverlaysink:nvoverlaysink-nvoverlaysink0:
There may be a timestamping problem, or this computer is too slow.
TVMR: FrameRate = 30.000840

We built gstreamer and its plug-ins from source, see the same issue, and are debugging now. Any ideas on how to debug the increasing CPU load and latency would be appreciated.

Also is there a way to use the “Multimedia API -> v4l2 stack -> HW driver” path with RTSP streams?

Hi nikosR,
Please compare memory usage when the issue happens.
‘top’
‘cat /proc/meminfo’
‘root@tegra-ubuntu:/sys/kernel/debug/nvmap/iovmm# cat clients’

Hi DaneLLL,

Please find below the information requested.

Command was

gst-launch-1.0 rtspsrc location="rtsp://my_ip_here/test" ! rtph264depay ! h264parse ! omxh264dec ! nvoverlaysink

Start

12288 nvidia    20   0 1578620  26296  17064 S  12.0  0.3   0:04.60 gst-launch-1.0

CLIENT                        PROCESS      PID        SIZE
user                   gst-launch-1.0    12288      71060K
...
total                                              724676K

MemTotal:        8039124 kB
MemFree:         1450276 kB
MemAvailable:    3373984 kB
Buffers:           95604 kB
Cached:          1903628 kB
SwapCached:            0 kB
Active:          3518824 kB
Inactive:        1033472 kB
Active(anon):    2554760 kB
Inactive(anon):   131528 kB
Active(file):     964064 kB
Inactive(file):   901944 kB
Unevictable:          32 kB
Mlocked:              32 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               248 kB
Writeback:             0 kB
AnonPages:       2553136 kB
Mapped:           542392 kB
Shmem:            133228 kB
Slab:             153688 kB
SReclaimable:     100204 kB
SUnreclaim:        53484 kB
KernelStack:       14208 kB
PageTables:        32008 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4019560 kB
Committed_AS:    9627112 kB
VmallocTotal:   258998208 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
CmaTotal:          65536 kB
CmaFree:           63480 kB

After 5 minutes CPU is over 80% as reported by top

12288 nvidia    20   0 1578620  45300  17064 S  83.8  0.6   5:06.18 gst-launch-1.0

CLIENT                        PROCESS      PID        SIZE
user                   gst-launch-1.0    12288      71060K
...
total                                              724676K

MemTotal:        8039124 kB
MemFree:         1380580 kB
MemAvailable:    3305432 kB
Buffers:           96672 kB
Cached:          1903628 kB
SwapCached:            0 kB
Active:          3585420 kB
Inactive:        1033444 kB
Active(anon):    2620260 kB
Inactive(anon):   131500 kB
Active(file):     965160 kB
Inactive(file):   901944 kB
Unevictable:          32 kB
Mlocked:              32 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               276 kB
Writeback:             0 kB
AnonPages:       2618876 kB
Mapped:           542440 kB
Shmem:            133200 kB
Slab:             154060 kB
SReclaimable:     100252 kB
SUnreclaim:        53808 kB
KernelStack:       14176 kB
PageTables:        32224 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4019560 kB
Committed_AS:    9677152 kB
VmallocTotal:   258998208 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
CmaTotal:          65536 kB
CmaFree:           63480 kB

Hi DaneLLL,

Please note the information provided above.

Could you please also let us know if it is possible to use the “Multimedia API -> v4l2 stack -> HW driver” path with RTSP streams?

Thank you!

It is possible. You can run rtspsrc ! rtph264depay ! h264parse ! appsink, and integrate appsink with Multimedia API to decode the h264 stream.

The memory usage looks stable, so that seems not to be the issue. Can we reproduce the issue with filesrc? If something is wrong in omxh264dec, we should also be able to reproduce the issue by replacing rtspsrc ! rtph264depay with filesrc.

Hi DaneLLL,

Not possible to repro with filesrc.

We captured a few minutes of the RTSP stream and then tried to play it back, but hit many issues with both the hardware and software decoders. Neither was able to play the captured packets.

We also noticed that mediainfo reports a variable frame rate for this IP camera (and there is no way to set it to fixed). Could that be causing issues for the gstreamer / omxh264dec pipeline?

We cannot repro the issues with another camera for which mediainfo reports a fixed frame rate.

We also cannot repro the issues when the software decoder is used. Just checked again with the software decoder: CPU load stays between 60% and 80% (from top) with no lag/latency and no increase over time.

The issue with the hardware decoder is mainly that the CPU load constantly increases, eventually “freezing” the pipeline.