omxh264dec/nvvidconv: 34.7MB per minute memory leak/crash when decoding Interlaced H264 video

Decoding H264 1920x1080i50 (interlaced video) causes the decoding engine to slow down dramatically over time; tegrastats reports a memory increase of around 34.7 MB per minute of video decoded.

Initial load: memory goes from 1412/3957 MB to 1436 MB, decoding at around 515-516 frames per second (executable load 24 MB, 220 frames decoded).

DATA
Min running | Tegrastats mem reported (baseline 1412/3957 MB) | Leak per minute (MB) | Accumulated leak (MB) | FPS | Frames decoded
Start | 1436 | 0 | 0 | - | 220
1 | 1472 | 36 | 36 | 516 | 3105
2 | 1511 | 39 | 75 | 515 | 5946
3 | 1549 | 38 | 113 | 473 | 9046
4 | 1590 | 41 | 154 | 416 | 12156
5 | 1621 | 31 | 185 | 369 | 14912
6 | 1658 | 37 | 222 | 320 | 18012
7 | 1690 | 32 | 254 | 282 | 20946
8 | 1725 | 35 | 289 | 248 | 23930
9 | 1763 | 38 | 327 | 219 | 27008
10 | 1800 | 37 | 364 | 195 | 29958
11 | 1831 | 31 | 395 | 176 | 32916
12 | 1865 | 34 | 429 | 159 | 35920
13 | 1903 | 38 | 467 | 144 | 38957
14 | 1931 | 28 | 495 | 132 | 41897
15 | 1957 | 26 | 521 | 121 | 44879

gst-launch-1.0 -e filesrc location=1080i50.ts ! tsdemux ! \
h264parse disable-passthrough=true ! \
omxh264dec ! 'video/x-raw(memory:NVMM)' ! \
fakesink
gst-launch-1.0 -e filesrc location=1080i50.ts ! tsdemux ! \
h264parse disable-passthrough=true ! \
nvv4l2decoder ! 'video/x-raw(memory:NVMM)' ! \
fakesink

The memory leak seems to be related to the allocated frame size: when decoding 720x576i50 the leak is about 9.2 MB per minute of video decoded. Performance also degrades dramatically.
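As a rough estimate from the tables: roughly 35 MB leaked over ~2,900 frames in the first minute of 1080i50 works out to about 12 KB per frame, versus roughly 9 MB over ~3,000 frames (about 3 KB per frame) for 576i50. That ratio is broadly in line with the raw frame-size ratio (≈3.1 MB vs ≈0.6 MB per frame, assuming NV12 output), which is why the leak looks tied to the allocated frame size.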

omxh264dec 720x576i50 H264
Min running | Tegrastats mem reported (baseline 1418/3957 MB) | Leak per minute (MB) | Accumulated leak (MB) | FPS | Frames decoded
Start | 1443 | 0 | 0 | 2166 | 2173
1 | 1450 | 7 | 7 | 1753 | 3836
2 | 1459 | 9 | 16 | 1372 | 6906
3 | 1468 | 9 | 25 | 921 | 10667
4 | 1477 | 9 | 34 | 668 | 14141
5 | 1488 | 11 | 45 | 496 | 17808
6 | 1498 | 10 | 55 | 389 | 21340

nvv4l2decoder 720x576i50 H264
Min running | Tegrastats mem reported (baseline 1418/3957 MB) | Leak per minute (MB) | Accumulated leak (MB) | FPS | Frames decoded
Start | 1440 | 0 | 0 | - | 926
1 | 1442 | 2 | 2 | 1753 | 3509
2 | 1443 | 1 | 3 | 1473 | 7393
3 | 1445 | 2 | 5 | 1124 | 10712
4 | 1446 | 1 | 6 | 837 | 14299
5 | 1448 | 2 | 8 | 643 | 17790
6 | 1449 | 1 | 9 | 510 | 21335

Notes:

  • Checked every component of the pipeline (no leaks on filesrc or h264parse).
  • nvv4l2decoder sometimes crashes after 2.2 min of HD decoding (using different content) and after 6:50 min on SD PAL. The leak on nvvidconv is smaller than on omxh264dec, BUT the performance degradation is similar (it went down from 480 fps to 126 fps by 00:20:00 into the stream and to 64 fps after 00:40:00).
  • Used a variety of 1080i50 professional content encoded with Harmonics/Elemental/enViVo and even x264.
    ** This problem does not happen with MPEG2; for completeness, the Jetson Nano decoded MPEG2 1920x1080 frames at 239 fps.
    ** This problem does not happen with progressive H264 video; for completeness, the Jetson Nano decoded H264 1920x1080 frames at 284 fps.

Memory Leak (CSV).txt (580 Bytes)

Hi,
Please upload a 1080i50 video file for reproducing the issue.

Username: RNXVPGDHPR
Password: SD1ga^fd

https://ftpservices.nvidia.com 

The login above will expire on 7/9/2019 12:00:00 AM

Uploading two HD files and one SD file, 16 min each; let me know if you need them longer than that.

Uploaded.

Hi,
We are deprecating the omx plugins, so please use nvv4l2decoder. Running the pipeline below, we can see RES increasing slightly:

$ gst-launch-1.0 filesrc location= H264_15min_25.ts ! tsdemux ! h264parse ! nvv4l2decoder ! fakesink sync=true
PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
7897 nvidia    20   0  426256  35636  19720 S  31.2  0.9   0:00.95 gst-launch-1.0
7897 nvidia    20   0  426256  61672  19720 S 114.8  1.5   5:16.66 gst-launch-1.0

It is not seen when decoding a progressive stream. We will check this.

I read about the deprecation of the omx plugins. As you can see in the report, I also ran the test with nvv4l2decoder, and yes, the leak is lower (20x lower), but it is still not something that can be used for a long time. Apart from the leak, the main problem with both elements is the slowdown of decoding performance. Check the fps figures I posted: you will see the degradation reaches levels where HW acceleration defeats its own purpose, to the point where it is even more convenient to use an H264 SW decoder on the 4 CPUs.
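For reference, a plain SW decode of the same stream would look something like this (a sketch only; avdec_h264 from gst-libav is just one possible SW decoder for the comparison):

# software decode on the CPUs, to compare against omxh264dec/nvv4l2decoder above
gst-launch-1.0 -e filesrc location=1080i50.ts ! tsdemux ! \
h264parse ! avdec_h264 ! fakesink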

BTW: I used tegrastats to build the report table. Also, the critical degradation happens around 15 minutes into decoding on HD.
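Something along these lines (a minimal sketch; the one-sample-per-minute filtering is only illustrative, and tegrastats is assumed to print roughly one line per second by default):

# log tegrastats in the background while the pipeline decodes
tegrastats > tegrastats.log &
gst-launch-1.0 -e filesrc location=1080i50.ts ! tsdemux ! \
h264parse disable-passthrough=true ! \
omxh264dec ! 'video/x-raw(memory:NVMM)' ! \
fakesink
kill %1
# keep one sample per minute; the "RAM xxxx/3957MB" field is what the tables report
awk 'NR % 60 == 1' tegrastats.log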

@Dane can you share new documentation?

gst_str_1 = ("rtspsrc location=rtsp://192.168.10.78/media/live/2/1  latency=0 ! rtph265depay ! h265parse ! omxh265dec ! nvvidconv ! video/x-raw , format=(string)BGRx ! videoconvert ! appsink")

cap = cv2.VideoCapture(gst_str_1, cv2.CAP_GSTREAMER)

Please see if this is correct:

gst_str_1 = ("rtspsrc location=rtsp://192.168.10.78/media/live/2/1  latency=0 ! rtph265depay ! h265parse ! nvv4l2decoder ! nvvidconv ! video/x-raw , format=(string)BGRx ! videoconvert ! appsink")

cap = cv2.VideoCapture(gst_str_1, cv2.CAP_GSTREAMER)

This is to convert RTSP video frames into images and send queries.

Hi Rudoplh,
Please execute ‘sudo jetson_clocks’, enable the following property, and try again:

enable-max-performance: Set to enable max performance
                        flags: readable, writable
                        Boolean. Default: false

Please also check thermal in tegrastats.
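For example, something like this (a sketch; the file name is just the test stream used in the original pipelines):

sudo jetson_clocks
gst-launch-1.0 -e filesrc location=1080i50.ts ! tsdemux ! \
h264parse disable-passthrough=true ! \
nvv4l2decoder enable-max-performance=true ! \
fakesink
# in a second terminal, watch memory and temperature
tegrastats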

Hi RaviKiranK,
Your comment is different from this topic, although the gstreamer pipeline looks good. Please make a new post if you still hit issues running gstreamer + OpenCV.

nvv4l2decoder with enable-max-performance enabled results:

Problem remains

The decoding degraded to 1x (below real time) after decoding 155,230 1080i50 fields (00:52:21.34).

Memory: 367 MB at boot, 463 MB at decoding start, 615 MB after the 155,230 fields were decoded (152 MB leak).

The Jetson Nano was connected to a 5V/4A power supply through the barrel jack, with jetson_clocks enabled for full power.

Thermal 26C while decoding, 23.5C before the test.

Remarks: one of the CPUs remained at 100% while the others ran at 1%~3%.

Adding to the record:

I ran the same test using the same video content (but encoded as progressive this time) and the system does not show this problem.

Memory footprint: 363 MB before the test, 437 MB while running, growing to 459 MB after 269,955 progressive frames were decoded (3 hours of video), then relaxing to 435 MB after the program ended execution (about 8 MB leaked per hour of 1080p video decoded).

Decoder performance: 257 fps, and 280 fps with jetson_clocks.

Temperature GPU/CPU 27C

CPUs 13%,7%,9%,7%

This issue may be related to the one I wrote about a couple of weeks ago here: https://devtalk.nvidia.com/default/topic/1055089/jetson-nano/extremely-unreliable-hw-encoder/post/5350167/

I couldn’t identify the source of the problem; however, I could confirm that it exists with progressive inputs as well.

No hard feelings, but I find it ironic that in my topic I was told to use omx, while here the OP was told otherwise…

On a side note, there was recently an update to GStreamer 1.14.4 (the previous version was 1.14.1); however, the nvv4l2 codecs all report version 1.14.0, so I doubt anything changed there…

Hi 11wallace11,
Your issue has been investigated and should be fixed in the next release, r32.2. If you are going to production on r32.1 and cannot wait for r32.2, please let us know more about your project.

For the v4l2 plugins, we download
https://gstreamer.freedesktop.org/src/gst-plugins-good/gst-plugins-good-1.14.0.tar.xz
and modify it to support hardware acceleration. The modification is open source at
https://developer.nvidia.com/embedded/dlc/l4t-sources-32-1-jetson-nano
gst-v4l2 in r32.2 stays on 1.14.0. If you have seen critical issues fixed in 1.14.2 (or newer), please let us know.

The issue I’m reporting is on the decoder side. The encoder (as an element) is working fine for me, except for a bug with B-frames on H264 that I’m going to post about in a few minutes. My tests run 3 days of content and the encoder is stable (no memory leaks, and reliable).

What you might have experienced is that Nvidia decided to use a low-level media engine such as GStreamer. GStreamer has a prototyping tool called gst-launch; this tool doesn’t handle many of the things that make a commercial product reliable, such as connection drops and recovery, glitches in the stream, lack of standards compliance by some vendors, buffer handling, etc. Instead you need to build your own app on top of this media abstraction, or, like me, drop all this open source stuff and build your own SW using C++ and the NMAP components (NMAP as in NVIDIA Multimedia Applications).

*Full disclosure: I don’t work for Nvidia, and I hope this helps…

Thank you both for taking your time and commenting about my issue in this thread.

@DaneLLL

Thanks for your kind offer and pointing me in the right direction to look for the source code.
I’m not in a hurry and can definitely wait for the next version to come out :)
However, if there’s a public beta or beta-testing program I’ll be more than happy to participate and issue detailed bug reports.

@Rudoplh

I’ve been involved in the “media codecs” scene for more than 10 years now, and while I probably have the basic knowledge necessary to build something that’d work, I feel that it’s not worth my time to start working on a new project for a platform that’s being dominated by Nvidia…
That being said, if Nvidia had a proper git repository somewhere and accepted participation of developers not from the company, it may be a good incentive for me (and probably other developers as well) to participate.
To sum up my stance, since it’s probably still not so clear at this point: I would prefer an open repository that everyone works on (or forks), instead of the current “open source” form of “here, take the source code and develop something for yourself, and yourself only”.

I’m terribly sorry for deviating so much from the original topic. Feel free to ignore my comment :)

Hi,
On r32.2, comment #5 is a known issue. FYI.

Hi Dane,

The decoding problem remains. In fact, it has gotten much worse than it was on r32.1.

r32.2 and r32.2.1, which ship the same gst-nvvideo4linux2_src code, made H264 decoding on the Jetson Nano totally useless. It has been almost 4 months, and rather than being fixed the problem got much worse, and we have commitments to fulfill.

With omxh264dec there is a memory leak, but at least we know that the Jetson is able to play interlaced H.264 properly. The memory leak appears to be because the videodecoder subclass should implement drain().

The new nvv4l2decoder is a disaster. Rather than fix bugs, it looks like nVidia decided to add more features, like a new buffer method and support for DivX, something that probably nobody has used in more than 10 years (probably pirates).

The new problem is that decoding of interlaced H264 always runs as if sync=false were set; in other words it plays super fast and without timing control (or is the timing of the output in fields?). In addition, there are some variations of H264 that don’t play at all (I can share professional stream examples), and in other samples it looks like the chroma planes are gone. These problems do not happen with omxh264dec. The caps reported by the decoder are not right: they used to report progressive output, but now it is “mixed”. This can be fixed by putting back a piece of code that was moved. See the picture.

Moving forward: as developers we are relying on nVidia as a product, yet we cannot move out of the BETA stage, and we need Nvidia to understand that this is not right and is costing us business.

I like @11wallace11’s idea of putting the gst-nvvideo4linux2_src repo online, maybe on nvidia · GitLab or NVIDIA Corporation · GitHub, so we can help with testing and development in a more agile way.

Adding some notes to my last reply:

Part of the memory leak (the cache part) can be fixed by doing

sysctl vm.drop_caches=3

nvv4l2decoder has other broken parts. For example, disableDB=true also produces problems. The new bufferAPI feature seems to have no effect on the pipe.

I will be happy to help by making a report and a full test of your software. BUT to be honest, I’m disturbed by how nVidia is releasing software without proper testing or quality control, or at least a note of the known issues and a potential fix schedule.

P.S. I can also put the repo on GitHub, as long as nVidia commits to pushing its daily changes to that repo so other developers can help you.

Hi Rudoplh,
We have debugged further and tried a similar setup with an Odroid board (which uses the gst-v4l2 open source plug-ins) and see similar observations of increasing VmRSS during playback of long interlaced streams. We are looking into filing a bug with the gstreamer-plugins-good community, as this seems to be an open source issue.

For the issue about setting ‘disableDB=true’, if it is a separate issue from decoding interlaced streams, please start a new post for it. Thanks.

Hi Dane,

It cannot be a gstreamer issue.

  1. The memory leak in omxh264dec is probably related to a warning that gstreamer reports as a FIXME, because the videodecoder subclass should implement drain(). I know that someone at nVidia decided to abandon this to focus on nvv4l2decoder. In this case I believe it would be very helpful for ALL of us if nVidia released the repo on GitLab or GitHub, so we can track the changes made and continue with this effort, as it looks easier to fix than the new nvv4l2decoder.

  2. nvv4l2decoder is working “more or less” OK on 32.1, but it is seriously BROKEN on 32.2 and 32.2.1. This cannot be gstreamer’s fault.

PLEASE, I know that your group at nVidia is very small, BUT this group cannot just dismiss these problems and leave your customers in limbo. Each release posted to the public requires serious testing before the code is released. It has been months, and we have made commitments and serious investments believing that nVidia released a PRODUCT.

PLEASE, PLEASE, check this issue. It is not gstreamer; the r32.2 and r32.2.1 code has the issues I pointed out, and this code is made by nVidia, not gstreamer.

BTW: the issue is not only with long-duration streams; in less than a few seconds you will see the memory leak.

Hi Rudoplh,
For interlaced-stream decoding, NVIDIA core teams are clarifying the issue with the gstreamer community.

We will update once there are further findings.
We are also reviewing the nvv4l2decoder test cases.