Nvv4l2decoder memory issue

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
GPU
• DeepStream Version
6.1.0

There seems to be an issue with endlessly growing memory when using nvv4l2decoder. When I run the following GStreamer pipeline and check htop, I can see RES memory increasing linearly. When I use a WebM file and vp9dec instead, I see no memory issues.

gst-launch-1.0 filesrc location=../samples/kuressaare.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! progressreport update-freq=1 ! nvv4l2h264enc bitrate=1000000 ! h264parse ! mp4mux ! filesink location=out.mp4

I also checked it using valgrind with the following command:

valgrind --tool=memcheck --leak-check=full --num-callers=100 --show-leak-kinds=definite,indirect --track-origins=yes --log-file="valgrind-out.txt" gst-launch-1.0 filesrc location=../samples/kuressaare.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! progressreport update-freq=1 ! nvv4l2h264enc bitrate=1000000 ! h264parse ! mp4mux ! filesink location=out.mp4
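As an aside on methodology: for GLib-based applications like GStreamer, valgrind reports are usually easier to read with GLib's slice allocator bypassed. A sketch of the same run with the documented GLib debug variables set (the pipeline is trimmed to the decode path here for brevity, which is my assumption, not the original command):

```shell
#!/bin/sh
# Sketch: same valgrind run, with GLib's slice allocator bypassed so that
# every allocation maps 1:1 to malloc/free and leak stacks are cleaner.
# G_SLICE and G_DEBUG are documented GLib debugging variables.
export G_SLICE=always-malloc
export G_DEBUG=gc-friendly
if command -v valgrind >/dev/null 2>&1 && command -v gst-launch-1.0 >/dev/null 2>&1; then
    valgrind --tool=memcheck --leak-check=full --num-callers=100 \
        --show-leak-kinds=definite,indirect --track-origins=yes \
        --log-file=valgrind-out.txt \
        gst-launch-1.0 filesrc location=../samples/kuressaare.mp4 \
        ! qtdemux ! h264parse ! nvv4l2decoder ! fakesink
else
    echo "valgrind and/or gst-launch-1.0 not installed; skipping"
fi
```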

This is the result:

==308609== HEAP SUMMARY:
==308609==     in use at exit: 9,828,670 bytes in 12,276 blocks
==308609==   total heap usage: 122,717 allocs, 110,441 frees, 3,284,469,481 bytes allocated
==308609== 
==308609== 8 bytes in 1 blocks are definitely lost in loss record 132 of 4,009
==308609==    at 0x483B7F3: malloc (vg_replace_malloc.c:309)
==308609==    by 0x6E5032B: ???
==308609==    by 0x6E59E3A: ???
==308609==    by 0x6E4F95C: ???
==308609==    by 0x681343A: ???
==308609==    by 0x682108D: v4l2_ioctl (in /opt/nvidia/deepstream/deepstream-6.1/lib/libnvv4l2.so)
==308609==    by 0x67C9228: ??? (in /opt/nvidia/deepstream/deepstream-6.1/lib/gst-plugins/libgstnvvideo4linux2.so)
==308609==    by 0x67CD41E: gst_v4l2_buffer_pool_process (in /opt/nvidia/deepstream/deepstream-6.1/lib/gst-plugins/libgstnvvideo4linux2.so)
==308609==    by 0x67E22AF: ??? (in /opt/nvidia/deepstream/deepstream-6.1/lib/gst-plugins/libgstnvvideo4linux2.so)
==308609==    by 0x6598B4F: ??? (in /usr/lib/x86_64-linux-gnu/libgstvideo-1.0.so.0.1602.0)
==308609==    by 0x48E1EFE: ??? (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==308609==    by 0x48E3F60: ??? (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==308609==    by 0x48EAD72: gst_pad_push (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==308609==    by 0x646943F: ??? (in /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0.1602.0)
==308609==    by 0x48E1EFE: ??? (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==308609==    by 0x48E3F60: ??? (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==308609==    by 0x48EAD72: gst_pad_push (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==308609==    by 0x658A8A4: ??? (in /usr/lib/x86_64-linux-gnu/libgstvideo-1.0.so.0.1602.0)
==308609==    by 0x6591FFA: gst_video_decoder_finish_frame (in /usr/lib/x86_64-linux-gnu/libgstvideo-1.0.so.0.1602.0)
==308609==    by 0x67E0D2F: ??? (in /opt/nvidia/deepstream/deepstream-6.1/lib/gst-plugins/libgstnvvideo4linux2.so)
==308609==    by 0x4919106: ??? (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==308609==    by 0x4A8A373: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.6)
==308609==    by 0x4A89AD0: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.6)
==308609==    by 0x4B3F608: start_thread (pthread_create.c:477)
==308609==    by 0x4C7B132: clone (clone.S:95)
==308609== 
==308609== 16,384 bytes in 1 blocks are definitely lost in loss record 3,951 of 4,009
==308609==    at 0x483B7F3: malloc (vg_replace_malloc.c:309)
==308609==    by 0x4A65E98: g_malloc (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.6)
==308609==    by 0x4A707D3: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.6)
==308609==    by 0x4011B99: call_init.part.0 (dl-init.c:72)
==308609==    by 0x4011CA0: call_init (dl-init.c:30)
==308609==    by 0x4011CA0: _dl_init (dl-init.c:119)
==308609==    by 0x4001139: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so)
==308609==    by 0x15: ???
==308609==    by 0x1FFF000152: ???
==308609==    by 0x1FFF000161: ???
==308609==    by 0x1FFF000169: ???
==308609==    by 0x1FFF00018C: ???
==308609==    by 0x1FFF00018E: ???
==308609==    by 0x1FFF000196: ???
==308609==    by 0x1FFF000198: ???
==308609==    by 0x1FFF0001A2: ???
==308609==    by 0x1FFF0001A4: ???
==308609==    by 0x1FFF0001B2: ???
==308609==    by 0x1FFF0001B4: ???
==308609==    by 0x1FFF0001C3: ???
==308609==    by 0x1FFF0001D1: ???
==308609==    by 0x1FFF0001D3: ???
==308609==    by 0x1FFF0001E1: ???
==308609==    by 0x1FFF0001F1: ???
==308609==    by 0x1FFF0001F3: ???
==308609==    by 0x1FFF0001FD: ???
==308609==    by 0x1FFF0001FF: ???
==308609==    by 0x1FFF000206: ???
==308609==    by 0x1FFF000208: ???
==308609==    by 0x1FFF000211: ???
==308609== 
==308609== LEAK SUMMARY:
==308609==    definitely lost: 16,392 bytes in 2 blocks
==308609==    indirectly lost: 0 bytes in 0 blocks
==308609==      possibly lost: 34,244 bytes in 238 blocks
==308609==    still reachable: 9,762,346 bytes in 11,919 blocks
==308609==                       of which reachable via heuristic:
==308609==                         length64           : 1,104 bytes in 21 blocks
==308609==                         newarray           : 1,664 bytes in 24 blocks
==308609==         suppressed: 0 bytes in 0 blocks
==308609== Reachable blocks (those to which a pointer was found) are not shown.
==308609== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==308609== 
==308609== For lists of detected and suppressed errors, rerun with: -s
==308609== ERROR SUMMARY: 240 errors from 240 contexts (suppressed: 0 from 0)

I guess the issue is with the “still reachable” memory.
The “still reachable” category in Valgrind’s leak report refers to memory that was allocated and not subsequently freed before the program terminated.
I have a lot of live stream cameras running in DeepStream, and this is how the container_memory_rss graph looks. :|

Can you check and confirm this issue?

A moderator asked me to make a new topic. This is the old one:

With valgrind and --show-leak-kinds=all, it reports:

==2087== 1,117,648 bytes in 1 blocks are still reachable in loss record 4,007 of 4,007
==2087==    at 0x483B7F3: malloc (vg_replace_malloc.c:309)
==2087==    by 0x70DE089: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.515.48.07)
==2087==    by 0x70D62EA: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.515.48.07)
==2087==    by 0x7081232: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.515.48.07)
==2087==    by 0x7119072: ??? (in /usr/lib/x86_64-linux-gnu/libcuda.so.515.48.07)
==2087==    by 0x695A481: ??? (in /usr/local/cuda-11.6/targets/x86_64-linux/lib/libcudart.so.11.6.55)
==2087==    by 0x695A634: ??? (in /usr/local/cuda-11.6/targets/x86_64-linux/lib/libcudart.so.11.6.55)
==2087==    by 0x695B18B: ??? (in /usr/local/cuda-11.6/targets/x86_64-linux/lib/libcudart.so.11.6.55)
==2087==    by 0x695105D: ??? (in /usr/local/cuda-11.6/targets/x86_64-linux/lib/libcudart.so.11.6.55)
==2087==    by 0x6934052: ??? (in /usr/local/cuda-11.6/targets/x86_64-linux/lib/libcudart.so.11.6.55)
==2087==    by 0x696DCC2: cudaMallocHost (in /usr/local/cuda-11.6/targets/x86_64-linux/lib/libcudart.so.11.6.55)
==2087==    by 0x64A589D: CreateCudaBufferBatch (in /opt/nvidia/deepstream/deepstream-6.1/lib/libnvbufsurface.so)
==2087==    by 0x64A48FD: NvBufSurfaceCreateImpl (in /opt/nvidia/deepstream/deepstream-6.1/lib/libnvbufsurface.so)
==2087==    by 0x64A4736: NvBufSurfaceCreate (in /opt/nvidia/deepstream/deepstream-6.1/lib/libnvbufsurface.so)
==2087==    by 0x6E5B8F7: ???
==2087==    by 0x6E537AA: ???
==2087==    by 0x6E4D944: ???
==2087==    by 0x64BC43A: ???
==2087==    by 0x681F08D: v4l2_ioctl (in /opt/nvidia/deepstream/deepstream-6.1/lib/libnvv4l2.so)
==2087==    by 0x67C24F6: gst_v4l2_allocator_start (in /opt/nvidia/deepstream/deepstream-6.1/lib/gst-plugins/libgstnvvideo4linux2.so)
==2087==    by 0x67C84BF: ??? (in /opt/nvidia/deepstream/deepstream-6.1/lib/gst-plugins/libgstnvvideo4linux2.so)
==2087==    by 0x48AE28A: gst_buffer_pool_set_active (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==2087==    by 0x67DC045: ??? (in /opt/nvidia/deepstream/deepstream-6.1/lib/gst-plugins/libgstnvvideo4linux2.so)
==2087==    by 0x658BD5A: ??? (in /usr/lib/x86_64-linux-gnu/libgstvideo-1.0.so.0.1602.0)
==2087==    by 0x658EAA7: ??? (in /usr/lib/x86_64-linux-gnu/libgstvideo-1.0.so.0.1602.0)
==2087==    by 0x658F199: ??? (in /usr/lib/x86_64-linux-gnu/libgstvideo-1.0.so.0.1602.0)
==2087==    by 0x48E2EFE: ??? (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==2087==    by 0x48E4F60: ??? (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==2087==    by 0x48EBD72: gst_pad_push (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==2087==    by 0x644E196: gst_base_parse_push_frame (in /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0.1602.0)
==2087==    by 0x6450F2A: gst_base_parse_finish_frame (in /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0.1602.0)
==2087==    by 0x674137E: ??? (in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstvideoparsersbad.so)
==2087==    by 0x6448D25: ??? (in /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0.1602.0)
==2087==    by 0x644EDFD: ??? (in /usr/lib/x86_64-linux-gnu/libgstbase-1.0.so.0.1602.0)
==2087==    by 0x48E2EFE: ??? (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==2087==    by 0x48E4F60: ??? (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==2087==    by 0x48EBD72: gst_pad_push (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==2087==    by 0x5BAE0F9: ??? (in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstisomp4.so)
==2087==    by 0x5BB374E: ??? (in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstisomp4.so)
==2087==    by 0x5BCEE05: ??? (in /usr/lib/x86_64-linux-gnu/gstreamer-1.0/libgstisomp4.so)
==2087==    by 0x491A106: ??? (in /usr/lib/x86_64-linux-gnu/libgstreamer-1.0.so.0.1602.0)
==2087==    by 0x4A8B373: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.6)
==2087==    by 0x4A8AAD0: ??? (in /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6400.6)
==2087==    by 0x4B40608: start_thread (pthread_create.c:477)
==2087==    by 0x4C7C132: clone (clone.S:95)

How did you observe the endless memory growing with a video file pipeline?

By starting htop, filtering by gst-launch, and watching the RES memory field. I also used the memory-profiler tool, with the same result.
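For reproducibility, the same observation can be scripted instead of read off htop. This is a hypothetical sampler (the PID argument and the 1-second interval are my assumptions), logging the resident set size that ps reports in kB:

```shell
#!/bin/sh
# Hypothetical RES sampler: log the resident set size (kB) of a process
# once per second, so linear growth shows up as an increasing column.
PID=${1:-$$}        # PID of the gst-launch process; defaults to this shell
SAMPLES=${2:-3}     # how many samples to take
i=0
while [ "$i" -lt "$SAMPLES" ]; do
    rss=$(ps -o rss= -p "$PID" | tr -d ' ')
    printf '%s rss_kb=%s\n' "$(date +%T)" "$rss"
    i=$((i + 1))
    sleep 1
done
```

Piping the output to a file gives a time series that can be plotted to confirm whether the growth really is linear.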

Will the memory be released after gst-launch finishes?

Sure. But I’m using DeepStream with live stream cameras, which means the pipeline does not really have a finishing point.

Can you confirm from your side that it’s an issue with nvv4l2decoder?

I’ve tried a live source with nvv4l2decoder; no memory leak found.

Can you give me the pipeline that you tried?
Did you check htop or use any profiler? No endlessly increasing RES memory?

gst-launch-1.0 rtspsrc location=rtsp://xxxxx ! rtph264depay ! h264parse ! nvv4l2decoder ! fakesink

How did you profile the memory?

We can monitor the system memory with “top”, and monitor GPU memory with the “nvidia-smi” command.
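Concretely, that monitoring might look like the following one-shot commands (the gst-launch filter and the csv query format are my choices, not part of the reply):

```shell
#!/bin/sh
# Sketch of the suggested monitoring: host memory via top in batch mode,
# GPU memory via a one-shot nvidia-smi query.
top -b -n 1 | grep gst-launch || true        # RES column = resident memory
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=memory.used --format=csv
else
    echo "nvidia-smi not available here"
fi
```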

Such a log does not indicate a real memory leak. It may be some struct allocated for driver initialization. We did not find a memory leak with the RTSP source test.

Yes, I was thinking more about this line:

==2087== 1,117,648 bytes in 1 blocks are still reachable in loss record 4,007 of 4,007

Some element is storing data in memory and not releasing it. It’s not a memory leak, I agree; I just can’t find the element that is responsible. After lots of testing, I don’t think it’s nvv4l2decoder, though.

I’m using uridecodebin, which assembles the pipeline elements based on the URI’s file format.

When I add an mp4 file as input for uridecodebin, it generates a pipeline like the following. This pipeline has the issue I’m talking about: RES memory is constantly growing.

gst-launch-1.0 filesrc location=../samples/tokyo/tokyo.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! fakesink

When I add a webm file as input for uridecodebin, it generates a pipeline like this. This pipeline does NOT have the memory issue; RES memory stays the same.

gst-launch-1.0 filesrc location=../samples/tokyo/tokyo.webm ! matroskademux ! nvv4l2decoder ! fakesink

It must have something to do with qtdemux or h264parse, because I assume nvv4l2decoder gets the same input in both cases, right?

If the raw H264 stream inside the mp4 and webm files is the same, the decoder works in the same way.
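One way to narrow down which element is holding memory, independent of valgrind, is GStreamer’s built-in “leaks” tracer, which logs objects still alive at shutdown by type. A sketch using the mp4 pipeline from above (the tracer ships with stock GStreamer, but applying it here is my suggestion, not something from the thread):

```shell
#!/bin/sh
# Sketch: run the problematic pipeline under GStreamer's "leaks" tracer;
# on shutdown (EOS or Ctrl+C) it logs GstObjects/GstMiniObjects still alive.
if command -v gst-launch-1.0 >/dev/null 2>&1; then
    GST_TRACERS="leaks" GST_DEBUG="GST_TRACER:7" \
        gst-launch-1.0 filesrc location=../samples/tokyo/tokyo.mp4 \
        ! qtdemux ! h264parse ! nvv4l2decoder ! fakesink
else
    echo "gst-launch-1.0 not installed; skipping tracer run"
fi
```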