NvVideoEncoder Memory Leak, SegFault

wyattp6ffi · January 30, 2020, 12:27am

Hi Nvidia,

Thank you for your continued support. I’m trying to record videos with exactly N frames each. I do this by constructing an NvVideoEncoder, encoding N frames, destructing it, and creating a new one for the next N frames. After about 200 constructions and destructions, the NvVideoEncoder prints the following:

Failed to query video capabilities: Inappropriate ioctl for device
NvMMLiteOpen : Block : BlockType = 4
===== MSENC =====
NvMMLiteBlockCreate : Block : BlockType = 4
NvH264MSEncSetCommonStreamAttribute: DRC not supported through attribute. Use CropRect
[ INFO ] [00:16:09.532929] | vid_source | Setting output video bitrate: 63700992
875967048
842091865
NvH264MSEncSetCommonStreamAttribute: LevelIdc conformance violation
[ INFO ] [00:16:09.533108] | vid_source | Setting bitrate to constant
===== MSENC blits (mode: 1) into tiled surfaces =====
NvRmChannelSubmit: NvError_IoctlFailed with error code 22
NvRmPrivFlush: NvRmChannelSubmit failed (err = 196623, SyncPointIdx = 23, SyncPointValue = 0)
NvRmChannelSubmit: NvError_IoctlFailed with error code 22
NvRmPrivFlush: NvRmChannelSubmit failed (err = 196623, SyncPointIdx = 22, SyncPointValue = 0)
NvRmPrivFlush: NvRmChannelSubmit failed (err = 196623, SyncPointIdx = 23, SyncPointValue = 0)
NvRmPrivFlush: NvRmChannelSubmit failed (err = 196623, SyncPointIdx = 22, SyncPointValue = 0)
NvRmPrivFlush: NvRmChannelSubmit failed (err = 196623, SyncPointIdx = 23, SyncPointValue = 0)
NvRmPrivFlush: NvRmChannelSubmit failed (err = 196623, SyncPointIdx = 22, SyncPointValue = 0)
NvRmPrivFlush: NvRmChannelSubmit failed (err = 196623, SyncPointIdx = 23, SyncPointValue = 0)
NvRmPrivFlush: NvRmChannelSubmit failed (err = 196623, SyncPointIdx = 22, SyncPointValue = 0)

Valgrind also shows a memory leak: consumed memory keeps growing with the number of encoders created. The symptoms appear to be almost exactly the same as https://devtalk.nvidia.com/default/topic/1044597/jetson-tx2/nvvideoencoder-repeated-start-stop-causes-crash-and-memory-leak/1
I have tried the patch https://devtalk.nvidia.com/default/topic/1043548/jetson-tx1/-mmapi-please-merge-two-libtegrav4l2-so-that-modify-different-issues/post/5294104/#5294104, but it did not solve the problem, just as it did not work for Tessier.

As if things couldn’t get any stranger, I am running into a separate issue that occurs ONLY when my lens cap is on, and only on my 8.9 megapixel camera, not on my 1.6 megapixel camera. With the lens cap on the 8.9 megapixel camera, I get segfaults from the Capture Plane thread. Below is the GDB backtrace for that error.

Thread 1 “Capture Plane” received signal SIGSEGV, Segmentation fault.
0x0000007fb72d7198 in NvBufferFromFd () from /usr/lib/aarch64-linux-gnu/tegra/libnvbuf_utils.so.1.0.0
(gdb) bt
#0 0x0000007fb72d7198 in NvBufferFromFd () from /usr/lib/aarch64-linux-gnu/tegra/libnvbuf_utils.so.1.0.0
#1 0x0000007fb72d8dc8 in NvBufferMemUnMap () from /usr/lib/aarch64-linux-gnu/tegra/libnvbuf_utils.so.1.0.0
#2 0x0000007fab86d6e4 in release_enc_output_buffers () from /usr/lib/aarch64-linux-gnu/tegra/libtegrav4l2.so
#3 0x0000007fab8647e0 in vidioc_enc_ioctl () from /usr/lib/aarch64-linux-gnu/tegra/libtegrav4l2.so
#4 0x0000007fab85b32c in TegraV4L2_Ioctl () from /usr/lib/aarch64-linux-gnu/tegra/libtegrav4l2.so
#5 0x0000007fac3bcdf8 in plugin_ioctl () from /usr/lib/aarch64-linux-gnu/libv4l/plugins/libv4l2_nvvideocodec.so
#6 0x0000007fb74656d0 in v4l2_ioctl () from /usr/lib/aarch64-linux-gnu/libv4l2.so.0

I got this backtrace the second time I ran GDB:
Thread 1 “Capture Plane” received signal SIGSEGV, Segmentation fault.
do_lookup_x (undef_name=0x1786558 “\310dx\001”, undef_name@entry=0x7fab858f3b “NvBufferDestroy”, new_hash=new_hash@entry=3460332813, old_hash=0x7fab88b000, old_hash@entry=0x7fffffa990,
ref=0x16ff39086e5, result=0x0, result@entry=0x7fffffa9a0, scope=, i=0, version=0x7fffffaa28, version@entry=0x0, flags=flags@entry=5, skip=skip@entry=0x0, type_class=1,
type_class@entry=127, undef_map=undef_map@entry=0x1786200) at dl-lookup.c:395
395 dl-lookup.c: No such file or directory.
(gdb) bt
#0 do_lookup_x (undef_name=0x1786558 “\310dx\001”, undef_name@entry=0x7fab858f3b “NvBufferDestroy”, new_hash=new_hash@entry=3460332813, old_hash=0x7fab88b000, old_hash@entry=0x7fffffa990,
ref=0x16ff39086e5, result=0x0, result@entry=0x7fffffa9a0, scope=, i=0, version=0x7fffffaa28, version@entry=0x0, flags=flags@entry=5, skip=skip@entry=0x0, type_class=1,
type_class@entry=127, undef_map=undef_map@entry=0x1786200) at dl-lookup.c:395
#1 0x0000007fb7fdb1f4 in _dl_lookup_symbol_x (undef_name=0x7fab858f3b “NvBufferDestroy”, undef_map=0x1786200, ref=0x7fffffaa38, ref@entry=0x7fffffaa58,
symbol_scope=0x7fb644fc14 <NvRmMemUnmap+36>, version=0x0, type_class=127, type_class@entry=1, flags=5, skip_map=skip_map@entry=0x0) at dl-lookup.c:829
#2 0x0000007fb7fdef08 in _dl_fixup (l=, reloc_arg=) at dl-runtime.c:111
#3 0x0000007fb7fe57b4 in _dl_runtime_resolve () at …/sysdeps/aarch64/dl-trampoline.S:94
#4 0x0000007fab86c788 in release_enc_output_buffers () from /usr/lib/aarch64-linux-gnu/tegra/libtegrav4l2.so
#5 0x00000000c0145608 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

My team can’t ship our software until we determine the cause of these errors, and I am looking for any advice on how to solve the above problems. If there is a way to write each group of N frames from the encoder to a separate video file (video 1, 2, 3, and so on) instead of constructing and destructing the encoder, that would be a great solution to all of these problems as well.

Thanks again,
Wyatt

DaneLLL · January 30, 2020, 6:14am

Hi,
We have the option in 01_video_encode:

-s <loop-count>       Stress test [Default = 1]

Please compare with the reference sample. We have verified the looping case, and it should work fine.

wyattp6ffi · February 5, 2020, 11:44pm

If you want exactly N frames in each video, the process is this:

Do not destruct and reconstruct the encoder. Too much can go wrong, and none of the patches have worked for me. Instead, do the following:

Enable SpsPps at each IDR frame during setup: nv_encoder->setInsertSpsPpsAtIdrEnabled(true);
Force an IDR frame on the first frame of the new video (i.e., on frame N+1): nv_encoder->forceIDR()
Then continue to process frames until that frame is written. This takes more than one dequeue, because the encoder holds on to earlier frames while it encodes later ones. In my case I had to process N+2 frames to get the first N written to disk.

If you have followed these steps correctly, you will be able to decode your videos with ffmpeg.
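
The steps above can be sketched roughly as follows. This is only a model of the control flow, not Jetson Multimedia API code: FakeEncoder, Packet, and split_into_clips are hypothetical names, and the two-frame output delay is an assumption taken from the N+2 observation above. On real hardware you would call nv_encoder->forceIDR() where the sketch calls enc.forceIDR(), and you would need some way to tell that a dequeued buffer is an IDR frame before switching output files.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <vector>

// Hypothetical encoded packet: which input frame it came from and whether
// it was encoded as an IDR frame.
struct Packet {
    std::size_t frame_index;
    bool is_idr;
};

// Hypothetical stand-in for the hardware encoder. It models only two things
// from the post: an IDR frame can be requested, and output lags the input
// by a few frames.
class FakeEncoder {
public:
    explicit FakeEncoder(std::size_t latency) : latency_(latency) {}

    // The next queued frame will be encoded as an IDR frame.
    void forceIDR() { idr_pending_ = true; }

    // Queue one raw frame. Returns a packet only once the pipeline is full.
    std::optional<Packet> encode(std::size_t frame_index) {
        pending_.push_back({frame_index, idr_pending_});
        idr_pending_ = false;
        if (pending_.size() <= latency_) return std::nullopt;
        Packet out = pending_.front();
        pending_.pop_front();
        return out;
    }

    // Drain one remaining packet at end of stream, if any.
    std::optional<Packet> flush() {
        if (pending_.empty()) return std::nullopt;
        Packet out = pending_.front();
        pending_.pop_front();
        return out;
    }

private:
    std::size_t latency_;
    bool idr_pending_ = true;  // the very first frame is an IDR frame
    std::deque<Packet> pending_;
};

// Feed total_frames frames through one long-lived encoder and start a new
// clip (a new output file in a real program) at every IDR packet.
std::vector<std::vector<std::size_t>> split_into_clips(
        std::size_t total_frames, std::size_t frames_per_clip,
        std::size_t latency) {
    FakeEncoder enc(latency);
    std::vector<std::vector<std::size_t>> clips;

    auto write = [&clips](const Packet& p) {
        if (p.is_idr) clips.emplace_back();  // IDR packet opens the next clip
        assert(!clips.empty());              // first packet is always IDR
        clips.back().push_back(p.frame_index);
    };

    for (std::size_t i = 0; i < total_frames; ++i) {
        // Force an IDR on the first frame of each new clip (frame N+1, ...).
        if (i > 0 && i % frames_per_clip == 0) enc.forceIDR();
        if (auto p = enc.encode(i)) write(*p);
    }
    while (auto p = enc.flush()) write(*p);
    return clips;
}
```

Calling split_into_clips(10, 4, 2) in this model gives three clips holding frames 0-3, 4-7, and 8-9. The last packet of the first clip only comes out after frame 5 has been queued, which is the "N+2 frames to get the first N" effect described above.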