We need a decode function because I’m developing both a streaming server and a streaming client.
(Right now the communication is one-way: one Jetson TX2 is the server and another Jetson TX2 is the client.
In the next phase it will be bidirectional, with both an encoder and a decoder on each Jetson TX2.)
Which of these does “thread” mean?
(1) C++ (POSIX) thread
(2) TOPIC in this forum
In comment #17, you said: “The number of qBuffer and dqBuffer calls is not exactly the same, so for some fds, NvBufferDestroy(fd) is not called.”
In comment #18, I wrote: “I get new buffers using setupPlane(), call qBuffer() to submit an image/frame to the encoder, and reuse empty buffers from dqBuffer().”
Do I have to use NvBufferDestroy(fd) and createNvBuffer() instead of dqBuffer() and reusing the empty buffers of the output plane?
I made a fair copy of your code from #17; the code in #18 is not mine.
I know that the number of output_plane.qBuffer() calls differs from the number of capture_plane.dqBuffer() calls
(SPS/PPS is added).
If you mean that the number of output_plane.qBuffer() calls is not the same as the number of output_plane.dqBuffer() calls,
I doubt that and cannot understand it.
Which of these does “thread” mean?
(1) C++ (POSIX) thread
(2) TOPIC in this forum >> (1)
In the comment #17, you said:
The number of qBuffer and dqBuffer calls is not exactly the same, so for some fds, NvBufferDestroy(fd) is not called.
In the comment #18, I wrote:
I get new buffers using setupPlane(), call qBuffer() to submit an image/frame to the encoder, and reuse empty buffers from dqBuffer().
Do I have to use NvBufferDestroy(fd) and createNvBuffer() instead of dqBuffer() and reusing the empty buffers of the output plane? >> In setupPlane() there is a memory-type parameter, V4L2_MEMORY_DMABUF or V4L2_MEMORY_MMAP. If it is DMABUF, you have to use createNvBuffer() to create a DMA buffer and then pass its fd to the encoder; if it is MMAP, the memory is allocated in our library and you can use it directly.
I made a fair copy of your code from #17; the code in #18 is not mine.
I know that the number of output_plane.qBuffer() calls differs from the number of capture_plane.dqBuffer() calls
(SPS/PPS is added).
If you mean that the number of output_plane.qBuffer() calls is not the same as the number of output_plane.dqBuffer() calls,
I doubt that and cannot understand it. >> We can also output two or more slices for a frame.
Could you please explain it in simple words?
Can I have your email?
So I can send you the new library.
We have fixed it now.
I appreciate your quick and polite response.
In our system, DMABUF is used from NvVideoConverter to NvVideoEncoder,
because it avoids a memcpy() between buffers.
There are three threads for transferring images from NvVideoConverter to NvVideoEncoder:
(1) Converter CAPTURE plane dqThread
(2) Encoder OUTPUT plane dqThread
(3) Encoder main thread for OUTPUT plane qBuffer()
Between (1) and (3), and between (2) and (3), two std::queue instances are used (#1 and #2 below).
(1)–(#1:DMABUF)–>(3)
(2)–(#2:NvBuffer)–>(3)
Is it better to call NvBufferDestroy(fd) in (2), the encoder OUTPUT plane dqThread callback?
Is it better to call createNvBuffer() in (3), the encoder main thread?
But how to use createNvBuffer() is a little complicated:
NV::IImageNativeBuffer *iNativeBuffer =
    interface_cast<NV::IImageNativeBuffer>(iFrame->getImage());
if (!iNativeBuffer)
    ORIGINATE_ERROR("IImageNativeBuffer not supported by Image.");
fd = iNativeBuffer->createNvBuffer(STREAM_SIZE,
                                   NvBufferColorFormat_YUV420,
                                   (DO_CPU_PROCESS) ? NvBufferLayout_Pitch : NvBufferLayout_BlockLinear);
Which document should I refer to for NV::IImageNativeBuffer and iFrame->getImage()?
And what is the relation between the NvBuffer of the NvVideoConverter capture plane and iFrame->getImage()?
In our callback function for the encoder output plane dqBuffer,
it calls converter.capture_plane.qBuffer() with shared_buffer->index.
I have a similar issue after a lot of encoder restarts with a TX2 on 28.1.
The data flow is the following:
V4L2 input (YUV422) → GPU memcpy to 1 or more converter output plane(s) with programmable frame rate decimation → CUDA processing on converter capture plane → encoder output plane → encoder capture plane
After stopping, the total numbers of buffer queues and dequeues, and the number of still-queued buffers (in the same order as the setup):
Total queued buffers: 315 324 315 321
Total dequeued buffers: 305 315 314 316
Queued buffers: 10 9 1 5
This is somewhat strange, as I call conv->waitForIdle() before these prints, which should (according to the documentation): “Waits until all buffers queued on the output plane are converted and dequeued from the capture plane.”
Anyway, I tried to dequeue all buffers on all planes using code similar to this, and got:
nvbuf_utils: dmabuf_fd 1769239141 mapped entry NOT found
nvbuf_utils: Can not get HW buffer from FD... Exiting...
But at least the dequeue is successful, and afterwards the number of queued buffers is zero.
If I try dequeueing any of the other planes, the thread blocks at the dequeue and stays there forever.
I guess I get the error print because the converter output plane is using V4L2_MEMORY_MMAP buffers, right?
If I understand you correctly, the main reason for the memory leak is that buffers queued on any plane are not deleted. I hoped that dequeueing and deleting was an option, but it does not seem to work. Is my idea completely wrong?
If I have to allocate buffers manually using NvBufferCreateEx, on which planes should I do that? The above print shows that all of the planes have queued buffers which are not deleted when the stream is stopped, right?
Hi Tessier,
Please share steps to reproduce the issue in running 01_video_encode. If it requires to apply a patch to 01_video_encode, please share it also.
The basic symptom on our real product is the same as in the code I provided:
Allocate converter & encoder
Start and stop the stream in a loop
After several start/stops, the Multimedia API crashes
The difference is the way it crashes:
On our real product I get the same error that you can see in post 1
In the code I provided, the error print is different but the effect is the same
As I wrote, to reproduce the issue I used code provided by NVIDIA in the linked forum thread. It is the same as our real product in that it uses the VIC to convert 4:2:2 to 4:2:0 and then encodes. 01_video_encode only uses the encoder, so it is not as close to the real use case as 12_camera_v4l2_cuda_video_encode.zip.
Could you please give the code I uploaded a try?
I will try it on 28.2.1, but on the real product we cannot do a version upgrade right now.
Thanks.
Hi Tessier,
Can you please try deleting and re-creating the NvVideoEncoder? Starting and stopping the stream in a loop is not an SQA test case we verify in the BSP release, so there might be some instability.
Please also contact NVIDIA salesperson so that we can check and prioritize this issue. Thanks.
The reason I do not delete the encoder is that at the beginning of the development phase I found that deleting it sometimes hangs: “delete enc;” never returns. This is why I moved to allocating both the VIC and the encoder statically.
I modified 01_video_encode to start/stop in a loop; it also crashes after ~274 frames, just like the VIC+encoder sample, if the encoder allocation is static.
I tried non-static allocation, and it has not failed yet, so I will try that again on our real product.
You can find the modified 01_video_encode here:
[url]http://home.mit.bme.hu/~szanto/tegra/01_video_encode_mod.zip[/url]
To run it: ./video_encode bunny_1280_UYVY_420.bin 1280 720 H265 encoded.h265 -br 4000000
There is a define at the beginning of the C file, #define USE_STATIC_ENC 1,
with which you can turn the encoder’s static allocation on or off if you want to give it a try.
Additional info: it also crashes with non-static hardware allocation, it just takes much longer. After some time the process gets killed, most probably because it runs out of memory; as I can see in tegrastats, memory usage grows continuously.