[mmapi] Use VIC to do simple video stitching

Use VIC to do simple video stitching.

Which functions in the Multimedia API can do this?

Are these the two functions?

int
NvVideoConverter::setDestRect (uint32_t left, uint32_t top, uint32_t width,
             uint32_t height)

int
NvVideoConverter::setCropRect (uint32_t left, uint32_t top, uint32_t width,
             uint32_t height)

Hello Lilin,
Which branch are you using?

Normally, you need to allocate an NvBuffer for the destination surface.
Then you can copy the source images into the destination surface by running the converter multiple times.
For each conversion, use setDestRect to specify the destination rectangle within the destination surface.
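To make that loop concrete, here is a minimal sketch in plain C++ of how the destination rectangles for a simple 2x2 stitch could be computed. The `Rect` struct and `gridDestRect` helper are stand-ins of my own, not mmapi types; each result would be passed to `NvVideoConverter::setDestRect()` before issuing the corresponding conversion.

```cpp
#include <cstdint>

// Plain stand-in for the (left, top, width, height) arguments that
// NvVideoConverter::setDestRect() takes; not an actual mmapi type.
struct Rect { uint32_t left, top, width, height; };

// Compute the destination rectangle for source index `i` in a 2x2 grid
// on a dst_w x dst_h destination surface. Each conversion would then be
// issued as: conv->setDestRect(r.left, r.top, r.width, r.height);
static Rect gridDestRect(uint32_t i, uint32_t dst_w, uint32_t dst_h)
{
    const uint32_t cell_w = dst_w / 2;
    const uint32_t cell_h = dst_h / 2;
    return Rect{ (i % 2) * cell_w, (i / 2) * cell_h, cell_w, cell_h };
}
```

For a 1920x1080 destination this places source 0 at (0, 0), source 1 at (960, 0), source 2 at (0, 540) and source 3 at (960, 540), each 960x540.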

Hi, waynezhu:

Thank you for your reply!

I’ll try it.

Hi, waynezhu,

I followed your suggestions and studied the mmapi code; I have a few questions.

1. Is the NvBuffer allocated for the destination surface queued to the VIC's capture_plane?
2. Should the VIC capture_plane->setupPlane() memory type be V4L2_MEMORY_DMABUF?
3. If so, capture_plane->qBuffer() needs an NvBuffer *shared_buffer; how do I get an NvBuffer from a dma_fd to use as shared_buffer?

Looking forward to your reply, thank you.

Hi lilin,
1. Is the NvBuffer allocated for the destination surface queued to the VIC's capture_plane?
It depends on which memory type you use: with V4L2_MEMORY_DMABUF, no NvMM buffer is allocated;
with V4L2_MEMORY_MMAP, an NvMM buffer is allocated.
2. Should the VIC capture_plane->setupPlane() memory type be V4L2_MEMORY_DMABUF?
I think both V4L2_MEMORY_MMAP and V4L2_MEMORY_DMABUF can be used.
With V4L2_MEMORY_MMAP, you can use the NvBuffer on the capture plane as the destination buffer.
With V4L2_MEMORY_DMABUF, you can use NvBufferCreate to create a buffer as the destination buffer and then use its fd as input; see the camera_recording sample for how the fd is used.

3. If so, capture_plane->qBuffer() needs an NvBuffer *shared_buffer; how do I get an NvBuffer from a dma_fd to use as shared_buffer?
See the camera_recording sample for how V4L2_MEMORY_DMABUF is used.
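For reference, this is roughly the V4L2_MEMORY_DMABUF queueing pattern the samples use: fill the v4l2_buffer's planes with the dma_fd and pass NULL as shared_buffer. The struct definitions below are simplified stand-ins for the real ones from linux/videodev2.h so the sketch is self-contained off-device, and the helper name is mine, not from the samples.

```cpp
#include <cstdint>
#include <cstring>

// --- Stand-in definitions so this sketch compiles off-device. ---
// On the device these come from linux/videodev2.h; the field names
// below mirror the real multi-planar v4l2 structures.
struct v4l2_plane { struct { int fd; } m; uint32_t bytesused; };
struct v4l2_buffer {
    uint32_t index;
    uint32_t memory;       // V4L2_MEMORY_DMABUF on the real API
    v4l2_plane *m_planes;
    uint32_t length;       // number of planes
};
enum { V4L2_MEMORY_DMABUF = 4 };

// Fill a v4l2_buffer so the VIC plane reads/writes the buffer behind
// dma_fd. With V4L2_MEMORY_DMABUF you pass the fd itself; on the real
// API the shared_buffer argument of qBuffer() can then be NULL, e.g.:
//   conv->capture_plane.qBuffer(buf, NULL);
static void fillDmabufBuffer(v4l2_buffer &buf, v4l2_plane *planes,
                             uint32_t n_planes, uint32_t index, int dma_fd)
{
    std::memset(planes, 0, sizeof(*planes) * n_planes);
    buf.index = index;
    buf.memory = V4L2_MEMORY_DMABUF;
    buf.m_planes = planes;
    buf.length = n_planes;
    for (uint32_t i = 0; i < n_planes; ++i)
        planes[i].m.fd = dma_fd;   // fd obtained from NvBufferCreate()
}
```

This mirrors what the camera_recording sample does with the fd it gets from the capture pipeline; treat the exact plumbing as an assumption and check the sample for details.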

Hi, waynezhu,

Thank you for your guidance.

I modified 02_video_dec_cuda to implement the simple stitching function. I also found some problems during testing:

1. I use VIC to copy a decoded image to different target areas. I must wait for the VIC's capture_plane callback to be called (VIC processing completed) before letting another VIC copy another image; otherwise a conflict occurs. Since I am copying the same decoded image to non-overlapping target areas, why would it conflict?

2. VIC processing is slow: from the input of one frame to the output of one stitched frame takes up to about 25 ms, and the second VIC (my conv1) takes even more time. Is there any way to improve the processing speed?

The modified sample code is attached.

These run normally:

./video_dec_cuda ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264 --delay 1 --stitch-mode 1

./video_dec_cuda ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264 --delay 1 --stitch-mode 2

These conflict:

./video_dec_cuda ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264 --delay 0 --stitch-mode 1

./video_dec_cuda ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264 --delay 0 --stitch-mode 2

The program prints the waiting time.

Looking forward to your reply, thank you.

02_video_dec_cuda_for_stitch_test.tar.gz (12.3 KB)

1. I use VIC to copy a decoded image to different target areas. I must wait for the VIC's capture_plane callback to be called (VIC processing completed) before letting another VIC copy another image; otherwise a conflict occurs. Since I am copying the same decoded image to non-overlapping target areas, why would it conflict?

Yes, you must wait for the capture_plane callback to be called before letting another VIC stitch.

2. VIC processing is slow: from the input of one frame to the output of one stitched frame takes up to about 25 ms, and the second VIC (my conv1) takes even more time. Is there any way to improve the processing speed?

Normally it only needs about 1.5 ms to do the stitch. I need some time to investigate your code.

You can refer to the following link for how to measure VIC latency:
https://devtalk.nvidia.com/default/topic/1023424/jetson-tx1/mmapis-12_camera_v4l2_cuda-time-consuming-question/post/5207081/?offset=6#5207109
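In case the link goes stale: the basic idea is just to timestamp around the qBuffer/dqBuffer pair, which is what the prints later in this thread do. A minimal sketch with std::chrono (the sleep is a stand-in for the actual VIC work, and the helper names are mine):

```cpp
#include <chrono>
#include <thread>

// Millisecond timestamp helper, matching the style of the prints in
// this thread ("conv0 output_plane qBuffer frame[N] at T ms").
static long long nowMs()
{
    using namespace std::chrono;
    return duration_cast<milliseconds>(
        steady_clock::now().time_since_epoch()).count();
}

// Measure one conversion: stamp before output_plane.qBuffer() and again
// in the capture_plane dqBuffer callback. Here a sleep stands in for
// the VIC work between the two stamps.
static long long measureOnce(int work_ms)
{
    long long t_q = nowMs();   // taken just before qBuffer()
    std::this_thread::sleep_for(std::chrono::milliseconds(work_ms));
    long long t_dq = nowMs();  // taken in the dqBuffer callback
    return t_dq - t_q;
}
```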

I double-checked: one 1080p frame takes about 1.5 ms.

Hi, waynezhu,

Thank you for your reply.

Using the method you provided, the measured times are between 4 and 10 ms.

My environment is TX1 with L4T 28.1.

The code is attached. I cannot find the problem in my code; please help me analyze it.

Thanks.
02_video_dec_cuda_for_stitch_test.tar.gz (12.4 KB)

I modified the 02_video_dec_cuda sample and tested the time it took to process a frame of data.

The fastest frame takes 2 ms, but most take about 5 ms. Please help me analyze why it cannot reach 1.5 ms.
Thank you.
Thank you.

Source code in the attachment.

Please use this command:

./video_dec_cuda ../../data/Video/sample_outdoor_car_1080p_10fps.h264 H264 --input-nalu

My environment is TX1 with L4T 28.1.

Here is my output:

conv0 output_plane qBuffer frame[5] at 222 ms
conv0 capture_plane dqBuffer frame[5] at 228 ms

conv0 output_plane qBuffer frame[6] at 278 ms
conv0 capture_plane dqBuffer frame[6] at 288 ms

conv0 output_plane qBuffer frame[7] at 348 ms
conv0 capture_plane dqBuffer frame[7] at 352 ms

videodec_main.cpp (34.5 KB)

Thank you for the info, li_lin.

For VIC with an MMAP buffer, there are mmap and munmap operations in the driver.
These take an additional ~2 ms.

Could you try dma_buf?
Both of VIC’s planes support dma_buf as input and output.

Hi, waynezhu,

I again modified 02_video_dec_cuda, setting both of VIC’s planes to use dma_buf.
The source code is attached.

The test results are: the fastest is 4 ms, usually 5 ms, and the slowest is around 10 ms.

Here are my test steps:

1. Record an H.264 file:

gst-launch-1.0 -e nvcamerasrc fpsRange="25.0 25.0" ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)I420, framerate=(fraction)25/1' ! \
nvvidconv flip-method=0 ! 'video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)I420' ! omxh264enc bitrate=2000000 \
! 'video/x-h264, stream-format=(string)byte-stream' ! filesink location=test.h264 -e

2. Run the test program:

./video_dec_cuda test.h264 H264 --input-nalu

The output is:

conv0 output_plane qBuffer frame[45] at 2000 ms
conv0 capture_plane dqBuffer frame[45] at 2005 ms

conv0 output_plane qBuffer frame[46] at 2046 ms
conv0 capture_plane dqBuffer frame[46] at 2056 ms

conv0 output_plane qBuffer frame[47] at 2088 ms
conv0 capture_plane dqBuffer frame[47] at 2093 ms

Please check it, thank you.
videodec_main.cpp (36.3 KB)

Lilin,

In the Rel28.2 release, we use the following two functions to do clip & stitch.
Could you wait for that release?

/**
 * This method is used to transform one DMA buffer to another DMA buffer.
 * It can support transforms for copying, scaling, flipping, rotation and cropping.
 * @param[in] src_dmabuf_fd DMABUF FD of source buffer
 * @param[in] dst_dmabuf_fd DMABUF FD of destination buffer
 * @param[in] transform_params transform parameters
 * @return 0 for success, -1 for failure.
 */
int NvBufferTransform (int src_dmabuf_fd, int dst_dmabuf_fd, NvBufferTransformParams *transform_params);

/**
 * This method is used to composite multiple input DMA buffers to one output DMA buffer.
 * It can support composition of multiple input frames to one composited output.
 * @param[in] src_dmabuf_fds array of DMABUF FDs of source buffers to composite from
 * @param[in] dst_dmabuf_fd DMABUF FD of destination buffer for composition
 * @param[in] composite_params composition parameters
 * @return 0 for success, -1 for failure.
 */
int NvBufferComposite (int *src_dmabuf_fds, int dst_dmabuf_fd, NvBufferCompositeParams *composite_params);
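A rough sketch of how NvBufferComposite could be driven for a side-by-side stitch. The struct definitions below are simplified stand-ins written so the sketch compiles off-device; the exact field names of NvBufferCompositeParams in the Rel28.2 nvbuf_utils.h may differ, so treat them as illustrative only.

```cpp
#include <cstdint>

// --- Stand-in definitions so this sketch compiles off-device. ---
// On the device these come from nvbuf_utils.h in Rel28.2; field names
// are assumptions and may not match the shipped header exactly.
struct NvBufferRect { uint32_t top, left, width, height; };
struct NvBufferCompositeParams {
    uint32_t input_buf_count;
    NvBufferRect src_comp_rect[16];
    NvBufferRect dst_comp_rect[16];
};

// Fill composition parameters that stitch `n` full-frame sources side
// by side into one dst_w x dst_h destination. The real call would then
// be: NvBufferComposite(src_fds, dst_fd, &params);
static void fillSideBySide(NvBufferCompositeParams &p, uint32_t n,
                           uint32_t src_w, uint32_t src_h,
                           uint32_t dst_w, uint32_t dst_h)
{
    p.input_buf_count = n;
    for (uint32_t i = 0; i < n; ++i) {
        // Take each full source frame...
        p.src_comp_rect[i] = NvBufferRect{0, 0, src_w, src_h};
        // ...and place it in its own column of the destination.
        p.dst_comp_rect[i] = NvBufferRect{0, i * (dst_w / n),
                                          dst_w / n, dst_h};
    }
}
```

Compared with the per-conversion setDestRect loop earlier in this thread, a single NvBufferComposite call covers all inputs at once, which avoids waiting on a capture_plane callback between copies.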