Memory usage difference between multimedia_apis and gstreamer commands

Our program, which is based on multimedia_apis, uses an unexpectedly large amount of memory when encoding 3840x2160 video.

  • When the gstreamer command below is run (3840x2160 encode), the memory usage is about 150 MB:
  • gst-launch-1.0 v4l2src device="/dev/video1" ! 'video/x-raw, width=1280, height=720, format=(string)I420, framerate=25/1' ! nvvidconv ! 'video/x-raw(memory:NVMM), width=(int)3840, height=(int)2160, format=(string)I420' ! omxh265enc bitrate=2000000 ! 'video/x-h265, stream-format=(string)byte-stream' ! rtph265pay pt=98 ! udpsink host=192.xxx.xx.xx port=6000
    
  • When we use multimedia_apis to implement a similar function, the memory usage is much larger: 511 MB.
  • I built the demo tegra_multimedia_api_bk\samples\01_video_encode and then ran:
    ./video_encode Kimono_1920x1080.yuv 3840 2160 H265 Kimono_1920x1080.h265 -br 410000 -ifi 25 -fps 25 1 -hpt 1
    
  • I also tried to reduce the buffer count from 10 to 4 (in video_encode_main.cpp, lines 838, 843, 848, 855, and 1099).
  • This reduces memory usage to about 360 MB, but it is still much higher than with gstreamer.

    So, what is the reason? How can I optimize our program to lower its memory usage?

    Thx!

    Hi,
    One optimization is to map and unmap the buffer memory dynamically:

    ret = ctx.enc->output_plane.mapOutputBuffers(v4l2_buf, ctx.output_plane_fd[i]);
    
    ret = ctx.enc->output_plane.unmapOutputBuffers(i, ctx.output_plane_fd[i]);
    

    In the default code flow, mapping is done once at initialization and unmapping once at termination. Instead, you can map and unmap around reading every frame.
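
    A minimal sketch of the per-frame flow, assuming the DMABUF output-plane path of 01_video_encode (a sketch rather than the exact sample code; error handling is trimmed, and NvBufferMemSyncForDevice is the nvbuf_utils call the sample uses to flush CPU writes before encoding):

    NvBuffer *buffer = ctx.enc->output_plane.getNthBuffer(i);

    // Map just before the CPU writes the frame into the buffer.
    ret = ctx.enc->output_plane.mapOutputBuffers(v4l2_buf, ctx.output_plane_fd[i]);
    read_video_frame(ctx.in_file, *buffer);
    for (uint32_t j = 0; j < buffer->n_planes; j++)
        NvBufferMemSyncForDevice(ctx.output_plane_fd[i], j,
                                 (void **)&buffer->planes[j].data);

    ret = ctx.enc->output_plane.qBuffer(v4l2_buf, NULL);
    ret = ctx.enc->output_plane.dqBuffer(v4l2_buf, &buffer, NULL, 10);

    // Unmap as soon as the CPU no longer needs the buffer.
    ret = ctx.enc->output_plane.unmapOutputBuffers(i, ctx.output_plane_fd[i]);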

    Thanks!

    So, does this help on the decoder side as well?

    And here are some more questions…

  • Why does a multimedia_apis program use more RAM than the equivalent gstreamer command?
  • Do mm-api and gstreamer use different low-level libraries?
  • Or do they work in different ways?
    Thanks very much!!

    Hi,

    By default the decoded buffers are not mapped to CPU. If you want to do post-processing on decoded buffers, you can dynamically map/unmap the buffers via NvBuffer APIs.
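
    For example, a decoded buffer's dmabuf fd could be mapped for CPU reads roughly like this (a sketch using the nvbuf_utils NvBuffer calls; dmabuf_fd here stands for the fd of the dequeued capture-plane buffer):

    #include "nvbuf_utils.h"

    // Sketch: map one plane of a decoded buffer only while the CPU reads it.
    void *vaddr = NULL;
    if (NvBufferMemMap(dmabuf_fd, 0, NvBufferMem_Read, &vaddr) == 0)
    {
        NvBufferMemSyncForCpu(dmabuf_fd, 0, &vaddr);  // make the CPU view coherent
        // ... post-process the pixels at vaddr ...
        NvBufferMemUnMap(dmabuf_fd, 0, &vaddr);       // release the mapping
    }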

    The main difference should be in buffer map/unmap.

    No, the low-level libraries are the same.

    One accesses them through gstreamer and the other through v4l2.

    I tried to implement dynamic map/unmap based on the demo 01_video_encode. My approach is (showing only the important lines):

  • delete the code between roughly lines 909 and 1085 (the for loop that maps memory for the output plane)
  • in the while loop after that for loop, call
  • ctx.enc->output_plane.getNthBuffer(0)
    
    ctx.enc->output_plane.mapOutputBuffers(v4l2_buf, ctx.output_plane_fd[0])
    

    then call read_video_frame(), etc. (unchanged)

    after qBuffer() is called, call dqBuffer(), and then call

    ctx.enc->output_plane.unmapOutputBuffers(0, ctx.output_plane_fd[0])
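
    In code form, the steps above amount to roughly the following (a sketch; v4l2_buf is set up exactly as in the original loop):

    NvBuffer *buffer = ctx.enc->output_plane.getNthBuffer(0);
    ret = ctx.enc->output_plane.mapOutputBuffers(v4l2_buf, ctx.output_plane_fd[0]);

    read_video_frame(ctx.in_file, *buffer);  // unchanged

    ret = ctx.enc->output_plane.qBuffer(v4l2_buf, NULL);
    ret = ctx.enc->output_plane.dqBuffer(v4l2_buf, &buffer, NULL, 10);
    ret = ctx.enc->output_plane.unmapOutputBuffers(0, ctx.output_plane_fd[0]);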
    

    But the memory cost is almost the same (510 MB for the 4K encode, and the output h265 file is identical), so… am I doing something wrong, and how can I fix it? Or can you give me some code or a demo that uses dynamic map/unmap?

    Thanks!! :-)

    Hi,
    Please share how you profile memory usage.

    I use the tool: tegrastats

    I opened two SSH sessions: one runs video_encode, and the other runs:

    ./tegrastats --interval 50 | cut -c1-15
    

    to show memory usage.

    [memory usage while video_encode is running] - [memory usage while video_encode isn’t running] is about 510 MB.

    I carried out another test in the demo:

    setup_output_dmabuf(&ctx, NUM_BUFFERS)
    

    With the macro NUM_BUFFERS defined as 10, 6, and 4, the memory usage is 510 MB, 510 MB, and 359 MB respectively, regardless of whether I dynamically map/unmap.
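
    For reference, a rough estimate of what the raw output-plane surfaces alone should cost, assuming tightly packed I420 at 3840x2160 (an assumption; real NvBuffer allocations are pitch-aligned, so actual usage is somewhat higher):

    #include <cstdio>

    // Sketch: raw surface memory for N output-plane buffers (I420 = 12 bits/pixel).
    int main()
    {
        const double bytes_per_frame = 3840.0 * 2160.0 * 3.0 / 2.0;
        const int counts[] = {10, 6, 4};
        for (int n : counts)
            printf("%2d buffers: ~%.0f MB\n", n, n * bytes_per_frame / (1024.0 * 1024.0));
        return 0;
    }

    This works out to roughly 119 MB, 71 MB, and 47 MB, so the queued surfaces themselves explain only part of the totals above.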

    Hi,
    We will check and clarify.