Reduced framerate of 4K@30fps video encoding

Dear experts,

I am comparing H264/H265 recording of 4K@30 video captured directly from the CSI2/Argus device versus via a v4l2loopback-cloned device:

1 - Record directly from the CSI2/Argus device:
gst-launch-1.0 -v nvarguscamerasrc sensor_id=0 sensor-mode=0 ! "video/x-raw(memory:NVMM), width=(int)3840, height=(int)2160, format=(string)NV12, framerate=(fraction)30/1" ! nvvidconv ! timeoverlay ! nvvidconv ! nvv4l2h265enc ! h265parse ! qtmux ! filesink location=${RECORD_PATH}/sony_fsm_from_src_3840x2160_$(date +"%Y_%m_%d_%H_%M_%p").mp4 -e

2 - Clone and record from the cloned device:

a. Clone:

gst-launch-1.0 -v nvarguscamerasrc sensor-id=0 sensor-mode=0 ! "video/x-raw(memory:NVMM), format=NV12, width=${SONY_FSM_CAPTURE_W}, height=${SONY_FSM_CAPTURE_H}, framerate=${SONY_FSM_CAPTURE_FPS}/1" ! nvvidconv ! "video/x-raw(memory:NVMM)" ! \
        tee name=t ! queue !  nvvidconv ! "video/x-raw(memory:NVMM),width=${SONY_FSM_STREAM_W},height=${SONY_FSM_STREAM_H}" !  nvvidconv ! video/x-raw, format=YUY2 !  identity drop-allocation=1 ! v4l2sink device=${SONY_FSM_DEV_STREAM}  sync=false async=true \
        t. ! queue ! nvvidconv ! video/x-raw, format=YUY2 !  identity drop-allocation=1 ! v4l2sink device=${SONY_FSM_DEV_RECORD} sync=false async=true \
        t. ! queue ! nvvidconv ! video/x-raw, format=YUY2 !  identity drop-allocation=1 ! v4l2sink device=${SONY_FSM_DEV_SNAPSHOT} sync=false async=true \
        t. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=${SONY_FSM_AIPROCESSING_W},height=${SONY_FSM_AIPROCESSING_H}" ! nvvidconv ! video/x-raw, format=YUY2 ! identity drop-allocation=1 ! v4l2sink device=${SONY_FSM_DEV_AIPROCESSING} sync=false async=true

b. Record:
gst-launch-1.0 -v v4l2src device=${SONY_FSM_DEV_RECORD} ! video/x-raw ! timeoverlay ! nvvidconv ! nvv4l2h265enc ! h265parse ! qtmux ! filesink location=${RECORD_PATH}/sony_fsm_record_3840x2160_YUY2_$(date +"%Y_%m_%d_%H_%M_%p").mp4 -e

I recorded about one hour with each pipeline. As reported by VLC, neither recording reached the expected 30 fps, and paradoxically the recording via the v4l2loopback device has a higher framerate (~28 fps) than the recording from the direct CSI2/Argus device (~25 fps).
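As a side note, the average framerate stored in the container can also be cross-checked with ffprobe (a sketch; this assumes the ffmpeg tools are installed, and <recorded_file>.mp4 stands for one of the files above):

ffprobe -v error -select_streams v:0 -show_entries stream=avg_frame_rate,r_frame_rate,nb_frames -of default=noprint_wrappers=1 <recorded_file>.mp4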

I also checked the capture framerate with the following GStreamer command, and it stayed at a constant ~30 fps for one hour:
gst-launch-1.0 nvarguscamerasrc sensor-id=0 sensor-mode=0 ! 'video/x-raw(memory:NVMM),width=3840, height=2160, framerate=30/1, format=NV12' ! nvvidconv ! fpsdisplaysink text-overlay=0 name=sink_0 video-sink=fakesink sync=0 -v

Log file:
gst_framerate_3840x2160.txt (863.4 KB)

This made me suspect the performance of 4K@30fps encoding as well as the write performance of the SSD storage. Therefore, I ran further tests recording a single CSI2/Argus device at 1080p@30fps (as I only have one physical IMX477):

gst-launch-1.0 -v nvarguscamerasrc sensor_id=0 sensor-mode=1 ! "video/x-raw(memory:NVMM), width=(int)1920, height=(int)1080, format=(string)NV12, framerate=(fraction)30/1" ! nvvidconv ! timeoverlay ! nvvidconv ! nvv4l2h265enc ! h265parse ! qtmux ! filesink location=${RECORD_PATH}/sony_fsm_from_src_1920x1080_$(date +"%Y_%m_%d_%H_%M_%p").mp4 -e

or single and dual v4l2loopback devices at 1080p@30fps simultaneously:
a. Clone:

gst-launch-1.0 -v nvarguscamerasrc sensor-id=0 sensor-mode=1 ! "video/x-raw(memory:NVMM), format=NV12, width=${SONY_FSM_STREAM_W}, height=${SONY_FSM_STREAM_H}, framerate=${SONY_FSM_STREAM_FPS}/1" ! nvvidconv ! "video/x-raw(memory:NVMM)" ! \
        tee name=t ! queue ! nvvidconv ! video/x-raw, format=YUY2 !  identity drop-allocation=1 ! v4l2sink device=${SONY_FSM_DEV_STREAM}  sync=false async=true \
        t. ! queue ! nvvidconv ! video/x-raw, format=YUY2 !  identity drop-allocation=1 ! v4l2sink device=${SONY_FSM_DEV_RECORD} sync=false async=true \
        t. ! queue ! nvvidconv ! video/x-raw, format=YUY2 !  identity drop-allocation=1 ! v4l2sink device=${SONY_FSM_DEV_SNAPSHOT} sync=false async=true \
        t. ! queue ! nvvidconv ! "video/x-raw(memory:NVMM),width=${SONY_FSM_AIPROCESSING_W},height=${SONY_FSM_AIPROCESSING_H}" ! nvvidconv ! video/x-raw, format=YUY2 ! identity drop-allocation=1 ! v4l2sink device=${SONY_FSM_DEV_AIPROCESSING} sync=false async=true

b. Record:
gst-launch-1.0 -v v4l2src device=${SONY_FSM_DEV_RECORD} ! video/x-raw ! timeoverlay ! nvvidconv ! nvv4l2h265enc ! 'video/x-h265, stream-format=(string)byte-stream' ! h265parse ! qtmux ! filesink location=${RECORD_PATH}/sony_fsm_record_1920x1080_$(date +"%Y_%m_%d_%H_%M_%p").mp4 -e

gst-launch-1.0 -v v4l2src device=${SONY_FSM_DEV_SNAPSHOT} ! video/x-raw ! timeoverlay ! nvvidconv ! nvv4l2h265enc ! 'video/x-h265, stream-format=(string)byte-stream' ! h265parse ! qtmux ! filesink location=${RECORD_PATH}/sony_fsm_snapshot_1920x1080_$(date +"%Y_%m_%d_%H_%M_%p").mp4 -e

All of the recorded mp4 files were reported at the expected framerate (30 fps). This helps rule out the SSD write performance as the cause.
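As an extra sanity check (a sketch; the path and size are placeholders), the raw sequential write throughput of the SSD can be measured directly with dd, and it should be far above the few tens of Mbit/s the H265 encoder produces:

# write 1 GiB with direct I/O to bypass the page cache, then clean up
dd if=/dev/zero of=${RECORD_PATH}/dd_write_test.bin bs=1M count=1024 oflag=direct status=progress
rm ${RECORD_PATH}/dd_write_test.bin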

I would really appreciate it if you could share any ideas or advice on the framerate degradation in the 4K@30fps encoding tests.

Thanks and best regards,
Khang

Hi,
In this use case the frame data is copied from the NVMM buffer to a CPU buffer, and this adds significant CPU load. The bottleneck may be CPU capability. Please run sudo tegrastats to check the system load.
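For example (a sketch; the interval and log path are arbitrary, and the --interval/--logfile/--stop options are assumed to be available on your JetPack release), tegrastats can be left logging in the background while the recording pipeline runs:

sudo tegrastats --interval 1000 --logfile /tmp/tegrastats_4k_record.log &
# ... run the 4K recording pipeline here ...
sudo tegrastats --stop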

The optimal solution is to keep the frame data in NVMM buffers and send them to the encoder. This is zero memory copy and should achieve the target performance stated in the module data sheet.
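For example, a minimal sketch of a fully NVMM pipeline (no timeoverlay and no CPU-side caps; the output file name is arbitrary) that keeps the frames in NVMM all the way to the encoder:

gst-launch-1.0 nvarguscamerasrc sensor-id=0 sensor-mode=0 ! \
        'video/x-raw(memory:NVMM), width=3840, height=2160, format=NV12, framerate=30/1' ! \
        nvv4l2h265enc ! h265parse ! qtmux ! filesink location=test_4k_nvmm.mp4 -e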

Hi @DaneLLL,

By the copying from NVMM buffer to CPU buffer, do you mean the test case with the v4l2loopback devices and/or the timeoverlay element with its associated conversions?

Best Regards,
Khang

Hi,
Yes, for this linkage:

... ! video/x-raw(memory:NVMM) ! nvvidconv ! video/x-raw ! ...

The frame data is copied from the NVMM buffer to a CPU buffer. This takes significant CPU usage and may cap the performance.
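By contrast, a linkage of this form keeps the buffers in NVMM on both sides of nvvidconv and avoids the copy:

... ! video/x-raw(memory:NVMM) ! nvvidconv ! video/x-raw(memory:NVMM) ! ...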

Hi @DaneLLL,

After removing the timeoverlay element (and its associated conversions) and explicitly controlling the bitrate in the recording pipeline, I now get a more stable framerate.
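For reference, a sketch of the adjusted recording pipeline (the control-rate and bitrate values below are only illustrative, not the exact values I ended up using):

gst-launch-1.0 -v v4l2src device=${SONY_FSM_DEV_RECORD} ! video/x-raw ! nvvidconv ! \
        'video/x-raw(memory:NVMM)' ! nvv4l2h265enc control-rate=1 bitrate=20000000 ! \
        h265parse ! qtmux ! filesink location=${RECORD_PATH}/sony_fsm_record_3840x2160.mp4 -e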

Of course, I believe it could be even better (in terms of CPU usage) if nvv4l2camerasrc could be used with v4l2loopback, which is still an open issue: Issue when try to use v4l2loopback and nvv4l2camerasrc - #24 by ShaneCCC
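If that ever gets resolved, the intended usage would look roughly like the sketch below (untested, since the linked issue is still open; the UYVY caps and device mapping are assumptions):

gst-launch-1.0 nvv4l2camerasrc device=${SONY_FSM_DEV_RECORD} ! \
        'video/x-raw(memory:NVMM), format=UYVY, width=3840, height=2160, framerate=30/1' ! \
        nvvidconv ! 'video/x-raw(memory:NVMM), format=NV12' ! \
        nvv4l2h265enc ! h265parse ! qtmux ! filesink location=${RECORD_PATH}/sony_fsm_record_nvmm.mp4 -e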

Best Regards,
Khang

Hi,
v4l2loopback is a 3rd-party framework and works with CPU buffers; it does not work with NVMM buffers. For a similar solution, you may try UDP, so that the NVMM buffer can be encoded to an H264/H265 stream and streamed out. On the receiver side, the stream can be decoded back into NVMM buffers.
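For example, a rough sketch of the UDP approach (the host, port and payload number are placeholders):

# Sender: encode from NVMM and stream out over RTP/UDP
gst-launch-1.0 nvarguscamerasrc sensor-id=0 sensor-mode=0 ! \
        'video/x-raw(memory:NVMM), width=3840, height=2160, format=NV12, framerate=30/1' ! \
        nvv4l2h265enc ! h265parse ! rtph265pay config-interval=1 ! udpsink host=127.0.0.1 port=5000 sync=false

# Receiver: decode back into NVMM buffers
gst-launch-1.0 udpsrc port=5000 ! \
        'application/x-rtp, media=video, encoding-name=H265, clock-rate=90000, payload=96' ! \
        rtph265depay ! h265parse ! nvv4l2decoder ! nvvidconv ! \
        fpsdisplaysink text-overlay=0 video-sink=fakesink sync=false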

Or you can use the NvBufSurface APIs. In 5.1.3 there is a demonstration of sharing an NvBufSurface between processes.

