[Encoder] Enabling RPS without setting NumRefFrames causes artifacts

Hi,
I recently came across this bug. I am not sure whether it is Orin-specific or a JetPack 5 bug, as our older devices are TX2s running JP4.
The issue came up when I was trying to enable SVC encoding on our JP5 platforms. We saw artifacts whenever we called enableExternalRPS. I even attempted a simple single-reference RPS scheme and still saw the artifacts.
I went into the 01_video_encoder sample and noticed that it sets num_reference_frames to 2 when (ctx->externalRPS && ctx->RPS_threeLayerSvc) is true. That value is then also used for nMaxRefFrames and passed to setNumReferenceFrames. So I tried calling setNumReferenceFrames(1) (the same value we use for nMaxRefFrames), and this finally fixed our issue.
I went back to our streams and noticed that the SPS has max_num_ref_frames = 1 on our TX2 devices, whereas on our Orin it is set to 4.
Was there a change to the default behavior between JP4/5 or TX2/Orin? Is it required to call setNumReferenceFrames whenever external RPS is used? I would have assumed that the encoder would treat the provided RPS as the final decision on how to set reference frames, but this does not seem to be the case.
Thank you for the clarification!

Edit: This is all for H264 encoding

Hi,
We would like to get more information. This is the config that is applied if you enable the --erps and --svc options:

    if (ctx->externalRPS && ctx->RPS_threeLayerSvc)
    {
        ctx->rps_par.m_numTemperalLayers = 3;
        ctx->rps_par.nActiveRefFrames = 0;

        ctx->input_metadata = true;
        ctx->report_metadata = true;
        ctx->num_reference_frames = 2;
        ctx->iframe_interval = ctx->idr_interval = 32;
        if (ctx->encoder_pixfmt == V4L2_PIX_FMT_H264)
        {
            ctx->bGapsInFrameNumAllowed = false;
            ctx->nH264FrameNumBits = 8;
        }
        else if (ctx->encoder_pixfmt == V4L2_PIX_FMT_H265)
        {
            ctx->nH265PocLsbBits = 8;
        }
    }

Do you mean this setting does not work? And please share which Jetpack version you are using. The latest versions for Orin series are 5.1.4 and 6.0GA.

Hi Daniel, thanks for the quick reply.

I see the example from your sample app, and I am sure the sample app works. What I am referring to is that on JP4/TX2 I did not need to call setNumReferenceFrames for SVC to work in my own test app. It seems that the default value for sps_max_num_ref_frames may have changed from 1 to 4 between these versions. That is at least what I am noticing when analyzing the bitstream.
From what I can see, the docs do not mention that we need to call setNumReferenceFrames in order to use RPS. But if I do not call setNumReferenceFrames and change the value to my expected number of references, I see encoding artifacts. Is there a bug somewhere in the encoder that uses sps_max_num_ref_frames to actively set the number of references?
For what it's worth, pps_num_ref_idx_l0_default_active_minus1 is equal to 0 (1).
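
For anyone who wants to compare the SPS value between devices without a full bitstream analyzer, here is a minimal sketch of reading max_num_ref_frames from an H.264 SPS. It is an illustration only and not part of the Jetson Multimedia API: the helper names (strip_emulation_prevention, BitReader, sps_max_num_ref_frames) are hypothetical, the input is assumed to be the SPS NAL payload after the one-byte NAL header, and SPSs that carry scaling lists are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Remove H.264 emulation-prevention bytes (the 0x03 in each 00 00 03 sequence).
static std::vector<uint8_t> strip_emulation_prevention(const std::vector<uint8_t>& in)
{
    std::vector<uint8_t> out;
    int zeros = 0;
    for (uint8_t b : in) {
        if (zeros >= 2 && b == 0x03) { zeros = 0; continue; }
        zeros = (b == 0x00) ? zeros + 1 : 0;
        out.push_back(b);
    }
    return out;
}

// MSB-first bit reader; sets a failure flag instead of reading past the end.
class BitReader {
public:
    explicit BitReader(std::vector<uint8_t> data) : data_(std::move(data)) {}

    uint32_t bits(unsigned n) {
        assert(n <= 32);
        uint32_t v = 0;
        for (unsigned i = 0; i < n; ++i) {
            if (pos_ >= data_.size() * 8) { failed_ = true; return 0; }
            v = (v << 1) | ((data_[pos_ / 8] >> (7 - pos_ % 8)) & 1u);
            ++pos_;
        }
        return v;
    }

    // Unsigned Exp-Golomb code ue(v).
    uint32_t ue() {
        unsigned zeros = 0;
        while (bits(1) == 0) {
            if (failed_ || ++zeros > 31) { failed_ = true; return 0; }
        }
        return ((1u << zeros) - 1u) + bits(zeros);
    }

    bool failed() const { return failed_; }

private:
    std::vector<uint8_t> data_;
    std::size_t pos_ = 0;
    bool failed_ = false;
};

// Hypothetical helper: returns max_num_ref_frames, or -1 if the SPS is
// truncated or uses scaling lists (not handled in this sketch).
int sps_max_num_ref_frames(const std::vector<uint8_t>& sps)
{
    BitReader r(strip_emulation_prevention(sps));
    const uint32_t profile_idc = r.bits(8);
    r.bits(8);  // constraint_set flags + reserved bits
    r.bits(8);  // level_idc
    r.ue();     // seq_parameter_set_id
    switch (profile_idc) {
    case 100: case 110: case 122: case 244: case 44: case 83: case 86:
    case 118: case 128: case 138: case 139: case 134: case 135:
        if (r.ue() == 3) r.bits(1);  // chroma_format_idc, separate_colour_plane_flag
        r.ue();                      // bit_depth_luma_minus8
        r.ue();                      // bit_depth_chroma_minus8
        r.bits(1);                   // qpprime_y_zero_transform_bypass_flag
        if (r.bits(1)) return -1;    // seq_scaling_matrix_present_flag: unsupported here
        break;
    default:
        break;
    }
    r.ue();                            // log2_max_frame_num_minus4
    const uint32_t poc_type = r.ue();  // pic_order_cnt_type
    if (poc_type == 0) {
        r.ue();                        // log2_max_pic_order_cnt_lsb_minus4
    } else if (poc_type == 1) {
        r.bits(1);                     // delta_pic_order_always_zero_flag
        r.ue();                        // offset_for_non_ref_pic (se(v) has the same bit length)
        r.ue();                        // offset_for_top_to_bottom_field (se(v))
        const uint32_t n = r.ue();     // num_ref_frames_in_pic_order_cnt_cycle
        for (uint32_t i = 0; i < n && !r.failed(); ++i) r.ue();  // offset_for_ref_frame[i]
    }
    const uint32_t max_refs = r.ue();  // max_num_ref_frames
    return r.failed() ? -1 : static_cast<int>(max_refs);
}
```

Feeding it the SPS payload extracted from each device's stream should be enough to see whether one reports 1 and the other 4.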

I am on 5.1.x (not sure how to check exact version)

Jetpack 5.1.2 [L4T 35.4.1], also please see above reply

Hi,
We would suggest following the sample code in your app. This is an advanced function, since the parameters have to be configured for every frame. If any parameter is not set correctly, it may not work properly.

Hi Daniel,
I am still a bit confused about the output I am seeing. If I set sps.max_num_ref_frames (via setNumReferenceFrames) and v4l2_enc_frame_ext_rps_ctrl_params.nMaxRefFrames to 4 but set nActiveRefFrames to only 1, I see encoding artifacts. Do you know the reason for this?

Also,
What is the purpose of nMaxRefFrames (see Jetson Linux API Reference: v4l2_enc_frame_ext_rps_ctrl_params_ Struct Reference | NVIDIA Docs)? How is it different from sps.max_num_ref_frames? Thank you for the help!

After a bit more investigation, I think the issue is that the encoder does not obey the user setting for nActiveRefFrames. Even if I have 2 frames in my returned RPS list (from the v4l2_ctrl_videoenc_outputbuf_metadata), I should be able to set my next frame's RPS to use only 1 frame if I want.
Is this a bug on the encoder or am I doing something incorrectly?

Especially since num_ref_idx_l0_active_minus1 = 0 (1) I would think the encoder should only be trying to use a single reference…

Hi,
The three-layer SVC is to encode frames into 3 layers like:

Please check populate_ext_rps_threeLayerSvc_Param() to see how to configure the parameters, and then adapt it to your use-case. It should work if you follow the examples.

Hi @DaneLLL ,
Yes this is how I would want the RPS to look. BUT in your example in populate_ext_rps_threeLayerSvc_Param the RPS looks like this:

If I change ctx->num_reference_frames from 2 to 1 I can get the processing you have shown but that is not what you have in the example code.

    if (ctx->externalRPS && ctx->RPS_threeLayerSvc)
    {
        ctx->rps_par.m_numTemperalLayers = 3;
        ctx->rps_par.nActiveRefFrames = 0;

        ctx->input_metadata = true;
        ctx->report_metadata = true;
        ctx->num_reference_frames = 2;
        ...

Hi,
Your understanding of populate_ext_rps_threeLayerSvc_Param() does not seem right. The function clearly defines the reference frames, so the encoded frames are in three layers. When the network condition is bad, the transmitter can drop the P1, P3, P5… frames to halve the frame rate, and the receiver can still decode the remaining frames thanks to the three-layer structure.
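
To illustrate why dropping the odd frames is safe only with a layered reference structure, here is a small self-contained sketch. It does not use the encoder API; the helper names and the reference tables in the usage note are assumptions made for illustration (a conventional three-layer hierarchical-P pattern versus a plain chain where each frame references the previous one).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ref[i]  : index of the frame that frame i predicts from, or -1 for an intra frame.
// keep[i] : whether frame i is still transmitted after dropping.
// Returns true when every kept frame's reference is also kept; by induction
// over the reference chain, every kept frame can then be decoded.
bool decodable_after_drop(const std::vector<int>& ref, const std::vector<bool>& keep)
{
    assert(ref.size() == keep.size());
    for (std::size_t i = 0; i < ref.size(); ++i) {
        if (!keep[i] || ref[i] < 0) continue;
        assert(static_cast<std::size_t>(ref[i]) < i);  // references must point backwards
        if (!keep[ref[i]]) return false;
    }
    return true;
}

// Keep only even-numbered frames, i.e. drop P1, P3, P5, ...
std::vector<bool> keep_even(std::size_t n)
{
    std::vector<bool> keep(n);
    for (std::size_t i = 0; i < n; ++i) keep[i] = (i % 2 == 0);
    return keep;
}
```

With an assumed layered table such as {-1, 0, 0, 2, 0, 4, 4, 6}, keeping only the even frames still passes the check, whereas a chained table {-1, 0, 1, 2, 3, 4, 5, 6} fails as soon as frame 2 loses its reference to frame 1.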

Hi @DaneLLL ,

Sorry I think I had a bit of confusion of what was going on in populate_ext_rps_threeLayerSvc_Param() but I think I have things mostly working on my end.
I am still seeing encoding artifacts, but they are more limited. I am now only seeing artifacts when nFrameId % temporalCycle == 3. This would be the frame in layer 2 that references layer 1.
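
For clarity, this is roughly the layer and reference assignment I mean, written as a standalone sketch. It is not the code from populate_ext_rps_threeLayerSvc_Param(); the struct and function names are hypothetical, and it assumes a temporal cycle of 4 and an IDR interval that is a multiple of 4 (32 in the sample config).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration type, not part of the Jetson Multimedia API.
struct TemporalRef {
    int layer;             // temporal layer: 0, 1 or 2
    int64_t ref_frame_id;  // frame to predict from, or -1 for an IDR frame
};

// Assumed three-layer hierarchical-P pattern with a temporal cycle of 4:
//   pos 0 -> TL0, references the previous TL0 frame (or nothing at an IDR)
//   pos 1 -> TL2, references the TL0 frame of this cycle
//   pos 2 -> TL1, references the TL0 frame of this cycle
//   pos 3 -> TL2, references the TL1 frame of this cycle
TemporalRef three_layer_ref(int64_t frame_id, int64_t idr_interval = 32)
{
    const int64_t cycle = 4;
    assert(frame_id >= 0);
    assert(idr_interval > 0 && idr_interval % cycle == 0);

    const int64_t pos = frame_id % cycle;
    const int64_t base = frame_id - pos;  // TL0 frame that starts this cycle

    if (pos == 0) {
        const bool is_idr = (frame_id % idr_interval == 0);
        return { 0, is_idr ? -1 : frame_id - cycle };
    }
    if (pos == 2) return { 1, base };
    if (pos == 1) return { 2, base };
    return { 2, base + 2 };  // pos == 3: the case where I see the artifacts
}
```

Under these assumptions, frame 3 is in layer 2 and references frame 2 (layer 1), which is the only position where the corruption shows up for me.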
Here is an example of an artifact:
Screenshot 2024-09-27 at 11.00.41 AM

One thing I tested was always setting nCurrentRefFrameId = previousTemporal0Id. This is not what I want, but it does stop the artifacts from occurring.

I am printing out the RPS at output from the metadata and can see that the previous frame (at TL1) is in the RPS. I am also printing the setting of the frame and can see the RPS matches and nCurrentRefFrameId is set correctly. Any advice would be appreciated. Thank you!

Attaching the stream. Thank you
svc_yes_tempref_setnumref2.ts.zip (2.4 MB)

Hi,
Do you mean you run the 01_video_encode sample with these two options:

        --erps                Enable External RPS [Default = disabled]
        --svc                 Enable RPS Three Layer SVC [Default = disabled]

And observe the issue?

Hi @DaneLLL ,

Yes I ran the test app using this command ./video_encode out.yuv 3840 2160 H264 out.h264 --dbg-level 8 --svc --erps -sf 1
And I get the following output:
out_sample_encoder.h264.zip (672.6 KB)

In this test sample you can see the smaller corruptions here.


Hi,
Please try Jetpack 6.1. If the issue is still present, please share the YUV file. We will set up AGX Orin developer kit to replicate it and check.

Will try to get this set up on the AGX. I am also trying to test this code on an Xavier and will get back to you.
I also reran the above test using H265 instead of H264, and the corruption is gone. But for our use case we need H264 working. Here is the H265 stream.
out_sample_encoder.h265.zip (416.5 KB)

Here is the YUV
out.yuv.zip (32.5 MB)

Hi @DaneLLL
I was also able to repro this on an Xavier running the same version of Jetpack. I am not sure if I will be able to test on Jetpack 6 as my company has not adopted it yet. Are you able to test on your end? Is Nvidia still supporting Jetpack 5? Thank you for the help!