NVENC's “tune” parameter results in lower quality in “constqp” mode for AV1 encoding

Hi.

I’m trying to encode frames using NVENC's AV1 hardware encoder, targeting the best possible quality.

I’m surprised to observe that -tune ull (ultra low latency) gives better quality than -tune uhq (ultra high quality) on an RTX 5090 with the 570.124.06 Linux driver.

The ffmpeg command I use is:

ffmpeg -hide_banner -loglevel error -i raw_frames.yuv -c:v av1_nvenc -video_track_timescale 50 -preset p7 -tune uhq -rc constqp -qp 1 -bitrate 1G -y encoded_frames.mp4

Here is a minimal script that downloads a sample video and encodes it with every tune in both vbr and constqp modes for comparison:

#!/usr/bin/env bash
set -e

# 500 frames, 720p, YUV 4:2:0, 50 FPS.
url_source="https://media.xiph.org/video/derf/y4m/ducks_take_off_420_720p50.y4m"
checksum="547cce45773077e27c71fd02d3411237"

ref_path="/tmp/encoding_ref.y4m"

if [ ! -f "$ref_path" ]; then
    echo "Downloading video sample to '$ref_path'."
    wget "$url_source" -O "$ref_path"
fi

if ! echo "$checksum $ref_path" | md5sum -c --status; then
    echo "Checksum mismatch. The file will be deleted; please retry."
    rm -f "$ref_path"
    exit 1
fi


declare -A ssim_results
declare -A psnr_results
declare -A size_results
keys=() 

tunes="ull ll hq uhq"
modes="vbr constqp"

for mode in $modes; do
    for tune in $tunes; do
        echo "Encoding video in mode '$mode' using tune '$tune'"
        key="${mode}_${tune}"

        keys+=("$key")

        encode_path="/tmp/encoded_${key}.mp4"
        
        # Quote "$tune" so the value is always passed as a single argument.
        if [ "$mode" == "vbr" ]; then
            ffmpeg -hide_banner -loglevel error -i "$ref_path" -c:v av1_nvenc -video_track_timescale 50 -preset p7 -tune "$tune" -rc vbr -bitrate 1G -y "$encode_path"
        else
            ffmpeg -hide_banner -loglevel error -i "$ref_path" -c:v av1_nvenc -video_track_timescale 50 -preset p7 -tune "$tune" -rc constqp -qp 1 -bitrate 1G -y "$encode_path"
        fi

        result=$(ffmpeg -i "$encode_path" -i "$ref_path" -lavfi "[0:v][1:v]ssim;[0:v][1:v]psnr" -f null - 2>&1)

        psnr=$(echo "$result" | grep -oP "PSNR .*? average:\K[0-9.]+")
        ssim=$(echo "$result" | grep -oP "SSIM .*? All:\K[0-9.]+")
        size=$(du -h "$encode_path" | cut -f1)

        psnr_results["$key"]=$psnr
        ssim_results["$key"]=$ssim
        size_results["$key"]=$size
    done
done

echo
echo "======== Summary of SSIM and PSNR results ========"
for key in "${keys[@]}"; do
    echo "- $key:"
    echo "    PSNR: ${psnr_results[$key]}"
    echo "    SSIM: ${ssim_results[$key]}"
    echo "    Size: ${size_results[$key]}"
done

It produces the following output:

======== Summary of SSIM and PSNR results ========
- vbr_ull:
    PSNR: 26.435912
    SSIM: 0.735398
    Size: 2.6M
- vbr_ll:
    PSNR: 26.435912
    SSIM: 0.735398
    Size: 2.6M
- vbr_hq:
    PSNR: 27.462104
    SSIM: 0.786980
    Size: 2.7M
- vbr_uhq:
    PSNR: 27.468363
    SSIM: 0.789680
    Size: 2.6M
- constqp_ull:
    PSNR: 58.942369
    SSIM: 0.999355
    Size: 381M
- constqp_ll:
    PSNR: 58.942369
    SSIM: 0.999355
    Size: 381M
- constqp_hq:
    PSNR: 52.918545
    SSIM: 0.998359
    Size: 335M
- constqp_uhq:
    PSNR: 49.381956
    SSIM: 0.996885
    Size: 327M

How can it be explained that PSNR/SSIM values are better with -tune ull than with -tune uhq in constqp mode?

The QP value used here is very low. ULL/LL use no B-frames, whereas UHQ uses 7 B-frames.
PSNR values for B-frames are lower than those for I/P-frames. All frames in the ULL/LL bitstream are I/P-frames, whereas the majority of frames in the UHQ bitstream are B-frames, so the average PSNR observed for UHQ is lower.
If a higher QP value is used, we still see a lower average PSNR for UHQ, but the file size is also much lower.
For a QP value of 1, the file sizes are close because the QP is so low.
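
One way to check the point about frame types is to look at the per-frame PSNR spread rather than only the average. The sketch below is an illustration, not part of the original script: psnr_spread is a hypothetical helper, and the log path in the usage comment is an assumption. It reads a stats file written by ffmpeg's psnr filter (its stats_file option) and prints the minimum, mean, and maximum of the per-frame psnr_avg values. Note that the mean here is the plain mean of per-frame values, which need not match the "average" figure that ffmpeg prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize per-frame PSNR from a psnr filter stats file.
# Each line of that file looks like:
#   n:1 mse_avg:0.52 mse_y:0.60 ... psnr_avg:50.97 psnr_y:50.35 ...
psnr_spread() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^psnr_avg:/) {
                    v = substr($i, 10)        # text after "psnr_avg:"
                    if (v == "inf") next      # skip frames identical to the reference
                    v += 0
                    if (count == 0 || v < min) min = v
                    if (count == 0 || v > max) max = v
                    sum += v
                    count++
                }
            }
        }
        END {
            if (count > 0)
                printf "frames=%d min=%.2f mean=%.2f max=%.2f\n", count, min, sum / count, max
        }
    ' "$1"
}

# Usage (paths are assumptions; run once per encoded file):
#   ffmpeg -i /tmp/encoded_constqp_uhq.mp4 -i /tmp/encoding_ref.y4m \
#       -lavfi "[0:v][1:v]psnr=stats_file=/tmp/psnr_uhq.log" -f null -
#   psnr_spread /tmp/psnr_uhq.log
```

If the explanation above holds, the UHQ encode should show a wide gap between min and max, while the ULL/LL encodes should be much more uniform.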

Generally, we measure BD-Rate at multiple QP points rather than relying on a single PSNR value for quality evaluation.
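
As a rough sketch of what collecting those points could look like, the helpers below encode the reference at several QP values for one tune and print a bitrate/PSNR pair for each. The names kbps and rd_points, the output paths, and the QP values in the usage comment are assumptions for illustration. The ffmpeg flags are taken from the script above, minus -bitrate. This only gathers the rate-distortion points; the BD-Rate calculation itself is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: gather (bitrate, PSNR) points for a BD-Rate comparison.

# Average bitrate in kbit/s from a size in bytes and a duration in seconds.
kbps() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes * 8 / secs / 1000 }'
}

# Encode the reference at each QP and print one "tune qp kbps psnr" line per QP.
# Arguments: reference file, tune, clip duration in seconds, then the QP values.
rd_points() {
    local ref="$1" tune="$2" secs="$3"
    shift 3
    local qp out log psnr
    for qp in "$@"; do
        out="/tmp/rd_${tune}_qp${qp}.mp4"   # assumed output location
        ffmpeg -hide_banner -loglevel error -i "$ref" -c:v av1_nvenc \
            -video_track_timescale 50 -preset p7 -tune "$tune" \
            -rc constqp -qp "$qp" -y "$out"
        # Same PSNR extraction as in the original script.
        log=$(ffmpeg -i "$out" -i "$ref" -lavfi "[0:v][1:v]psnr" -f null - 2>&1)
        psnr=$(echo "$log" | grep -oP "PSNR .*? average:\K[0-9.]+")
        echo "$tune $qp $(kbps "$(wc -c < "$out")" "$secs") $psnr"
    done
}

# Usage (the clip is 500 frames at 50 FPS, so 10 seconds; QP values are illustrative):
#   rd_points /tmp/encoding_ref.y4m ull 10 40 80 120 160
#   rd_points /tmp/encoding_ref.y4m uhq 10 40 80 120 160
```

Plotting PSNR against bitrate for each tune then shows which one delivers more quality at a given rate, which a single QP point cannot.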
Hope this helps.