Mysterious .xorsign.abs modifier for floating-point min/max introduced with Ampere

Browsing the latest PTX documentation, I noticed that with Ampere (sm_86) a new .xorsign.abs modifier was introduced for the floating-point minimum / maximum instructions min{.f32 | .f64} and max{.f32 | .f64}. When this is specified, the minimum / maximum of the absolute value of the input operands is computed, and the XOR of the sign of the inputs is applied to the result (unless it is a NaN).

See below for a little program that takes this functionality for a spin. I am asking myself: What use case could possibly benefit from this functionality? Even after pondering this for 15 minutes, I have no idea. If anyone has an idea where this could be useful, I’d love to hear about it.

An internet search revealed nothing, other than that support for this variant was added to LLVM.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

__device__ float fminf_xorsign_abs (float a, float b)
{
    float r;
    asm ("min.xorsign.abs.f32 %0,%1,%2;\n\t" : "=f"(r) : "f"(a), "f"(b));
    return r;
}

__global__ void kernel (float a, float b)
{
    float sa = copysignf (1.0f, a);
    float sb = copysignf (1.0f, b);
    float ref = fminf (fabsf (a), fabsf (b)) * sa * sb;
    float res = fminf_xorsign_abs (a, b);

    printf ("fminf_xorsign_abs (%15.8e, %15.8e) = %15.8e  ref=%15.8e",
            a, b, res, ref);
}

int main (void)
{
    kernel<<<1,1>>>( 5.0f,  3.0f);
    kernel<<<1,1>>>( 5.0f, -3.0f);
    kernel<<<1,1>>>(-5.0f,  3.0f);
    kernel<<<1,1>>>(-5.0f, -3.0f);
    cudaDeviceSynchronize();
    return EXIT_SUCCESS;
}

The instruction as SASS level looks like this:

FMNMX.XORSIGN R2, |R2|, |c[0x0][0x160]|, PT ;