DXGI/NvEnc YUV444 10bit format compatibility

I'm using DirectX 11, DXGI Desktop Duplication (DDA) and NvEnc. I have to support YUV420, YUV444, 8bit, and 10bit video streams, and I am using ID3D11Texture2Ds as input buffers. I have compute shaders that convert the DDA capture formats (DXGI_FORMAT_B8G8R8A8_UNORM for 8bit, DXGI_FORMAT_R16G16B16A16_FLOAT for 10bit) to every YUV format NvEnc needs except YUV444 10bit.

Using NvEnc's built-in RGB input formats doesn't seem to work, since they all convert to YUV420 internally (no YUV444) and give me no control over the color space used for the conversion (BT.601 or BT.709, for example).
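For reference, the per-pixel math such a conversion shader performs can be sketched as a small C++ CPU reference (not the poster's actual shader). It assumes gamma-encoded R'G'B' already in [0, 1] and 10bit limited ("studio") range output; RgbToYcbcr10, LumaCoeffs, kBt709 and kBt601 are hypothetical names. Switching between BT.601 and BT.709 only changes the luma coefficients (Kr, Kb), which is the control that's missing when NvEnc does the RGB conversion itself. The linear FP16 capture format would presumably need a transfer function applied before this step, which isn't shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// 10-bit Y'CbCr sample triple (each value in 0..1023).
struct Ycbcr10 {
    uint16_t y, cb, cr;
};

// Luma coefficients Kr and Kb; Kg is derived as 1 - Kr - Kb.
struct LumaCoeffs {
    double kr, kb;
};

constexpr LumaCoeffs kBt709{0.2126, 0.0722};
constexpr LumaCoeffs kBt601{0.299, 0.114};

// Converts non-linear R'G'B' in [0, 1] to 10-bit limited-range Y'CbCr:
// Y' spans 64..940, Cb and Cr span 64..960 with 512 as zero chroma.
Ycbcr10 RgbToYcbcr10(double r, double g, double b, LumaCoeffs k) {
    assert(r >= 0.0 && r <= 1.0 && g >= 0.0 && g <= 1.0 && b >= 0.0 && b <= 1.0);
    const double kg = 1.0 - k.kr - k.kb;
    const double y = k.kr * r + kg * g + k.kb * b;     // 0..1
    const double cb = (b - y) / (2.0 * (1.0 - k.kb));  // -0.5..0.5
    const double cr = (r - y) / (2.0 * (1.0 - k.kr));  // -0.5..0.5
    return {
        static_cast<uint16_t>(std::lround(64.0 + 876.0 * y)),
        static_cast<uint16_t>(std::lround(512.0 + 896.0 * cb)),
        static_cast<uint16_t>(std::lround(512.0 + 896.0 * cr)),
    };
}
```

For example, pure red (1, 0, 0) with kBt709 gives Y=250, Cb=409, Cr=960, while white maps to Y=940 with both chroma values at 512.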

The YUV444 10bit formats that DXGI supports are:

DXGI_FORMAT_Y410 (10bit packed, 2bit alpha, 32bit per pixel)
DXGI_FORMAT_Y416 (10bit unpacked as 16bit per channel, 16bit alpha, 64bit per pixel)
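The bit layouts of these two DXGI formats, as described in Microsoft's "10-bit and 16-bit YUV Video Formats" page (worth double-checking against it), can be written out as a short C++ sketch. Yuva and the Pack/Unpack helpers are hypothetical names for illustration. As I understand the DXGI_FORMAT docs, the Y410 ordering (U in the low bits, then Y, V, A) is also why an R10G10B10A2 view of a Y410 texture puts U, Y and V in R, G and B.

```cpp
#include <cassert>
#include <cstdint>

// One 4:4:4 pixel plus alpha, unpacked to separate integers.
struct Yuva {
    uint16_t y, u, v, a;
};

// Y410: one 32-bit word per pixel.
// Bits 0-9 = U, 10-19 = Y, 20-29 = V, 30-31 = A.
Yuva UnpackY410(uint32_t w) {
    return {
        static_cast<uint16_t>((w >> 10) & 0x3FF),
        static_cast<uint16_t>(w & 0x3FF),
        static_cast<uint16_t>((w >> 20) & 0x3FF),
        static_cast<uint16_t>(w >> 30),
    };
}

uint32_t PackY410(Yuva p) {
    assert(p.y <= 0x3FF && p.u <= 0x3FF && p.v <= 0x3FF && p.a <= 0x3);
    return static_cast<uint32_t>(p.u) | (static_cast<uint32_t>(p.y) << 10) |
           (static_cast<uint32_t>(p.v) << 20) | (static_cast<uint32_t>(p.a) << 30);
}

// Y416: one 64-bit word per pixel, 16 bits per channel.
// Bits 0-15 = U, 16-31 = Y, 32-47 = V, 48-63 = A.
Yuva UnpackY416(uint64_t w) {
    return {
        static_cast<uint16_t>((w >> 16) & 0xFFFF),
        static_cast<uint16_t>(w & 0xFFFF),
        static_cast<uint16_t>((w >> 32) & 0xFFFF),
        static_cast<uint16_t>(w >> 48),
    };
}
```

Both formats keep all channels of a pixel together in one word, which is the key difference from NvEnc's layout.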

The YUV444 10bit formats that NvEnc supports are:

NV_ENC_BUFFER_FORMAT_YUV444_10BIT (10bit planar, unpacked as 16bit per channel, no alpha, 48bit per pixel)
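Because this NvEnc format is planar, the 48 bits of a pixel are not stored together: the frame is a full Y plane, then a full U plane, then a full V plane, each sample a 16bit word with the 10bit value in the most significant bits (that is how I read the comment on NV_ENC_BUFFER_FORMAT_YUV444_10BIT in nvEncodeAPI.h; verify against your SDK version). A minimal C++ sketch of the resulting offsets, assuming all three planes share one row pitch and are stored back to back; Planar444Layout and ToMsbAligned16 are hypothetical helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Offsets inside a planar YUV444 10-bit frame stored as Y, then U, then V,
// all planes using the same row pitch.
struct Planar444Layout {
    size_t pitch_bytes;  // bytes per row of one plane (>= width * 2)
    size_t height;       // rows per plane

    // 0 = Y, 1 = U, 2 = V
    size_t PlaneOffset(int plane) const {
        assert(plane >= 0 && plane < 3);
        return static_cast<size_t>(plane) * pitch_bytes * height;
    }

    size_t TotalBytes() const { return 3 * pitch_bytes * height; }
};

// Stores a 10-bit code in a 16-bit sample with the data in the most
// significant bits (the low 6 bits are zero).
uint16_t ToMsbAligned16(uint16_t code10) {
    assert(code10 <= 0x3FF);
    return static_cast<uint16_t>(code10 << 6);
}
```

With a 1920 pixel wide frame (pitch 3840 bytes) and 1080 rows, U starts at byte 4147200 and V at byte 8294400.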

So here is the problem: there are no compatible formats. NvEnc only has a 48bit per pixel (planar, unpacked) format, while DXGI has a 32bit per pixel packed format and a 64bit per pixel unpacked format with an extra alpha channel. It seems my only options for converting between the two are a structured buffer (and more manual work), or a DXGI_FORMAT_R16_FLOAT texture three times the height of my frame, with the Y, U and V planes stacked. Either way I would have to build another pipeline. I already have 4 for each conversion, since I also have to draw the mouse pointer (color, masked color, or monochrome) into the image. This would make my system even more obnoxious and bloated than it already is (maybe this part needs a redesign).

Do I have a better option? Is there something I am missing about these formats? I'd appreciate any insight into this. Thanks.

Either use P016/NV30 or float (Chrome uses float) as you said.

As the official documentation says, the format of the desktop image is always DXGI_FORMAT_B8G8R8A8_UNORM (DXGI_FORMAT (dxgiformat.h) - Win32 apps | Microsoft Docs).

How do you get a 10bit desktop image?

Hey, check out IDXGIOutput5::DuplicateOutput1 (dxgi1_5.h) - Win32 apps | Microsoft Docs

You have to query for IDXGIOutput5 and call the newer DuplicateOutput1 to get 10bit (R10G10B10A2).

I have had issues with 10bit, though; Windows is weird. I have seen the format oscillate between 10bit and 8bit.

DXGI_FORMAT_Y416 is also a packed format (all channels of a pixel are interleaved), so it doesn't match NvEnc's planar layout either.

The only formats that work out of the box are DXGI_FORMAT_P010 with NV_ENC_BUFFER_FORMAT_YUV420_10BIT, and ARGB10.
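P010 works out of the box because DXGI's P010 texture and NV_ENC_BUFFER_FORMAT_YUV420_10BIT describe the same semi-planar layout: a full-resolution Y plane followed by one interleaved UV plane at half width and half height, both with 16bit samples. A sketch of the byte offsets, assuming the UV plane uses the same row pitch as the Y plane (P010Layout and ComputeP010Layout are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>

// Byte layout of a P010 frame (semi-planar 4:2:0, 16 bits per sample):
// a full Y plane, then one interleaved UV plane with half as many rows.
// Each UV row holds width/2 (U, V) pairs, so it is as wide as a Y row.
struct P010Layout {
    size_t uv_offset;    // where the interleaved UV plane begins
    size_t total_bytes;  // Y plane + UV plane
};

P010Layout ComputeP010Layout(size_t pitch_bytes, size_t height) {
    assert(height % 2 == 0);  // 4:2:0 subsampling needs an even height
    const size_t y_bytes = pitch_bytes * height;
    const size_t uv_bytes = pitch_bytes * (height / 2);
    return {y_bytes, y_bytes + uv_bytes};
}
```

There is no 4:4:4 counterpart of this pairing on the DXGI side, which is why the YUV444 10bit case needs the workaround.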

From what I understand, making an R16 texture with 3x the height is the only option here (R16_UINT or R16_UNORM rather than R16_FLOAT, since NvEnc expects integer samples).
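The stacked-plane approach can be sketched as a CPU reference of what the compute shader would do per pixel. For illustration it reads interleaved Y410 words rather than the FP16 capture (after the RGB to YUV math, the plane writes are the same), drops alpha, and assumes tightly packed rows; a real texture's RowPitch can be larger than the width. Y410ToStackedPlanes is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads interleaved Y410 pixels and writes three stacked planes of 16-bit
// samples: Y in rows 0..h-1, U in rows h..2h-1, V in rows 2h..3h-1.
// Each sample holds the 10-bit value in its most significant bits.
// Assumes tightly packed rows (pitch == width) in both buffers.
std::vector<uint16_t> Y410ToStackedPlanes(const std::vector<uint32_t>& src,
                                          size_t width, size_t height) {
    assert(src.size() == width * height);
    std::vector<uint16_t> dst(width * height * 3);
    for (size_t row = 0; row < height; ++row) {
        for (size_t col = 0; col < width; ++col) {
            const uint32_t w = src[row * width + col];
            const auto u = static_cast<uint16_t>(w & 0x3FF);          // bits 0-9
            const auto y = static_cast<uint16_t>((w >> 10) & 0x3FF);  // bits 10-19
            const auto v = static_cast<uint16_t>((w >> 20) & 0x3FF);  // bits 20-29
            // Alpha (bits 30-31) is dropped: NvEnc's 4:4:4 format has none.
            dst[(0 * height + row) * width + col] = static_cast<uint16_t>(y << 6);
            dst[(1 * height + row) * width + col] = static_cast<uint16_t>(u << 6);
            dst[(2 * height + row) * width + col] = static_cast<uint16_t>(v << 6);
        }
    }
    return dst;
}
```

In the shader version, each thread would write three texels of the R16 UAV, at rows y, y + height and y + 2 * height, so a single dispatch still covers the whole frame.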

Y416 is what Intel calls PXXX; it may be slightly different in layout, though.

Apple also uses Y416 for its ProRes encoder and decoder.