NPP nv12->bgr incorrect coefficients


I am seeing a peculiar issue with nppiNV12ToBGR_709HDTV_8u_P2C3R.

I have created a bt709 test file using gstreamer:

gst-launch-1.0 videotestsrc num-buffers=100 ! "video/x-raw, height=1080, width=1920, colorimetry=(string)bt709" ! x264enc bitrate=10000000 ! qtmux ! filesink location=bt709.mp4

I then decoded the same file using the codec SDK and converted NV12->BGR using nppiNV12ToBGR_709HDTV_8u_P2C3R. This resulted in poor colour conversion: the dynamic range is lower than in the original, so the picture looks dull.


I created a kernel that uses what I believe are the correct coefficients. It produces much better colours and matches the original image.

    float r = y * 1.164384 + v * 1.792741 - 248.101004;
    float g = y * 1.164384 - u * 0.213249 - v * 0.532909 + 76.878085;
    float b = y * 1.164384 + u * 2.112402 - 289.017577;
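
For reference, these constants can be reproduced from the BT.709 luma weights (Kr = 0.2126, Kb = 0.0722) combined with the 8-bit limited-range scaling (Y in 16..235, Cb/Cr in 16..240). The sketch below is host-side C++ rather than the kernel itself; `YuvToRgbCoeffs` and `derive_limited_range_coeffs` are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Coefficients for 8-bit limited-range YCbCr -> full-range RGB, used as
//   r = y * y_gain + v * cr_to_r + r_off
//   g = y * y_gain + u * cb_to_g + v * cr_to_g + g_off
//   b = y * y_gain + u * cb_to_b + b_off
struct YuvToRgbCoeffs {
    double y_gain;
    double cr_to_r;
    double cb_to_g;  // negative
    double cr_to_g;  // negative
    double cb_to_b;
    double r_off, g_off, b_off;
};

// Hypothetical helper: derive the coefficients from the luma weights kr and kb.
inline YuvToRgbCoeffs derive_limited_range_coeffs(double kr, double kb) {
    assert(kr > 0.0 && kb > 0.0 && kr + kb < 1.0);
    const double kg = 1.0 - kr - kb;
    const double y_gain = 255.0 / 219.0;  // Y spans 16..235
    const double c_gain = 255.0 / 224.0;  // Cb/Cr span 16..240
    YuvToRgbCoeffs c{};
    c.y_gain = y_gain;
    c.cr_to_r = 2.0 * (1.0 - kr) * c_gain;
    c.cb_to_b = 2.0 * (1.0 - kb) * c_gain;
    c.cb_to_g = -(kb / kg) * c.cb_to_b;
    c.cr_to_g = -(kr / kg) * c.cr_to_r;
    // Fold the Y - 16 and U/V - 128 offsets into one constant per channel.
    c.r_off = -16.0 * y_gain - 128.0 * c.cr_to_r;
    c.g_off = -16.0 * y_gain - 128.0 * (c.cb_to_g + c.cr_to_g);
    c.b_off = -16.0 * y_gain - 128.0 * c.cb_to_b;
    return c;
}
```

Calling `derive_limited_range_coeffs(0.2126, 0.0722)` gives values that agree with the constants in the kernel to about five decimal places. The offsets differ in the last digit or so, depending on where the rounding is applied.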

Am I doing something wrong, or is there an issue with NPP’s colour conversion?

Thanks in advance,

Can you confirm whether the input is treated as full-range or limited-range YCbCr, and whether the output is meant to be full-range BGR? It looks to me like nppiNV12ToBGR_709HDTV_8u_P2C3R and nppiNV12ToBGR_8u_P2C3R output limited-range BGR [16-235].
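
One quick way to check this, as a rough sketch: copy the BGR result back to the host and look at the smallest and largest byte values. `sample_range` is a hypothetical helper name. It assumes the buffer is tightly packed (no row padding) and that the frame contains both black and white areas.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical helper: returns {min, max} over all bytes of a host-side buffer.
inline std::pair<uint8_t, uint8_t> sample_range(const uint8_t* data, std::size_t count) {
    assert(data != nullptr && count > 0);
    const auto mm = std::minmax_element(data, data + count);
    return {*mm.first, *mm.second};
}
```

If the converted frame reports a minimum near 16 and a maximum near 235 where a full-range result should reach 0 and 255, the range has not been expanded.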

Thanks again,

I can confirm that both nppiNV12ToBGR_709HDTV_8u_P2C3R and nppiNV12ToBGR_8u_P2C3R assume (wrongly, IMHO) a full-range YCbCr input of 0…255.

If the inputs are true limited-range BT.601 or BT.709, the outputs for black and white are as follows:

    #include <array>
    #include <cstdint>
    #include <iostream>
    #include <cuda_runtime.h>
    #include <npp.h>

    // Converts one NV12 pixel (Y at byte 0, interleaved UV at byte 8) and prints the BGR result.
    void test_709_to_bgr(const uint8_t* nv12_data) {
        constexpr int input_stride = 8;
        constexpr int output_stride = 3;
        uint8_t* cu_nv12 = nullptr;
        uint8_t* cu_bgr = nullptr;
        cudaMalloc(&cu_nv12, 12);
        cudaMalloc(&cu_bgr, 3);
        cudaMemcpy(cu_nv12, nv12_data, 12, cudaMemcpyHostToDevice);
        NppiSize roi{1, 1};
        const Npp8u* yuv_ptrs[2] = {cu_nv12, cu_nv12 + input_stride};  // Y plane, UV plane
        nppiNV12ToBGR_709HDTV_8u_P2C3R(yuv_ptrs, input_stride, cu_bgr, output_stride, roi);
        std::array<uint8_t, 3> result_bgr{};
        cudaMemcpy(result_bgr.data(), cu_bgr, 3, cudaMemcpyDeviceToHost);
        for (auto val : result_bgr) {
            std::cout << static_cast<int>(val) << " ";
        }
        std::cout << "\n";
        cudaFree(cu_nv12);
        cudaFree(cu_bgr);
    }

    // Limited-range black (Y=0x10) and white (Y=0xeb) with neutral chroma (U=V=0x80).
    constexpr uint8_t nv12_black[12] = {0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x80, 0x00, 0x00};
    constexpr uint8_t nv12_white[12] = {0xeb, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x80, 0x00, 0x00};


Actual results
16 16 16
235 235 235

Expected results
0 0 0
255 255 255
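
These numbers are what a full-range BT.709 matrix would give: with neutral chroma (U = V = 128) it passes Y straight through, so 16 stays 16 and 235 stays 235. The CPU sketch below compares the two range assumptions for a single pixel. `ColorRange`, `Bgr` and `bt709_to_bgr` are hypothetical names, not NPP types or functions, and the full-range coefficients are the standard BT.709 ones rather than anything read out of NPP.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

enum class ColorRange { Full, Limited };  // hypothetical, not an NPP type

struct Bgr { uint8_t b, g, r; };

// Clamp to 0..255 and round to the nearest integer.
inline uint8_t to_u8(float x) {
    return static_cast<uint8_t>(std::lround(std::min(255.0f, std::max(0.0f, x))));
}

// BT.709 YCbCr -> BGR for one pixel, under the given input range assumption.
inline Bgr bt709_to_bgr(uint8_t y, uint8_t u, uint8_t v, ColorRange range) {
    assert(range == ColorRange::Full || range == ColorRange::Limited);
    const float cb = static_cast<float>(u) - 128.0f;
    const float cr = static_cast<float>(v) - 128.0f;
    float luma = static_cast<float>(y);
    float chroma_gain = 1.0f;
    if (range == ColorRange::Limited) {
        luma = (luma - 16.0f) * (255.0f / 219.0f);  // stretch 16..235 to 0..255
        chroma_gain = 255.0f / 224.0f;              // stretch 16..240 to full swing
    }
    const float r = luma + 1.5748f * chroma_gain * cr;
    const float g = luma - (0.1873f * cb + 0.4681f * cr) * chroma_gain;
    const float b = luma + 1.8556f * chroma_gain * cb;
    return {to_u8(b), to_u8(g), to_u8(r)};
}
```

With `ColorRange::Full`, the black and white test pixels come out as 16 16 16 and 235 235 235, matching the actual results. With `ColorRange::Limited` they come out as 0 0 0 and 255 255 255, matching the expected results.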

You might wish to file a bug. The instructions are linked from a sticky post at the top of the CUDA programming forum.

I have raised a bug. I think the best solution here is either to add new functions for limited colour range or to add a parameter for colour range. At the very least, the full-range assumption should be documented.