CUDA RGBA to YUVA conversion (nppiRGBToYUV_8u_AC4R), Transparency is not preserved.

Dear all,

I have observed that the 4-channel 8-bit RGBA to YUVA conversion doesn't keep alpha intact as I expected.
I am using nppiRGBToYUV_8u_AC4R.
As I understand it, for an RGBA input of {0xff,0xff,0xff,0x00} the YUVA output should be {0xFF,0x80,0x80,0x00}.
Instead I always receive alpha as 0xff, i.e. I get {0xFF,0x80,0x80,0xff}.

Here is what I am attempting

// This is allocated using CUDAHostMalloc(); size is 1920x1080*4 bytes as it is RGBA
pCudaMemSrcRGBA = {0xff,0xff,0xff,0x00,0xff,0xff,0xff,0x00 …};

// This is allocated using CUDAHostMalloc(); size is 1920x1080*4 bytes as it is RGBA
unsigned char* pCudaMemYuva_Dst = NULL;
cudaMalloc(&pCudaMemYuva_Dst, sizeof(unsigned char)*1920*1080*4);

NppStatus uStatus0 = nppiRGBToYUV_8u_AC4R(pCudaMemSrcRGBA, 1920*4, pCudaMemYuva_Src, 1920*4, oSizeROI);

unsigned char* pYuvaHost_Src = (unsigned char*)malloc(sizeof(unsigned char)*1920*1080*4);
cudaMemcpy(pYuvaHost_Src, pCudaMemYuva_Src, 1920*1080*4, cudaMemcpyDefault);

pYuvaHost_Src contains {0xFF,0x80,0x80,0xff,0xFF,0x80,0x80,0xff…}

Please shed some light on this. What am I missing?

Regards,
Abhijit
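
The color part of the expectation in the question (white mapping to {0xFF,0x80,0x80}) can be checked on the CPU. The following is only a sketch: rgbToYuvRef and toByte are hypothetical helper names, not NPP calls, and the coefficients (Y = 0.299 R + 0.587 G + 0.114 B, U = 0.492 (B - Y) + 128, V = 0.877 (R - Y) + 128) are an assumption about the conversion being used. NPP's exact rounding may differ for other inputs.

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Round a floating-point sample to the nearest integer and clamp it to 0..255.
static std::uint8_t toByte(double v) {
    long r = std::lround(v);
    if (r < 0) r = 0;
    if (r > 255) r = 255;
    return static_cast<std::uint8_t>(r);
}

// Hypothetical CPU reference for one pixel; this is NOT the NPP implementation.
// Assumed coefficients:
//   Y = 0.299 R + 0.587 G + 0.114 B
//   U = 0.492 (B - Y) + 128
//   V = 0.877 (R - Y) + 128
std::array<std::uint8_t, 3> rgbToYuvRef(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    const double y = 0.299 * r + 0.587 * g + 0.114 * b;
    const double u = 0.492 * (b - y) + 128.0;
    const double v = 0.877 * (r - y) + 128.0;
    return {toByte(y), toByte(u), toByte(v)};
}
```

Under these assumptions, rgbToYuvRef(255, 255, 255) gives Y = 255, U = 128, V = 128, which matches the color bytes reported in the question. Only the alpha byte is in dispute.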

You should probably provide a short, complete code sample that demonstrates the issue.

Anyway it seems to work for me. Here is a fully worked test:

$ cat t683.cu
#include <nppi.h>
#include <stdio.h>


const int w=1920;
const int h=1080;
const int p=4;

int main(){

  Npp8u *pSrc, *pDst, *pHst;
  cudaMalloc(&pSrc, h*w*p);
  cudaMalloc(&pDst, h*w*p);
  pHst=(Npp8u *)malloc(h*w*p);

  NppiSize oSizeROI;
  oSizeROI.width=w;
  oSizeROI.height=h;

  for (int i =0; i < h*w; i++){  // one iteration per pixel
    pHst[i*p+0] = 0xff;
    pHst[i*p+1] = 0xff;
    pHst[i*p+2] = 0xff;
    pHst[i*p+3] = 0x00;}
  printf("before:\n");
  for (int i = 0; i < 8; i++)
    printf("%d\n", pHst[i]);
  cudaMemcpy(pSrc, pHst, h*w*p, cudaMemcpyHostToDevice);
  NppStatus res=nppiRGBToYUV_8u_AC4R (pSrc, w*p, pDst, w*p, oSizeROI);
  if (res != 0) {printf("oops %d\n", (int)res); return 1;}
  cudaMemcpy(pHst, pDst, h*w*p, cudaMemcpyDeviceToHost);
  printf("after:\n");
  for (int i = 0; i < 8; i++)
    printf("%d\n", pHst[i]);

  return 0;
}

$ nvcc -o t683 t683.cu -lnppi
$ cuda-memcheck ./t683
========= CUDA-MEMCHECK
before:
255
255
255
0
255
255
255
0
after:
255
128
128
0
255
128
128
0
========= ERROR SUMMARY: 0 errors
$

using CUDA 7.5

Hi txbob,

Thank you for the quick response, but the problem exists even in the sample you shared.
Please dump the last 8 bytes instead of the first 8 after the conversion.
Beyond roughly 50% of the converted output, some issue creeps into the alpha value(s).

I have modified the program slightly to show the issue:

int main(int argc, char **argv)
{
  Npp8u *pSrc, *pDst, *pHst;
  const int w=1920;
  const int h=1080;
  const int p=4;
  cudaMalloc(&pSrc, h*w*p);
  cudaMalloc(&pDst, h*w*p);
  pHst=(Npp8u *)malloc(h*w*p);

  NppiSize oSizeROI;
  oSizeROI.width=w;
  oSizeROI.height=h;

  for (int i =0; i < h*w; i++){  // one iteration per pixel
    pHst[i*p+0] = 0xff;
    pHst[i*p+1] = 0xff;
    pHst[i*p+2] = 0xff;
    pHst[i*p+3] = 0x00;}
  printf("before:\n");
  for (int i = 0; i < 8; i++)
    printf("%d\n", pHst[i]);
  cudaMemcpy(pSrc, pHst, h*w*p, cudaMemcpyHostToDevice);
  NppStatus res=nppiRGBToYUV_8u_AC4R (pSrc, w*p, pDst, w*p, oSizeROI);
  if (res != 0) {printf("oops %d\n", (int)res); return 1;}
  memset(pHst,0,h*w*p);  // Just to be sure there is no wicked past lingering
  cudaMemcpy(pHst, pDst, h*w*p, cudaMemcpyDeviceToHost);
  printf("after:\n");
  // Last 8 bytes
  for (int i = ((w*h*p)-8); i < w*h*p; i++)
    printf("%d\n", pHst[i]);

  //dumpYUVfromYUVAInterleaved("RGBA_2_YUVA_YUVDUMP.yuv",pHst,h*w*p);
  /* for (int i = ((w*h*p)-8); i < w*h*p; i++)
       printf("%d\n", pHst[i]);*/

  return 0;
}

Regards,
Abhijit
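
Rather than dumping only the first or last 8 bytes, it can help to scan the whole host buffer and report where the alpha first deviates. The following is a minimal sketch of such a check; firstAlphaMismatch is a hypothetical helper (not part of NPP or CUDA) and it assumes an interleaved 4-byte-per-pixel layout with alpha in the last byte.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the index (row * width + column) of the first
// pixel whose alpha byte differs from `expected`, or -1 if every pixel matches.
// `stepBytes` is the row pitch in bytes, as passed to the conversion call.
long firstAlphaMismatch(const std::uint8_t* img, int width, int height,
                        int stepBytes, std::uint8_t expected) {
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = img + static_cast<std::size_t>(y) * stepBytes;
        for (int x = 0; x < width; ++x) {
            // Alpha is assumed to be the fourth byte of each 4-byte pixel.
            if (row[x * 4 + 3] != expected) {
                return static_cast<long>(y) * width + x;
            }
        }
    }
    return -1;
}
```

In the test program above this would be called on pHst after the device-to-host copy, with width w, height h, stepBytes w*p and expected 0x00.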

Our developers had this to say:

“In general AC4R means that the DESTINATION alpha remains unmodified.”

I have verified in my corrected test from above that this is in fact the behavior.

This function should match the behavior of the Intel IPP function ippiRGBToYUV_8u_AC4R.

If you can provide a complete code that demonstrates a difference in behavior between nppiRGBToYUV_8u_AC4R and ippiRGBToYUV_8u_AC4R, we’ll take a look.

But at the moment, the behavior appears to be correct (although perhaps unintuitive).
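
To make the "destination alpha remains unmodified" rule concrete, here is a small CPU model of it. This is a sketch only: rgbToYuvAc4Model and toByte are hypothetical names, the color coefficients are assumed, and nothing here is taken from the NPP or IPP sources. The point is the fourth byte of each destination pixel, which is never written.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Round a floating-point sample to the nearest integer and clamp it to 0..255.
static std::uint8_t toByte(double v) {
    long r = std::lround(v);
    if (r < 0) r = 0;
    if (r > 255) r = 255;
    return static_cast<std::uint8_t>(r);
}

// Hypothetical CPU model of the AC4 behavior described above: the three color
// bytes of every destination pixel are written, the alpha byte is skipped.
// Steps are row pitches in bytes. The coefficients are an assumption.
void rgbToYuvAc4Model(const std::uint8_t* src, int srcStep,
                      std::uint8_t* dst, int dstStep,
                      int width, int height) {
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* s = src + static_cast<std::size_t>(y) * srcStep;
        std::uint8_t* d = dst + static_cast<std::size_t>(y) * dstStep;
        for (int x = 0; x < width; ++x, s += 4, d += 4) {
            const double lum = 0.299 * s[0] + 0.587 * s[1] + 0.114 * s[2];
            d[0] = toByte(lum);                          // Y
            d[1] = toByte(0.492 * (s[2] - lum) + 128.0); // U
            d[2] = toByte(0.877 * (s[0] - lum) + 128.0); // V
            // d[3] is deliberately left as it was in the destination buffer.
        }
    }
}
```

With this model, a destination buffer that happens to contain 0xff in its alpha bytes before the call still contains 0xff afterwards, whatever the source alpha was. A freshly allocated, uninitialized destination would therefore show arbitrary alpha values.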

Bob,

Thanks for contacting the dev team about this issue.
The code I sent in my previous response will compile and show the problem (it is the code you shared, with some minor modifications).

The test case is actually very simple, and the behavior of nppiRGBToYUV_8u_AC4R is incorrect.

For an RGBA input of {0xff,0xff,0xff,0x00}, the YUVA output should be {0xFF,0x80,0x80,0x00} for all 1920x1080 samples.
Instead, wrong output is easy to spot when checking from the bottom up (e.g. the code I sent in my previous response dumps the last 8 bytes).

If possible, please ask the developers to look into this issue and provide an update.

Regards,
Abhijit

They have already looked at it, and your interpretation is incorrect.

The destination alpha is left unchanged (i.e. the source alpha is not copied to the destination alpha). That is the intended behavior. Please read my previous response.

The function is intended to behave identically to ippiRGBToYUV_8u_AC4R from the Intel IPP toolkit/SDK.

If you think IPP behaves differently than I describe for that function, please provide your full code sample using IPP that demonstrates a different behavior than what I describe.
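
Given that the destination alpha is left unchanged, one way to end up with the source alpha in the output is to seed the destination with the source pixels before converting. The sketch below models that on the CPU. Ac4Converter and convertKeepingSourceAlpha are hypothetical names, the converter passed in stands for any routine with the AC4 behavior described above, and the source and destination buffers are assumed not to overlap.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Signature of an AC4-style converter: it writes three color bytes per pixel
// and leaves the fourth (alpha) byte of the destination untouched.
using Ac4Converter = void (*)(const std::uint8_t* src, int srcStep,
                              std::uint8_t* dst, int dstStep,
                              int width, int height);

// Hypothetical wrapper: copy the source rows into the destination first, so the
// destination alpha equals the source alpha, then let the converter overwrite
// the three color bytes. Assumes src and dst do not overlap.
void convertKeepingSourceAlpha(Ac4Converter convert,
                               const std::uint8_t* src, int srcStep,
                               std::uint8_t* dst, int dstStep,
                               int width, int height) {
    for (int y = 0; y < height; ++y) {
        std::memcpy(dst + static_cast<std::size_t>(y) * dstStep,
                    src + static_cast<std::size_t>(y) * srcStep,
                    static_cast<std::size_t>(width) * 4);
    }
    convert(src, srcStep, dst, dstStep, width, height);
}
```

On the GPU the equivalent seeding step would presumably be a device-to-device cudaMemcpy from the source image to the destination image (or cudaMemcpy2D if the pitches differ) before calling nppiRGBToYUV_8u_AC4R. That variant is untested here.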