How to access RGB separately in kernel function

Hello everyone,
I am trying to convert my 24-bit RGB image (640x480) to YUV color format. I have divided my image into 16x16 blocks and read all the pixel values into an unsigned char array (unsigned char *data), so all the pixel values are in a single array.

The kernel launch configuration is:
dim3 block(16,16); // 16x16 = 256 threads per block
dim3 grid(hp->biWidth/16, hp->biHeight/16); // 40x30 = 1200 blocks; 1200x256 = 307200 threads, one per pixel
rgbToyuv<<<grid,block>>>(d_hp, d_data, height, width);
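With a 2D launch like this, one common pattern is to map each thread to one pixel and read that pixel's three bytes directly. The sketch below rests on several assumptions: the buffer is tightly packed (true for a 640-pixel-wide 24-bit image, since 640 × 3 = 1920 is already a multiple of 4), the bytes are in R, G, B order, the grid is laid out with x walking columns and y walking rows, and the output goes to a separate interleaved 3-byte buffer. The conversion uses full-range BT.601 (JPEG-style YCbCr) coefficients. `rgbToYuvPixel`, `rgbToyuvSketch` and `rgbToYuvHost` are hypothetical names, not the original poster's kernel. If your buffer is in B, G, R order, as BMP files store it, swap `in[0]` and `in[2]`.

```cpp
#include <cassert>

// Lets the per-pixel helper compile under nvcc (callable from device code)
// and under a plain C++ compiler (for host-side testing).
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Round to nearest and clamp into the 0..255 byte range.
HD inline unsigned char toByte(float v)
{
    int i = (int)(v + 0.5f);
    return (unsigned char)(i < 0 ? 0 : (i > 255 ? 255 : i));
}

// Converts one pixel: in = {R, G, B}, out = {Y, U (Cb), V (Cr)}.
// Assumption: full-range BT.601 coefficients, as used by JPEG.
HD inline void rgbToYuvPixel(const unsigned char* in, unsigned char* out)
{
    float r = in[0], g = in[1], b = in[2];
    out[0] = toByte(0.299f * r + 0.587f * g + 0.114f * b);
    out[1] = toByte(128.0f - 0.168736f * r - 0.331264f * g + 0.5f * b);
    out[2] = toByte(128.0f + 0.5f * r - 0.418688f * g - 0.081312f * b);
}

#ifdef __CUDACC__
// One thread per pixel. Launch with dim3 block(16,16) and
// dim3 grid((width + 15) / 16, (height + 15) / 16) so x walks columns.
__global__ void rgbToyuvSketch(const unsigned char* rgb, unsigned char* yuv,
                               int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x; // column
    int y = blockIdx.y * blockDim.y + threadIdx.y; // row
    if (x >= width || y >= height) return;         // guard partial blocks
    int idx = (y * width + x) * 3;                 // 3 bytes per pixel
    rgbToYuvPixel(rgb + idx, yuv + idx);
}
#endif

// Host reference loop using the same per-pixel helper.
void rgbToYuvHost(const unsigned char* rgb, unsigned char* yuv, int width, int height)
{
    for (int i = 0; i < width * height; ++i)
        rgbToYuvPixel(rgb + 3 * i, yuv + 3 * i);
}
```

Keeping the arithmetic in one helper means the host loop can serve as a reference to compare the GPU output against.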

But how do I access the individual R, G, B values of a pixel while converting to YUV?

Help please :)!!

What format is the image you are reading?
Depending on that you would use the corresponding library.

Btw, you can use OpenCV to do this, unless of course you are trying to write your own implementation.

Show us the code you have so far. You're in luck: I happened to investigate this myself a while ago. I used 32-bit integers, with shifts and masks (bitwise ANDs) to extract the r, g, b, a data and pack it up again. This is standard RGB manipulation stuff, though… just try it out in a CUDA kernel and see how it goes for you.
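A minimal sketch of the shift-and-mask approach described above, assuming each pixel is stored as one 32-bit word with R in the lowest byte, then G, B, and A in the highest byte (on a little-endian machine such as x86 this matches R, G, B, A byte order in memory). `packRgba`, `unpackRgba` and `swapRedBlue` are hypothetical names; under nvcc the helpers are marked `__host__ __device__`, so they can be called from inside a kernel.

```cpp
#include <cassert>
#include <cstdint>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Assumed layout: bits 0-7 = R, 8-15 = G, 16-23 = B, 24-31 = A.
HD inline uint32_t packRgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint32_t)r | ((uint32_t)g << 8) | ((uint32_t)b << 16) | ((uint32_t)a << 24);
}

// Shift the wanted byte down to bit 0, then mask off everything above it.
HD inline void unpackRgba(uint32_t p, uint8_t* r, uint8_t* g, uint8_t* b, uint8_t* a)
{
    *r = (uint8_t)(p & 0xFFu);
    *g = (uint8_t)((p >> 8) & 0xFFu);
    *b = (uint8_t)((p >> 16) & 0xFFu);
    *a = (uint8_t)((p >> 24) & 0xFFu);
}

#ifdef __CUDACC__
// Example use inside a kernel: swap the R and B channels of every pixel.
__global__ void swapRedBlue(uint32_t* pixels, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= count) return;
    uint8_t r, g, b, a;
    unpackRgba(pixels[i], &r, &g, &b, &a);
    pixels[i] = packRgba(b, g, r, a);
}
#endif
```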

I suppose the NVIDIA NPP library has routines for converting RGB → YUV, which in this context is essentially RGB → YCbCr. I would use those routines instead of writing your own.

Thank you for replying :)…

The image I am processing is a 24-bit RGB BMP. I know OpenCV provides a built-in function for this, but using the standard file functions I am able to load the whole image on the CPU and pass it to the GPU :).
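When indexing data loaded straight from a 24-bit BMP file, a few format details matter: pixels are stored in B, G, R byte order, each row is padded to a multiple of 4 bytes, and rows are stored bottom-up when `biHeight` is positive. The helpers below are a hypothetical sketch, assuming `data` points at the start of the pixel array (the bytes at offset `bfOffBits` in the file) and that `height` is the absolute value of `biHeight`. For a 640-pixel-wide image the stride is exactly 640 × 3 = 1920 bytes, so no padding occurs there.

```cpp
#include <cassert>
#include <cstddef>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Bytes per stored row: 3 bytes per pixel, rounded up to a multiple of 4.
HD inline int bmpRowStride(int width)
{
    return ((width * 3 + 3) / 4) * 4;
}

// Byte offset of pixel (x, y), where y = 0 is the TOP row of the picture.
// bottomUp should be true when biHeight > 0 (the usual case).
HD inline size_t bmpPixelOffset(int x, int y, int width, int height, bool bottomUp)
{
    int row = bottomUp ? (height - 1 - y) : y;
    return (size_t)row * bmpRowStride(width) + (size_t)x * 3;
}

// Reads one pixel; BMP stores the channels as B, G, R.
HD inline void bmpGetRgb(const unsigned char* data, int x, int y, int width, int height,
                         bool bottomUp, unsigned char* r, unsigned char* g, unsigned char* b)
{
    const unsigned char* p = data + bmpPixelOffset(x, y, width, height, bottomUp);
    *b = p[0];
    *g = p[1];
    *r = p[2];
}
```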

My problem is now solved. I am able to access each component (R, G, B) for the RGB->YUV transformation, and my kernel currently takes only around 0.30 ms to transform one frame :). As you suggested, I am also thinking of padding each pixel from 3 bytes to 4 by appending an extra byte, which might give even better performance :). I think it's possible to write a kernel function for that as well.
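A possible sketch of the 3-byte to 4-byte padding step discussed here, with one thread per pixel. It assumes a tightly packed R, G, B input (no BMP row padding) and writes each pixel as one 32-bit word with R in the lowest byte and the extra byte set to 0; `padRgbToWord`, `padRgbKernel` and `padRgbHost` are hypothetical names. The idea is that a later kernel can then fetch a whole pixel with one aligned 4-byte load instead of three 1-byte loads, at the cost of a third more memory; whether that actually beats the current timing depends on the kernel and the GPU, so it is worth timing both versions.

```cpp
#include <cassert>
#include <cstdint>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Packs pixel i of a tightly packed R,G,B buffer into one 32-bit word:
// bits 0-7 = R, 8-15 = G, 16-23 = B, 24-31 = padding (0).
HD inline uint32_t padRgbToWord(const uint8_t* rgb, int i)
{
    const uint8_t* p = rgb + 3 * i;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}

#ifdef __CUDACC__
// 1D launch, e.g. padRgbKernel<<<(count + 255) / 256, 256>>>(d_rgb, d_rgbx, count);
__global__ void padRgbKernel(const uint8_t* rgb, uint32_t* rgbx, int count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) rgbx[i] = padRgbToWord(rgb, i);
}
#endif

// Host reference version of the same conversion.
void padRgbHost(const uint8_t* rgb, uint32_t* rgbx, int count)
{
    for (int i = 0; i < count; ++i) rgbx[i] = padRgbToWord(rgb, i);
}
```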

Actually, this is my college project, or rather just its first step, so instead of using something built in I am writing my own code :).