kernel indexing question

mercury_smoke · December 3, 2008, 5:12pm

Hello everybody, I’m currently working on a program that converts colors from RGB space to CIE Lab space; the kernels performing the conversion are:

[codebox]// Linearize sRGB values (assumed to be in [0, 1]) and convert them to XYZ, in place.
__global__ void RGBtoXYZ_GPU(float* d_r, float* d_g, float* d_b)
{
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;

    float red   = d_r[tid];
    float green = d_g[tid];
    float blue  = d_b[tid];

    // Remove the sRGB companding (gamma)
    red   = (red   > 0.04045f) ? powf((red   + 0.055f) / 1.055f, 2.4f) : red   / 12.92f;
    green = (green > 0.04045f) ? powf((green + 0.055f) / 1.055f, 2.4f) : green / 12.92f;
    blue  = (blue  > 0.04045f) ? powf((blue  + 0.055f) / 1.055f, 2.4f) : blue  / 12.92f;

    // Linear RGB -> XYZ (D50)
    d_r[tid] = red * 0.436052025f + green * 0.385081593f + blue * 0.143087414f; // X
    d_g[tid] = red * 0.222491598f + green * 0.71688606f  + blue * 0.060621486f; // Y
    d_b[tid] = red * 0.013929122f + green * 0.097097002f + blue * 0.71418547f;  // Z
}

// Convert XYZ to CIE L*a*b*, in place.
__global__ void XYZtoLab_GPU(float* d_X, float* d_Y, float* d_Z)
{
    const unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Reference white (D50)
    const float Xr = 0.964221f;
    const float Yr = 1.0f;
    const float Zr = 0.825211f;

    float xr = d_X[tid] / Xr;
    float yr = d_Y[tid] / Yr;
    float zr = d_Z[tid] / Zr;

    // Cube root above the CIE threshold, linear segment below it
    float fx = (xr > 0.008856451f) ? powf(xr, 1.f/3.f) : (7.787f * xr) + 16.f / 116.f;
    float fy = (yr > 0.008856451f) ? powf(yr, 1.f/3.f) : (7.787f * yr) + 16.f / 116.f;
    float fz = (zr > 0.008856451f) ? powf(zr, 1.f/3.f) : (7.787f * zr) + 16.f / 116.f;

    d_X[tid] = 116.0f * fy - 16.0f;   // L*
    d_Y[tid] = 500.0f * (fx - fy);    // a*
    d_Z[tid] = 200.0f * (fy - fz);    // b*
}[/codebox]
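A CPU reference that applies exactly the same math makes it easy to check GPU (or device-emulation) results pixel by pixel. Below is a minimal sketch, assuming channel values already scaled to [0, 1]. `linearize`, `labF`, and `srgbToLab` are hypothetical helpers, not part of the original code; they are marked `__host__ __device__` so the same functions could also be called from a kernel, one pixel per thread. The sketch uses `cbrtf` in place of `powf(t, 1.f/3.f)` for the cube root.

```cuda
#include <math.h>

// Hypothetical helper: remove the sRGB companding from one channel in [0, 1]
// (same formula as RGBtoXYZ_GPU).
__host__ __device__ inline float linearize(float c)
{
    return (c > 0.04045f) ? powf((c + 0.055f) / 1.055f, 2.4f) : c / 12.92f;
}

// Hypothetical helper: CIE Lab transfer function (same threshold and linear
// segment as XYZtoLab_GPU, with cbrtf for the cube root).
__host__ __device__ inline float labF(float t)
{
    return (t > 0.008856451f) ? cbrtf(t) : 7.787f * t + 16.0f / 116.0f;
}

// Hypothetical helper: one sRGB pixel -> L*a*b* with a D50 reference white.
__host__ __device__ inline void srgbToLab(float r, float g, float b,
                                          float* outL, float* outA, float* outB)
{
    r = linearize(r);
    g = linearize(g);
    b = linearize(b);

    // Linear RGB -> XYZ, each component divided by the D50 white (Xr, Yr, Zr).
    const float xr = (r * 0.436052025f + g * 0.385081593f + b * 0.143087414f) / 0.964221f;
    const float yr = (r * 0.222491598f + g * 0.71688606f  + b * 0.060621486f) / 1.0f;
    const float zr = (r * 0.013929122f + g * 0.097097002f + b * 0.71418547f)  / 0.825211f;

    const float fx = labF(xr);
    const float fy = labF(yr);
    const float fz = labF(zr);

    *outL = 116.0f * fy - 16.0f;
    *outA = 500.0f * (fx - fy);
    *outB = 200.0f * (fy - fz);
}
```

As a quick sanity check, black (0, 0, 0) should give L* = 0 and a* = b* = 0, and white (1, 1, 1) should give approximately L* = 100 and a* = b* = 0, because each row of the RGB-to-XYZ matrix sums to the corresponding component of the D50 reference white.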

After that, some normalization is applied to the new color values, but that’s another story. The first version of the host functions launching these kernels was a test that worked on a 512x512 RGB image: both kernels were launched on a “linear” grid of 512 blocks with 512x1x1 threads per block (<<< 512, 512 >>>), so each thread handled exactly one pixel. I’m now trying to extend the approach to bigger images, and to keep mapping one pixel to one thread I need a different grid and block setup. The question is: how should I rearrange the block and grid dimensions, and how does the thread index computation in the kernel change? I’m going to work with PPM images much bigger than 512x512, and I’m stuck using device emulation at the moment.
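For an arbitrary image size, the usual one-thread-per-pixel mapping keeps the 512-thread blocks and rounds the number of blocks up. A minimal sketch of that arithmetic, assuming the compute capability 1.x limit of 65535 blocks per grid dimension; `blocksFor`, `fitsIn1DGrid`, and `kMaxGridDimX` are hypothetical names:

```cuda
#include <stddef.h>

// Per-dimension grid limit on compute capability 1.x GPUs.
const unsigned int kMaxGridDimX = 65535;

// Hypothetical helper: number of 1D blocks so that every pixel gets one thread.
// Ceiling division: the last block is only partly used when the pixel count
// is not a multiple of threadsPerBlock.
unsigned int blocksFor(unsigned int width, unsigned int height,
                       unsigned int threadsPerBlock)
{
    const size_t numPixels = (size_t)width * height;
    return (unsigned int)((numPixels + threadsPerBlock - 1) / threadsPerBlock);
}

// Hypothetical helper: true if a 1D grid of such blocks can cover the image.
bool fitsIn1DGrid(unsigned int width, unsigned int height,
                  unsigned int threadsPerBlock)
{
    return blocksFor(width, height, threadsPerBlock) <= kMaxGridDimX;
}
```

The launch would then look like `RGBtoXYZ_GPU<<<blocksFor(width, height, 512), 512>>>(d_r, d_g, d_b);`. For example, a 2000x2000 image needs 7813 blocks (4,000,000 / 512 = 7812.5, rounded up). Because of the rounding, the last block can contain threads with no pixel behind them, so the kernel also needs the pixel count in order to skip those threads.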

Please be patient since this is my first time programming in CUDA. Thank you very much!

A.

MisterAnderson42 · December 3, 2008, 5:30pm

You can run up to 65535 blocks with just a 1D grid, so with 512 threads per block you should be able to handle images up to about 5790x5790 (sqrt(65535 * 512) ≈ 5792) without changing your indexing scheme. To go even larger, use a 2D grid (a block is still limited to 512 threads however you shape it), which only slightly increases the complexity of your indexing. You’ll probably also want to add an “if (tid >= num_pixels) return;” so that the threads in the last, partially filled block don’t write past the end of your allocated image.
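A sketch of the 2D-grid variant, assuming compute capability 1.x limits and the same flat per-channel arrays as the original kernels. `GridSize`, `gridFor`, and `linearizeRGB_GPU` are hypothetical names; the kernel only performs the sRGB linearization step so that the indexing stays in focus. The 2D block index is flattened as `blockIdx.y * gridDim.x + blockIdx.x` before it is combined with `threadIdx.x`, and the bounds check discards the surplus threads:

```cuda
#include <stddef.h>

// Per-dimension grid limit on compute capability 1.x GPUs.
const unsigned int kMaxGridDim = 65535;

// Hypothetical helper type: grid dimensions, in blocks.
struct GridSize { unsigned int x, y; };

// Hypothetical helper: lay the required blocks out on a 2D grid, filling x up
// to the limit and then adding rows in y. The last row may be partly unused.
GridSize gridFor(size_t numPixels, unsigned int threadsPerBlock)
{
    size_t blocks = (numPixels + threadsPerBlock - 1) / threadsPerBlock;
    if (blocks == 0) blocks = 1;  // never request an empty grid
    GridSize g;
    g.x = (unsigned int)(blocks < kMaxGridDim ? blocks : kMaxGridDim);
    g.y = (unsigned int)((blocks + g.x - 1) / g.x);
    return g;
}

// Hypothetical kernel: 1D blocks on a 2D grid, one thread per pixel.
// Assumes the pixel count fits in an unsigned int.
__global__ void linearizeRGB_GPU(float* d_c, unsigned int numPixels)
{
    // Flatten the 2D block index first, then add the thread's offset.
    const unsigned int blockId = blockIdx.y * gridDim.x + blockIdx.x;
    const unsigned int tid = blockId * blockDim.x + threadIdx.x;
    if (tid >= numPixels) return;  // threads past the end of the image do nothing

    const float c = d_c[tid];
    d_c[tid] = (c > 0.04045f) ? powf((c + 0.055f) / 1.055f, 2.4f) : c / 12.92f;
}
```

On the host the launch would be something like `GridSize g = gridFor(numPixels, 512); dim3 grid(g.x, g.y); linearizeRGB_GPU<<<grid, 512>>>(d_r, numPixels);`. Filling x first means that images needing at most 65535 blocks still get a single row of blocks, i.e. the same layout as the 1D grid.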

mercury_smoke · December 3, 2008, 6:28pm

Thank you very much, I followed your suggestion and everything worked just fine. The images I process are around 2000x2000 on average, so the linear grid solution suits my needs.

And sorry for the double post; I just didn’t know which section was the right one for my question. I won’t make this mistake again.

Thanks again

A.
