Hi,

I am new to OpenCL and have been tasked with some image processing work.

I am passing two sets of YUV data (left and right images, each 320 x 168) to a kernel function that computes the horizontal gradient of each pixel using SAD (sum of absolute differences). When I use only one set of YUV data to compute the first output, it works, and the values match those computed on the CPU.
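A minimal sketch of what such a CPU-side SAD gradient could look like, for comparison with the kernel. It assumes planar, full-resolution Y, U and V arrays of width x height bytes each, and it assumes the sum should saturate at 255. The names grad_hor_ref and clamp_u8 are hypothetical and not taken from the actual code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: clamp an int into the 0..255 range of unsigned char. */
static unsigned char clamp_u8(int v)
{
    return (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Horizontal SAD gradient over planar Y, U, V arrays (width*height bytes each).
   The last column of each row is compared with its left neighbour; every
   other column is compared with its right neighbour. */
static void grad_hor_ref(unsigned char *out,
                         const unsigned char *y,
                         const unsigned char *u,
                         const unsigned char *v,
                         int width, int height)
{
    assert(width > 1 && height > 0);
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            int i = row * width + col;
            int j = (col == width - 1) ? i - 1 : i + 1;
            /* Accumulate in int so the sum of three differences cannot wrap. */
            int sad = abs(y[i] - y[j]) + abs(u[i] - u[j]) + abs(v[i] - v[j]);
            out[i] = clamp_u8(sad);
        }
    }
}
```

Calling grad_hor_ref once per image (left, then right) with width = 320 and height = 168 would give one reference array for each of the two kernel outputs.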

However, when I add the second set of YUV data to the kernel function and compute the second output array, the second output array does not come out right. I even tried hard-coding every element to 5, but the output array still shows other values.

Here is the kernel function implementation (output arrays are the first 2 parameters):

const char *grad_l_h_cl =
"__kernel void grad_l_h(\n"
"    __global unsigned char* img_grad_left_hor,\n"
"    __global unsigned char* img_grad_right_hor,\n"
"    __global unsigned char* p1_y,\n"
"    __global unsigned char* p1_u,\n"
"    __global unsigned char* p1_v,\n"
"    __global unsigned char* p2_y,\n"
"    __global unsigned char* p2_u,\n"
"    __global unsigned char* p2_v,\n"
"    int width,\n"
"    int height)\n"
"{\n"
"    const uint index = get_global_id(0);\n"
"    unsigned char diff_y = p2_y[index]-p2_y[index+1], diff_u = 0, diff_v = 0;\n"
"    if (index % width == width - 1) {\n"
"        img_grad_left_hor[index] = abs_diff(p1_y[index-1],p1_y[index]) + abs_diff(p1_u[index-1],p1_u[index]) + abs_diff(p1_v[index-1],p1_v[index]);\n"
"        img_grad_right_hor[index] = abs_diff(p2_y[index-1],p2_y[index]) + abs_diff(p2_u[index-1],p2_u[index]) + abs_diff(p2_v[index-1],p2_v[index]);\n"
"    } else {\n"
"        img_grad_left_hor[index] = abs_diff(p1_y[index],p1_y[index+1]) + abs_diff(p1_u[index],p1_u[index+1]) + abs_diff(p1_v[index],p1_v[index+1]);\n"
"        img_grad_right_hor[index] = abs_diff(p2_y[index],p2_y[index+1]) + abs_diff(p2_u[index],p2_u[index+1]) + abs_diff(p2_v[index],p2_v[index+1]);\n"
"    }\n"
"    if (img_grad_left_hor[index] == 0) {\n"
"        img_grad_left_hor[index] = 0;\n"
"    }\n"
"    if (img_grad_left_hor[index] > 255) {\n"
"        img_grad_left_hor[index] = 255;\n"
"    }\n"
"}\n";

Here is how I perform the operation (g_worksize = 8 x 320 x 168, l_worksize = 256):

error=clEnqueueNDRangeKernel(cq, k_cfg, 1, NULL, &g_worksize, &l_worksize, 0, NULL, NULL);
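To make the sizes concrete: a 320 x 168 image has 53,760 pixels, while the global work size quoted above is 8 x 320 x 168 = 430,080 work-items. The sketch below shows one common way to choose a 1-D global size that covers one work-item per element and is a multiple of the local size. It assumes one work-item per pixel is what is wanted, and the helper name round_up_global is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round n up to the next multiple of local.
   OpenCL 1.x requires the global work size to be a multiple of the
   local work size when a local size is passed explicitly. */
static size_t round_up_global(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}
```

For example, round_up_global(320 * 168, 256) gives 53,760, that is 210 work-groups of 256 with no padding. When the rounded size is larger than the element count, the kernel would also need a guard such as if (index >= width * height) return; so that the extra work-items do not touch memory past the end of the buffers.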

I created the input buffers like this (worksize = 320 x 168):

memp1_u=clCreateBuffer(context, CL_MEM_READ_ONLY, worksize, NULL, &error);

I suspect some memory setting is required for the second output array to hold its data, but I do not know what it would be.

Please kindly help or advise.

My system is:

GT220 - 6 multiprocessors, 48 CUDA cores, Compute Capability 1.2

GPU Computing SDK 3.2

WinXP Pro