# How to call a kernel that needs no parallelization

My kernel program calculates the bounding rect of a gray image (the rect that surrounds all pixels whose gray level is 255) and then extends the rect for the next calculation. Because the bounding rect data is stored in device memory, I wrote another kernel to do the extension. That kernel is very simple and needs no parallelization, so I launch it with a grid size and block size of 1. I'm not sure whether this is a good choice. Can anyone suggest a better way to execute a kernel that does not need parallelization?
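For reference, a minimal sketch of the kind of single-thread launch described above. The names `pad_rect` and `run_pad_rect`, the fixed padding, and the image size are hypothetical and only illustrate the `<<<1, 1>>>` launch; they are not the actual kernels from this post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical serial kernel: pads a rect (left, top, right, bottom) kept in
// device memory and clamps it to the image. Only one thread ever runs it.
__global__ void pad_rect(const int *ltrb, int w, int h, int pad, int *roi)
{
    int l = max(0, ltrb[0] - pad);
    int t = max(0, ltrb[1] - pad);
    int r = min(w - 1, ltrb[2] + pad);
    int b = min(h - 1, ltrb[3] + pad);
    roi[0] = l;
    roi[1] = t;
    roi[2] = r - l + 1;  // width
    roi[3] = b - t + 1;  // height
}

// Hypothetical host helper: copies the rect to the device, launches the
// kernel with one block of one thread, and copies the result back.
cudaError_t run_pad_rect(const int h_ltrb[4], int w, int h, int pad, int h_roi[4])
{
    int *d_ltrb = nullptr;
    int *d_roi = nullptr;
    cudaError_t err = cudaMalloc(&d_ltrb, 4 * sizeof(int));
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&d_roi, 4 * sizeof(int));
    if (err != cudaSuccess) { cudaFree(d_ltrb); return err; }

    err = cudaMemcpy(d_ltrb, h_ltrb, 4 * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        pad_rect<<<1, 1>>>(d_ltrb, w, h, pad, d_roi);  // grid = 1, block = 1
        err = cudaGetLastError();                      // launch errors
    }
    if (err == cudaSuccess) {
        // The blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(h_roi, d_roi, 4 * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_ltrb);
    cudaFree(d_roi);
    return err;
}

int main()
{
    const int ltrb[4] = {10, 20, 30, 40};
    int roi[4] = {0, 0, 0, 0};
    cudaError_t err = run_pad_rect(ltrb, 64, 64, 4, roi);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("roi = x:%d y:%d w:%d h:%d\n", roi[0], roi[1], roi[2], roi[3]);
    return 0;
}
```

In this sketch the result is copied back to the host only for printing; in a pipeline like mine the output would stay in device memory for the next kernel.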

Below is my code for the bounding rect calculation and the rect extension:

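extern "C" __global__ void get_mask_rect(const UCHAR * alpha, int w, int h, int astep, int * LTRB)
{
    int tx, ty;
    // isInImgRect is defined elsewhere; it presumably sets tx, ty to this thread's pixel coordinates
    if (isInImgRect(w, h, tx, ty))
    {
        if (alpha[astep * ty + tx] == 255)
        {
            atomicMin(LTRB, tx);      // left
            atomicMax(LTRB + 2, tx);  // right
            atomicMin(LTRB + 1, ty);  // top
            atomicMax(LTRB + 3, ty);  // bottom
        }
    }
}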

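extern "C" __global__ void extent_roi(const int * LTRB, int w, int h, const int WSZ, int * roi)
{
    // LTRB layout follows get_mask_rect: [0] left, [1] top, [2] right, [3] bottom
    int l = LTRB[0] - WSZ;
    int r = LTRB[2] + WSZ;
    int t = LTRB[1] - WSZ;
    int b = LTRB[3] + WSZ;
    int BW = ((r - l) + (b - t)) / 16;
    l = max(0, l - BW);
    r = min(w - 1, r + BW);
    t = max(0, t - BW);
    b = min(h - 1, b + BW);
    roi[0] = l;
    roi[1] = t;
    roi[2] = r - l + 1;  // width
    roi[3] = b - t + 1;  // height
}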

// the code to call kernel