Launching kernels with “replaceable” constants

I am trying to program a convolution filter, which I will use at several places in my project, but with different operator masks that will also have different sizes.

My first approach looks like the following code. nvcc compiles it without errors, but the kernel execution fails:

__constant__ float OperatorMask[25];

// texRef_Image_1, FromUchar4() and FromFloat4() are defined elsewhere (not shown).
__global__ void ImageConvolution(uchar4* pResult, int rX, int rY, float OperatorMask[])
{
  const int ix = blockDim.x * blockIdx.x + threadIdx.x;
  const int iy = blockDim.y * blockIdx.y + threadIdx.y;

  // sample at texel centers
  const float x = (float)ix + 0.5f;
  const float y = (float)iy + 0.5f;

  float4 tempResVal = {0, 0, 0, 0};
  int index = 0;

  // walk the (2*rX+1) x (2*rY+1) neighbourhood row by row
  for(float j = -rY; j <= rY; j++)
  {
    for(float i = -rX; i <= rX; i++)
    {
      float4 aktTexel = FromUchar4(tex2D(texRef_Image_1, x+i, y+j));
      tempResVal.x += ( aktTexel.x * OperatorMask[index] );
      tempResVal.y += ( aktTexel.y * OperatorMask[index] );
      tempResVal.z += ( aktTexel.z * OperatorMask[index] );
      index++;  // advance to the next coefficient; without it every tap uses OperatorMask[0]
    }
  }

  // 704 = image width in pixels
  *(pResult + iy * 704 + ix) = FromFloat4(tempResVal);
}

float op[5][5] =
  { { 0, 0, 1, 0, 0},
    { 0, 1, 2, 1, 0},
    { 1, 2, 6, 2, 1},
    { 0, 1, 2, 1, 0},
    { 0, 0, 1, 0, 0} };

cudaMemcpyToSymbol(OperatorMask, op, sizeof(op));

// grid and block: launch configuration not shown in the post (placeholder names)
ImageConvolution<<<grid, block>>>(tempIamge_2_uchar4, 2, 2, OperatorMask);


Does anyone have an idea how to create a kernel into which I can “insert” different operator masks without changing the kernel code (e.g. operator masks of size 3x3 or 6x6)?

Thanks for your help!

When you write

__constant__ float OperatorMask[25];

you reserve an array of 25 floats in device constant memory. Then you copy values into that memory with cudaMemcpyToSymbol(). So far so good.
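For reference, a minimal sketch of that reserve-and-copy step. uploadMask is a hypothetical helper, and the read-back through cudaMemcpyFromSymbol() is only there to illustrate that the values really landed in constant memory:

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Reserves 25 floats of device constant memory (room for one 5x5 mask).
__constant__ float OperatorMask[25];

// Hypothetical helper: copies `count` coefficients into OperatorMask and
// reads them back to confirm the upload. Returns false on any CUDA error
// or if the mask would not fit into the constant array.
bool uploadMask(const float* hostMask, size_t count)
{
    if (count > 25)
        return false;

    if (cudaMemcpyToSymbol(OperatorMask, hostMask, count * sizeof(float)) != cudaSuccess)
        return false;

    float check[25];
    if (cudaMemcpyFromSymbol(check, OperatorMask, count * sizeof(float)) != cudaSuccess)
        return false;

    for (size_t i = 0; i < count; ++i)
        if (check[i] != hostMask[i])
            return false;
    return true;
}
```

Because a 2D array such as op[5][5] is stored contiguously, &op[0][0] can be passed with count = 25.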

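What makes no sense is passing OperatorMask to the kernel. A __constant__ variable is declared at file scope and lives on the GPU, so any kernel in the same source file can read it directly by name; you should use OperatorMask inside the kernel without passing it in. In host code the name refers to a host-side shadow of the symbol, not to its device address, so the kernel receives an invalid pointer, and the parameter also shadows the __constant__ array inside the kernel. That is the likely cause of your kernel execution failure.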
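A minimal sketch of a kernel that reads OperatorMask by name and is launched without a mask argument. To stay self-contained it assumes a single-channel float image in a plain global buffer instead of the texture reference and uchar4 helpers from the question, clamps reads at the image border, and uses the hypothetical names convolve5x5 and runConvolve5x5:

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// 5x5 mask in device constant memory, filled from the host with
// cudaMemcpyToSymbol(). Kernels in this file read it directly by name.
__constant__ float OperatorMask[25];

// 5x5 convolution (radius 2) of a single-channel float image.
// No mask parameter: a parameter named OperatorMask would shadow the
// __constant__ array. Reads outside the image are clamped to the border.
__global__ void convolve5x5(const float* in, float* out, int width, int height)
{
    const int ix = blockIdx.x * blockDim.x + threadIdx.x;
    const int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= width || iy >= height)
        return;

    float sum = 0.0f;
    int index = 0;
    for (int j = -2; j <= 2; ++j) {
        for (int i = -2; i <= 2; ++i) {
            const int sx = min(max(ix + i, 0), width - 1);
            const int sy = min(max(iy + j, 0), height - 1);
            sum += in[sy * width + sx] * OperatorMask[index++];
        }
    }
    out[iy * width + ix] = sum;
}

// Hypothetical host wrapper: uploads the mask (25 floats), runs the kernel
// and copies the result back. Returns the first CUDA error encountered.
cudaError_t runConvolve5x5(const float* hostIn, float* hostOut,
                           int width, int height, const float* hostMask)
{
    const size_t bytes = static_cast<size_t>(width) * height * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;

    cudaError_t err = cudaMemcpyToSymbol(OperatorMask, hostMask, 25 * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&dIn, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dOut, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        convolve5x5<<<grid, block>>>(dIn, dOut, width, height);  // no mask argument
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);   // cudaFree(nullptr) is a no-op
    cudaFree(dOut);
    return err;
}
```

Compared with the question, the decisive difference is that neither the kernel signature nor the launch mentions OperatorMask, so nothing shadows the __constant__ array.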

As for the dynamic size of the mask: would it be possible to pass a value to the kernel (via a parameter or constant memory) that defines the size of the mask? Depending on that value, the kernel would expect a mask of a different size in constant memory.
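One way to sketch that suggestion, assuming masks up to 9x9 and a single-channel float image in global memory (texture and uchar4 handling omitted): a single __constant__ buffer sized for the largest mask, with the radii rX and rY passed as ordinary kernel parameters. MAX_MASK_RADIUS, convolveGeneric and runConvolveGeneric are hypothetical names:

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Room for the largest mask we intend to support (9x9, radius 4).
// A smaller mask uses only its first (2*rX+1)*(2*rY+1) entries, row by row.
#define MAX_MASK_RADIUS 4
#define MAX_MASK_SIZE ((2 * MAX_MASK_RADIUS + 1) * (2 * MAX_MASK_RADIUS + 1))

__constant__ float OperatorMask[MAX_MASK_SIZE];

// Convolution whose mask size is selected at run time by the radii rX, rY.
// Like the question's kernel, the mask is applied without flipping it,
// which makes no difference for symmetric masks.
__global__ void convolveGeneric(const float* in, float* out, int width, int height,
                                int rX, int rY)
{
    const int ix = blockIdx.x * blockDim.x + threadIdx.x;
    const int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= width || iy >= height)
        return;

    float sum = 0.0f;
    int index = 0;
    for (int j = -rY; j <= rY; ++j) {
        for (int i = -rX; i <= rX; ++i) {
            const int sx = min(max(ix + i, 0), width - 1);   // clamp at borders
            const int sy = min(max(iy + j, 0), height - 1);
            sum += in[sy * width + sx] * OperatorMask[index++];
        }
    }
    out[iy * width + ix] = sum;
}

// Hypothetical host wrapper: checks the radii, uploads only the coefficients
// that are needed, launches the kernel and copies the result back.
cudaError_t runConvolveGeneric(const float* hostIn, float* hostOut, int width, int height,
                               const float* hostMask, int rX, int rY)
{
    if (rX < 0 || rY < 0 || rX > MAX_MASK_RADIUS || rY > MAX_MASK_RADIUS)
        return cudaErrorInvalidValue;

    const size_t maskCount = static_cast<size_t>(2 * rX + 1) * (2 * rY + 1);
    const size_t bytes = static_cast<size_t>(width) * height * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;

    cudaError_t err = cudaMemcpyToSymbol(OperatorMask, hostMask, maskCount * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&dIn, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dOut, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        convolveGeneric<<<grid, block>>>(dIn, dOut, width, height, rX, rY);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return err;
}
```

For the 5x5 op array from the question this would be called with rX = rY = 2, for a 3x3 mask with rX = rY = 1. Even-sized masks such as 6x6 have no center tap, so with a radius-based layout they would need to be padded, for example to 7x7 with a row and column of zeros.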

I know that the kernel execution error comes from using the constant “OperatorMask” as a parameter of the kernel. If I remove the parameter “float OperatorMask[]” from my example, it works well (about 6.7 ms for a 704x576-pixel image on an 8800 GTS 640 MB).

Maybe the easiest way is to use Ctrl+C … Ctrl+V and make one kernel for each mask size. It is not beautiful, but it will work:

__constant__ float OperatorMask_1[ 9]; // 3x3

__constant__ float OperatorMask_2[25]; // 5x5

__constant__ float OperatorMask_3[49]; // 7x7

__constant__ float OperatorMask_4[81]; // 9x9
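Copying the kernel works, but a C++ template can generate the per-size variants from one kernel body while keeping compile-time loop bounds. A sketch, assuming a single-channel float image in global memory instead of the texture, and one constant buffer sized for the largest (9x9) mask instead of one buffer per size; convolveFixed and runConvolveFixed are hypothetical names:

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// One constant buffer sized for the largest mask (9x9); every instantiation
// reads its (2*RX+1) x (2*RY+1) coefficients from the start of it.
#define MAX_MASK_RADIUS 4
__constant__ float OperatorMask[(2 * MAX_MASK_RADIUS + 1) * (2 * MAX_MASK_RADIUS + 1)];

// The mask radii are template parameters, so each size gets its own compiled
// kernel with compile-time loop bounds, but there is one source body only.
template <int RX, int RY>
__global__ void convolveFixed(const float* in, float* out, int width, int height)
{
    static_assert(RX >= 0 && RY >= 0 && RX <= MAX_MASK_RADIUS && RY <= MAX_MASK_RADIUS,
                  "mask does not fit into OperatorMask");
    const int ix = blockIdx.x * blockDim.x + threadIdx.x;
    const int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= width || iy >= height)
        return;

    float sum = 0.0f;
#pragma unroll
    for (int j = -RY; j <= RY; ++j) {
#pragma unroll
        for (int i = -RX; i <= RX; ++i) {
            const int sx = min(max(ix + i, 0), width - 1);   // clamp at borders
            const int sy = min(max(iy + j, 0), height - 1);
            sum += in[sy * width + sx] * OperatorMask[(j + RY) * (2 * RX + 1) + (i + RX)];
        }
    }
    out[iy * width + ix] = sum;
}

// Hypothetical host wrapper for one instantiation: uploads the mask,
// launches convolveFixed<RX, RY> and copies the result back.
template <int RX, int RY>
cudaError_t runConvolveFixed(const float* hostIn, float* hostOut, int width, int height,
                             const float* hostMask)
{
    const size_t maskCount = (2 * RX + 1) * (2 * RY + 1);
    const size_t bytes = static_cast<size_t>(width) * height * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;

    cudaError_t err = cudaMemcpyToSymbol(OperatorMask, hostMask, maskCount * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&dIn, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dOut, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dIn, hostIn, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
        convolveFixed<RX, RY><<<grid, block>>>(dIn, dOut, width, height);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hostOut, dOut, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return err;
}
```

Each size is then a separate instantiation, e.g. runConvolveFixed<1, 1>(...) for 3x3 and runConvolveFixed<2, 2>(...) for 5x5; a switch on the radius at run time can select the right one.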