Non-uniform Convolution

G’day

I am trying to write a program that uses my 8800 GTS to convolve a 2D image, as in the separable convolution example, but instead of using one fixed convolution kernel for every data element, each data element must have its own individual convolution kernel. I’m not sure how to make this change, but I imagine it will involve storing the kernels as one large array on the GPU and looking up the specific kernel for each data element. The kernels are likely to be fixed throughout the simulation, so I was hoping to make only a single write to the GPU (I believe texture memory is the correct place for this?). If someone could give me a little help with how to create, populate and reference such memory from the kernel function on the GPU, it would be much appreciated.

Thanks so much for your time. I’m racking my brains over how to do this, and it’s the last step I have to complete.

Jamie
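One possible way to lay this out, as a minimal sketch rather than a tested implementation for the 8800 GTS: keep every kernel in one flat float array, so that the taps for pixel (x, y) start at index (y * width + x) * KERNEL_LENGTH, and have each thread read its own slice. The names rowConvolvePixel and rowConvolveNonUniform, the radius of 8, the flat layout, the clamp-to-edge border handling and the use of plain global memory instead of texture memory are all assumptions made for illustration. Only the row pass is shown, and no shared-memory tiling is attempted. The tap ordering mirrors the data[KERNEL_RADIUS - i] * d_Kernel[i] indexing of the separable example.

```cpp
#include <cstddef>

// Let the same source compile as plain C++ for host-side checking.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

#define KERNEL_RADIUS 8                        // assumed radius
#define KERNEL_LENGTH (2 * KERNEL_RADIUS + 1)  // taps per kernel

// Row convolution of one output pixel using that pixel's own kernel.
// Hypothetical layout: `kernels` holds width * height kernels back to back,
// so the taps for pixel (x, y) start at (y * width + x) * KERNEL_LENGTH.
__host__ __device__ inline float rowConvolvePixel(const float *image,
                                                  const float *kernels,
                                                  int width, int x, int y)
{
    const float *k = kernels + ((std::size_t)y * width + x) * KERNEL_LENGTH;
    float sum = 0.0f;
    for (int j = -KERNEL_RADIUS; j <= KERNEL_RADIUS; ++j) {
        int sx = x + j;                    // source column for this tap
        if (sx < 0) sx = 0;                // clamp to the image edge (assumption)
        if (sx >= width) sx = width - 1;
        sum += image[(std::size_t)y * width + sx] * k[KERNEL_RADIUS - j];
    }
    return sum;
}

#ifdef __CUDACC__
// One thread per output pixel; each thread selects its kernel from (x, y).
__global__ void rowConvolveNonUniform(float *out, const float *image,
                                      const float *kernels,
                                      int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        out[(std::size_t)y * width + x] =
            rowConvolvePixel(image, kernels, width, x, y);
}
#endif
```

Because the per-pixel arithmetic lives in a function that also builds on the host, the indexing can be checked on the CPU with a small image before launching the kernel. With one kernel per pixel there is no kernel data shared between neighbouring threads, so this version will probably be bound by memory bandwidth.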

What I would like to do is write a much larger kernel array to the allocated memory space (with the memory size adjusted to match), amounting to a 17-float kernel for each element (17x1000x1000 floats). Firstly, I’m not sure whether an array that large will fit where the kernel is currently stored (device constant memory). I’m also still trying to work out how to change where each thread gets its kernel terms, which I imagine will use writePos to offset the reference.
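For scale, the size question can be worked out directly. The sketch below assumes 4-byte floats and the 64 KB of constant memory that CUDA devices provide; the constant names and the uploadKernels helper are hypothetical, and constexpr needs a C++11-capable compiler. It suggests the full set of kernels is far too large for constant memory, so one option is to place it in global memory with a single cudaMalloc and cudaMemcpy at start-up.

```cpp
#include <cstddef>

// Sizes taken from the post: 17 taps per element, 1000 x 1000 elements.
constexpr std::size_t kTaps = 17;
constexpr std::size_t kWidth = 1000;
constexpr std::size_t kHeight = 1000;

// Total storage for one kernel per element.
constexpr std::size_t kKernelBytes = kTaps * kWidth * kHeight * sizeof(float);

// Constant memory on CUDA devices is 64 KB in total.
constexpr std::size_t kConstantMemoryBytes = 64 * 1024;

constexpr bool fitsInConstantMemory(std::size_t bytes)
{
    return bytes <= kConstantMemoryBytes;
}

// A single 17-tap kernel fits easily; one kernel per element does not.
static_assert(fitsInConstantMemory(kTaps * sizeof(float)),
              "one 17-tap kernel fits in constant memory");
static_assert(!fitsInConstantMemory(kKernelBytes),
              "per-element kernels exceed constant memory");

#ifdef __CUDACC__
#include <cuda_runtime.h>

// Hypothetical helper: allocate global memory once and copy all kernels over.
// Returns 0 on failure.
inline float *uploadKernels(const float *h_kernels)
{
    float *d_kernels = 0;
    if (cudaMalloc((void **)&d_kernels, kKernelBytes) != cudaSuccess)
        return 0;
    if (cudaMemcpy(d_kernels, h_kernels, kKernelBytes,
                   cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_kernels);
        return 0;
    }
    return d_kernels;
}
#endif
```

With 4-byte floats the array comes to 68,000,000 bytes, roughly a thousand times the constant memory budget, so whether it fits on the card depends on how much global memory is left after the image buffers are allocated.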

I tried a very simple version first: I passed in a kernel twice the size of the normal kernel (34 floats instead of 17) and had the convolution use the second set of data instead of the first, via a static offset of 17. That worked quite well. Next I tried changing the static offset to an integer input offset:

// input offset is integer ref
template<int i> __device__ float convolutionRow(float *data, int ref){
    return
        data[KERNEL_RADIUS - i] * d_Kernel[i+ref]
        + convolutionRow<i - 1>(data);
}

template<> __device__ float convolutionRow<-1>(float *data){
    return 0;
}

template<int i> __device__ float convolutionColumn(float *data, int ref){
    return
        data[(KERNEL_RADIUS - i) * COLUMN_TILE_W] * d_Kernel2[i+ref]
        + convolutionColumn<i - 1>(data);
}

template<> __device__ float convolutionColumn<-1>(float *data){
    return 0;
}

And the call:

sum = convolutionRow<2 * KERNEL_RADIUS>(data + smemPos, 17);

But this would not compile; the error indicated that the template was not correct. I’m not sure how to get past this step, but once I can pass in the offset I’m sure I can use it to select which convolution kernel to apply to each element.

Any help would be much appreciated.

Thanks

Jamie
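A likely cause of the compile error in the templates above: the primary template now takes two arguments (data and ref), but the recursive call convolutionRow<i - 1>(data) still passes only one, and the <-1> specialization still declares the old one-argument signature, so it no longer matches the primary template. The sketch below threads ref through both places. It shows the row version only, and the empty __device__ and __constant__ macros plus the 34-float d_Kernel array are assumptions so that the snippet also builds as plain C++.

```cpp
// Let the same source compile as plain C++ for host-side checking.
#ifndef __CUDACC__
#define __device__
#define __constant__
#endif

#define KERNEL_RADIUS 8
#define KERNEL_LENGTH (2 * KERNEL_RADIUS + 1)  // 17 taps

// Two 17-tap kernels stored back to back, as in the 34-float experiment.
__device__ __constant__ float d_Kernel[2 * KERNEL_LENGTH];

// Primary template: i is unrolled at compile time, ref is a runtime offset
// into d_Kernel that selects which kernel to use.
template<int i> __device__ float convolutionRow(float *data, int ref)
{
    return data[KERNEL_RADIUS - i] * d_Kernel[i + ref]
         + convolutionRow<i - 1>(data, ref);   // pass ref down the recursion
}

// Terminating case: it must have the same parameter list as the primary.
template<> __device__ float convolutionRow<-1>(float *, int)
{
    return 0;
}
```

The call from the post, sum = convolutionRow<2 * KERNEL_RADIUS>(data + smemPos, 17);, should then match the template. convolutionColumn would need the same two changes, and ref can later be computed per thread instead of being the fixed value 17.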