EDIT: I just realized I shouldn’t expect too much help with that question title… should’ve written it differently :(
Hi, I’ve written a kernel that convolves a 2D matrix with a vector. It gives correct results for any square matrix (ANY_SIZExANY_SIZE), as long as I change the threads and blocks accordingly. The problem appears when the matrix is not square, for example 1024x512: I change BLOCKSX to 512/THREADSX, but the results are no longer correct, which seems very strange.
The kernel code is the following:
#define THREADSX 16
#define THREADSY 16
#define BLOCKSX (1024 / THREADSX)
#define BLOCKSY (1024 / THREADSY)
#define FH 1
#define A 1
#define B 1
__global__ void kernel(float *result,,int N,int M)
{
    __shared__ float T[THREADSY][THREADSX];
    float topX;
    float i;
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int u = threadIdx.x + blockDim.x * blockIdx.x;
    int v = threadIdx.y + blockDim.y * blockIdx.y;
    T[ty][tx] = 0;
    // Manage border cases
    if (!(u < A || u + A >= M)) {
        i = -A;
        topX = A;
    } else if (u < A) {
        i = -u;
        topX = A;
    } else {
        i = -A;
        topX = M - 1 - u;
    }
    for (; i <= topX; ++i) {
        T[ty][tx] += tex2D(texMatrix, u + i, (float)v) * tex1Dfetch(texFilterR, A + i);
    }
    *(result + __mul24(M, v) + u) = T[ty][tx];
}
which I call as follows:
dim3 dimGrid(BLOCKSX,BLOCKSY,1);
dim3 dimBlock(THREADSX,THREADSY,1);
fujiRows <<<dimGrid,dimBlock>>> (result_device,matrix_device,height,width,filter_device,x);
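For a non-square matrix, this is roughly how I would expect the grid to be derived from the actual dimensions instead of from the BLOCKSX/BLOCKSY macros. It is only a minimal host-side sketch: blocks_for, grid_size and grid_for are hypothetical names that are not in my code, and it assumes that the width (M, columns) maps to the x dimension and the height (N, rows) to the y dimension, the same way u and v are computed in the kernel.

```c
#include <assert.h>

/* Hypothetical helper: number of blocks needed to cover `extent`
   elements with `block` threads per block, rounded up. */
static unsigned int blocks_for(unsigned int extent, unsigned int block)
{
    assert(block > 0);
    return (extent + block - 1) / block;
}

/* Hypothetical holder for the two grid extents. */
struct grid_size { unsigned int x, y; };

/* Assumption: width (columns, M) goes to x and height (rows, N) goes to y,
   matching how u and v are built from threadIdx/blockIdx in the kernel. */
static struct grid_size grid_for(unsigned int width, unsigned int height,
                                 unsigned int threads_x, unsigned int threads_y)
{
    struct grid_size g;
    g.x = blocks_for(width, threads_x);
    g.y = blocks_for(height, threads_y);
    return g;
}
```

The result would be passed on as dim3 dimGrid(g.x, g.y, 1). If 1024x512 means 1024 rows by 512 columns, this gives 32 blocks in x and 64 in y with 16x16 threads. Because of the rounding up, a size that is not a multiple of the block size launches extra threads, so the kernel would also need a guard such as if (u >= M || v >= N) return; before it writes to result.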
Any idea what is going on?
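In case it helps with checking the output, here is a CPU reference of what the kernel is meant to compute, written as a minimal sketch. The name conv_rows_ref and its parameters are hypothetical. It assumes a row-major matrix of N rows by M columns, a filter of 2*radius+1 taps (radius plays the role of A), and a matrix wider than the filter. Taps that would fall outside a row are skipped, which is what the border handling in the kernel is intended to do.

```c
#include <assert.h>

/* CPU reference for the row-wise convolution (sketch, hypothetical names).
   in, out: N rows by M columns, stored row-major.
   filter:  2 * radius + 1 taps, centre tap at index `radius`. */
static void conv_rows_ref(float *out, const float *in, const float *filter,
                          int N, int M, int radius)
{
    assert(radius >= 0);
    for (int v = 0; v < N; ++v) {
        for (int u = 0; u < M; ++u) {
            /* Clamp the tap range so u + i stays inside [0, M - 1]. */
            int lo = (u < radius) ? -u : -radius;
            int hi = (u + radius >= M) ? M - 1 - u : radius;
            float acc = 0.0f;
            for (int i = lo; i <= hi; ++i)
                acc += in[v * M + u + i] * filter[radius + i];
            out[v * M + u] = acc;
        }
    }
}
```

Comparing the GPU result against this for a small non-square case (for example 2 rows by 3 columns) should show whether the wrong values come from the indexing, that is from which of N and M is used as the row stride, or from the launch configuration.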