# Why doesn't this code work for uneven block size

EDIT: I just realized I shouldn’t expect too much help with that question title… should’ve written it differently :(

Hi, I’ve written a kernel that convolves a vector with a 2D matrix. The problem appears when I make the matrix uneven (non-square), for example 1024x512, which I do by changing BLOCKSX to 512/THREADSX: I no longer get correct results. That seems very strange, since I do get correct results when processing ANY_SIZExANY_SIZE and changing the threads and blocks accordingly.
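To make the sizes concrete, here is a minimal sketch of the grid arithmetic for a non-square matrix. It is not the convolution kernel: `fillIndex` and `countMismatches` are hypothetical names, and the sketch assumes that 1024x512 means N = 1024 rows by M = 512 columns, which would match setting BLOCKSX to 512/THREADSX. Each thread just stores its own row-major index, so a wrong width or grid shape would show up as mismatches.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define THREADSX 16
#define THREADSY 16

// Hypothetical test kernel (not the convolution kernel): every thread
// stores its own row-major index into the N x M result matrix.
__global__ void fillIndex(float *result, int N, int M)
{
    int u = threadIdx.x + blockDim.x * blockIdx.x;  // column index, valid range 0..M-1
    int v = threadIdx.y + blockDim.y * blockIdx.y;  // row index, valid range 0..N-1
    if (u >= M || v >= N) return;                   // skip threads outside the matrix
    result[v * M + u] = (float)(v * M + u);
}

// Launches fillIndex on an N x M matrix and returns how many elements hold
// the wrong value, or -1 if a CUDA call failed.
// Assumes N * M < 2^24 so that every index is exactly representable as a float.
int countMismatches(int N, int M)
{
    dim3 dimBlock(THREADSX, THREADSY, 1);
    // Ceiling division: x covers the M columns, y covers the N rows.
    dim3 dimGrid((M + THREADSX - 1) / THREADSX,
                 (N + THREADSY - 1) / THREADSY, 1);

    float *d_result = NULL;
    if (cudaMalloc(&d_result, sizeof(float) * N * M) != cudaSuccess) return -1;
    fillIndex<<<dimGrid, dimBlock>>>(d_result, N, M);

    std::vector<float> h(N * M);
    cudaError_t err = cudaMemcpy(h.data(), d_result, sizeof(float) * N * M,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_result);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int k = 0; k < N * M; ++k)
        if (h[k] != (float)k) ++mismatches;
    return mismatches;
}
```

Calling `countMismatches(1024, 512)` from `main` exercises the non-square case. The point of the sketch is that the x dimension of the grid, the bound on `u`, and the row stride in `v * M + u` all use the same value M.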

The kernel code is the following:

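``````
#define THREADSX 16
#define BLOCKSX  (1024 / THREADSX)
#define BLOCKSY  (1024 / THREADSY)
#define FH 1
#define A 1
#define B 1

// THREADSY, T, texMatrix and texFilterR are used below, but their
// declarations are not part of this snippet.
__global__ void kernel(float *result,int N,int M)
{
    float topX;
    float i;
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    // Global column (u) and row (v) handled by this thread
    int u = threadIdx.x + blockDim.x * blockIdx.x;
    int v = threadIdx.y + blockDim.y * blockIdx.y;

    T[ty][tx] = 0;

    //Manage border cases
    if (!(u<A || u+A>=M)){
        // Interior column: full filter window [-A, A]
        i = -A;
        topX = A;
    }else if (u<A){
        // Left border: start the window at column 0
        i = -u;
        topX = A;
    }else{
        // Right border: stop the window at column M-1
        i = -A;
        topX = M-1-u;
    }

    // Accumulate the filtered value along the row
    for (;i<=topX;++i){
        T[ty][tx] += tex2D(texMatrix,u+i,(float)v) * tex1Dfetch(texFilterR,A+i);
    }

    // Store at row v, column u, using M as the row stride
    *(result+__mul24(M,v)+u) = T[ty][tx];
}
``````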

which I call as follows:

``````
dim3 dimGrid(BLOCKSX,BLOCKSY,1);