Why doesn't this code work for uneven block size?

EDIT: I just realized I shouldn’t expect too much help with that question title… should’ve written it differently :(

Hi, I’ve programmed a kernel to perform convolution of a vector with a 2D matrix. The problem appears when I make the matrix non-square (uneven), for example 1024x512: I change BLOCKSX to 512/THREADSX, but I don’t get correct results. This seems very strange, since I do get correct results when processing any square ANY_SIZExANY_SIZE matrix and changing threads and blocks accordingly.

The kernel code is the following:

#define THREADSX 16
#define THREADSY 16
#define BLOCKSX  (1024 / THREADSX)
#define BLOCKSY  (1024 / THREADSY)
#define FH 1
#define A 1
#define B 1

// texMatrix (2D) and texFilterR (1D) are texture references declared elsewhere.
__global__ void kernel(float *result, int N, int M)
{
    __shared__ float T[THREADSY][THREADSX];
    float topX;
    float i;

    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int u = threadIdx.x + blockDim.x * blockIdx.x;  // column index
    int v = threadIdx.y + blockDim.y * blockIdx.y;  // row index

    T[ty][tx] = 0;

    //Manage border cases
    if (!(u<A || u+A>=M)){
        // Interior: the full filter window fits
        i = -A;
        topX = A;
    }else if (u<A){
        // Left border
        i = -u;
        topX = A;
    }else{
        // Right border
        i = -A;
        topX = M-1-u;
    }

    for (;i<=topX;++i){
        T[ty][tx] += tex2D(texMatrix,u+i,(float)v) * tex1Dfetch(texFilterR,A+i);
    }

    *(result+__mul24(M,v)+u) = T[ty][tx];
}


which I call as follows:

dim3 dimGrid(BLOCKSX,BLOCKSY,1);
dim3 dimBlock(THREADSX,THREADSY,1);
fujiRows <<<dimGrid,dimBlock>>> (result_device,matrix_device,height,width,filter_device,x);
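One way to avoid editing the BLOCKSX/BLOCKSY macros by hand for each matrix shape might be to derive the grid from the runtime width and height. The following is only a sketch in plain host-side C++ (it should also build as CUDA host code). `blocksFor`, `GridShape` and `gridFor` are hypothetical names, not part of the code above, and `GridShape` merely stands in for `dim3`. Rounding up only matters when a dimension is not a multiple of the block size, in which case the kernel would also need a bounds check that the kernel above does not have.

```cpp
#define THREADSX 16
#define THREADSY 16

// Hypothetical helper: blocks needed to cover `extent` elements,
// rounding up so that a partial block is still launched.
inline int blocksFor(int extent, int threadsPerBlock) {
    return (extent + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical stand-in for dim3 so the sketch builds without CUDA headers.
struct GridShape {
    int x;  // blocks along the width  (columns, index u in the kernel)
    int y;  // blocks along the height (rows, index v in the kernel)
};

// x follows the width and y follows the height, matching how the
// kernel derives u from threadIdx.x and v from threadIdx.y.
inline GridShape gridFor(int width, int height) {
    GridShape g;
    g.x = blocksFor(width, THREADSX);
    g.y = blocksFor(height, THREADSY);
    return g;
}
```

For the 1024x512 case (width 512, height 1024), `gridFor(512, 1024)` gives a 32x64 grid, which could then be passed on as `dim3 dimGrid(g.x, g.y, 1)`.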

Any idea what is going on?

Please disregard my question; I realized I had a problem with the way I was initializing my textures.
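For anyone debugging something similar, comparing the GPU output against a plain CPU loop on a non-square size such as 1024x512 can help narrow down whether the kernel or the setup is at fault. Below is a rough sketch of such a reference, under several assumptions that the post does not confirm: the matrix is stored row-major with N rows and M columns, the texture uses unnormalized coordinates with point sampling, and the filter has 2*A+1 taps. `convolveRowsRef` is a hypothetical name, not part of the kernel above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the row convolution in the kernel above.
// matrix: N rows by M columns, row-major. filter: 2*radius + 1 taps.
inline std::vector<float> convolveRowsRef(const std::vector<float>& matrix,
                                          const std::vector<float>& filter,
                                          int N, int M) {
    // Radius of the filter (the kernel's A), derived from the tap count.
    const int radius = (static_cast<int>(filter.size()) - 1) / 2;
    std::vector<float> result(static_cast<std::size_t>(N) * M, 0.0f);

    for (int v = 0; v < N; ++v) {          // row
        for (int u = 0; u < M; ++u) {      // column
            // Clip the filter window at the left and right edges,
            // mirroring the kernel's border cases.
            const int first = (u < radius) ? -u : -radius;
            const int last  = (u + radius >= M) ? (M - 1 - u) : radius;

            float sum = 0.0f;
            for (int i = first; i <= last; ++i) {
                sum += matrix[static_cast<std::size_t>(v) * M + u + i]
                     * filter[radius + i];
            }
            result[static_cast<std::size_t>(v) * M + u] = sum;
        }
    }
    return result;
}
```

If the GPU result matches this reference for square sizes but not for non-square ones, the mismatch is more likely in how width, height or the textures are set up than in the loop itself.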