# Why doesn't this __global__ function work with a 2D grid?

Hi everyone, here is my code:

```
__global__ void sumVect(QVECT *Vett, QVECT *VettRis, unsigned long int N, unsigned long int N2){
    int tid, sh, linIdx, i, z, j;

    tid = blockDim.x*blockDim.y*blockIdx.x + gridDim.x* blockIdx.y + sh;

    if (tid<N2){
        linIdx=N2-tid;
        // row i of the pair (i, j), from the triangular-number formula
        i=int(N - 0.5 - sqrt(0.25 - 2 * (1 - linIdx)));
        z=(N+N-1-i)*i;
        // column j within row i
        j=tid - z/2 + 1 + i;
        if (i==j){
            i=i-1;
            j=N-1;
        }
        VettRis[tid].x=Vett[i].x+Vett[j].x;
        VettRis[tid].y=Vett[i].y+Vett[j].y;
        VettRis[tid].z=Vett[i].z+Vett[j].z;
        VettRis[tid].Ene=Vett[i].Ene+Vett[j].Ene;
    }
}
```

In practice, this kernel sums the N elements in pairs, in parallel, without repetition. The code works when I launch the kernel with a 1D grid (e.g. kernel<<<dim3(65535,1,1),…>>>(…);). However, it doesn't work when I launch it with a 2D grid (e.g. kernel<<<dim3(65535,1000,1),…>>>(…);): the last elements of the vectors are not summed, even though there are enough threads to perform the calculation. Thanks a lot!
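For context, a 2D grid is typically needed here because `gridDim.x` is limited to 65535 on GPUs of compute capability below 3.0, and the number of pairs N2 = N(N-1)/2 (inferred from how the kernel uses `N2`) quickly exceeds 65535 blocks. The sketch below picks a grid shape that covers a given number of threads by filling x first and spilling into y. `GridShape` and `gridFor` are hypothetical helpers, not CUDA APIs, and the 65535 cap is a parameter you may want to raise on newer hardware.

```cuda
#include <cassert>

// Hypothetical helper type: number of blocks along x and y.
struct GridShape {
    unsigned long x, y;
};

// Choose a 2D grid with at least `total` threads, given a fixed number of
// threads per block and a cap on gridDim.x (65535 on older GPUs).
inline GridShape gridFor(unsigned long total, unsigned threadsPerBlock,
                         unsigned long maxGridX = 65535) {
    assert(threadsPerBlock > 0 && maxGridX > 0);
    // Blocks needed, rounded up.
    unsigned long blocks = (total + threadsPerBlock - 1) / threadsPerBlock;
    // Fill x first, then spill the remainder into y.
    unsigned long gx = blocks < maxGridX ? blocks : maxGridX;
    if (gx == 0) gx = 1;                 // a launch needs at least one block
    unsigned long gy = (blocks + gx - 1) / gx;
    if (gy == 0) gy = 1;
    return {gx, gy};
}
```

On the CUDA side this would be used as `GridShape g = gridFor(N2, 256); sumVect<<<dim3(g.x, g.y, 1), 256>>>(Vett, VettRis, N, N2);`. The grid may contain a few more threads than N2, which is why the kernel's `if (tid<N2)` guard matters. That only helps, though, if every thread of the 2D grid gets a distinct `tid`.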

Are you computing tid correctly? I'm a bit confused by how you create the index.

That said, the usual way in CUDA to map a 2D (x, y) position to a unique 1D index (in row-major order) is:

```
// map from threadIdx/blockIdx to (x,y) position
int x = threadIdx.x + blockIdx.x * blockDim.x;
int y = threadIdx.y + blockIdx.y * blockDim.y;

// calculate the offset of an element into the input array (row-major order)
int gid = x + y * NUMCOLS;
```

Hope this helps.

I don't have a matrix but a vector, and I sum the elements of this vector in pairs without repetition. Example:

`VettRis[0]=Vett[0]+Vett[1];`

`VettRis[1]=Vett[0]+Vett[2];`

`VettRis[2]=Vett[0]+Vett[3];`

`VettRis[i]=Vett[0]+Vett[N-1];`

`VettRis[i+1]=Vett[1]+Vett[2];`

`VettRis[i+2]=Vett[1]+Vett[3];`

`VettRis[j]=Vett[1]+Vett[N-1];`

`VettRis[j+1]=Vett[2]+Vett[3];`

`VettRis[j+2]=Vett[2]+Vett[4];`

etc.
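The pairing in this example can be written as a serial reference loop, which is handy for checking the GPU output element by element. This is a minimal host-side sketch; the `QVECT` layout (four `float` fields) is an assumption inferred from the fields the kernel touches, and `sumPairsReference` is a hypothetical name.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>

struct QVECT { float x, y, z, Ene; };   // assumed layout (only these fields are used)

// Serial reference: result[k] is the sum of the k-th unordered pair (i, j),
// i < j, in the order (0,1), (0,2), ..., (0,N-1), (1,2), (1,3), ...
std::vector<QVECT> sumPairsReference(const std::vector<QVECT>& v) {
    const std::size_t n = v.size();
    std::vector<QVECT> out;
    out.reserve(n < 2 ? 0 : n * (n - 1) / 2);   // N2 = N*(N-1)/2 pairs
    for (std::size_t i = 0; i + 1 < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            out.push_back({v[i].x + v[j].x, v[i].y + v[j].y,
                           v[i].z + v[j].z, v[i].Ene + v[j].Ene});
    return out;
}
```

Each GPU thread with index `tid` is meant to produce exactly `result[tid]` of this loop, so any `tid` that is skipped or produced twice shows up directly as a mismatch.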

By "2D grid" I mean launching the kernel with the configuration kernel<<<dim3(x,y,1),…>>>(…);

I hope that is clearer.

pQB's comment is correct: your formula for `tid` is wrong. You probably meant to write

```
sh=threadIdx.x+blockDim.x*threadIdx.y;
tid = blockDim.x*blockDim.y * (blockIdx.x + gridDim.x* blockIdx.y) + sh;
```

even though that would be an unusual mapping.
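To see why the original formula loses the last elements, both formulas can be evaluated on the host for every thread of a small launch. In the original, `blockIdx.y` adds only `gridDim.x` to the index instead of `gridDim.x` whole blocks, so the rows of the grid overlap and the highest indices are never produced. This sketch assumes `sh` was meant to be `threadIdx.x+blockDim.x*threadIdx.y` (in the original kernel it is never assigned); `Launch`, `tidOriginal`, `tidFixed` and `distinctIndices` are hypothetical names.

```cuda
#include <cassert>
#include <vector>

// Hypothetical stand-in for the launch shape: gridDim.{x,y} and blockDim.{x,y}.
struct Launch { unsigned gx, gy, bx, by; };

// Formula from the original kernel (missing brackets around the block index).
unsigned long tidOriginal(const Launch& L, unsigned bIx, unsigned bIy, unsigned sh) {
    return (unsigned long)L.bx * L.by * bIx + (unsigned long)L.gx * bIy + sh;
}

// Corrected formula: both blockIdx.x and blockIdx.y count whole blocks.
unsigned long tidFixed(const Launch& L, unsigned bIx, unsigned bIy, unsigned sh) {
    return (unsigned long)L.bx * L.by * (bIx + (unsigned long)L.gx * bIy) + sh;
}

// Counts the distinct indices in [0, total) produced by a formula when it is
// evaluated for every thread of the launch (total = number of threads).
template <class F>
unsigned long distinctIndices(const Launch& L, F tidOf) {
    const unsigned long total = (unsigned long)L.gx * L.gy * L.bx * L.by;
    std::vector<bool> seen(total, false);
    unsigned long count = 0;
    for (unsigned bIy = 0; bIy < L.gy; ++bIy)
        for (unsigned bIx = 0; bIx < L.gx; ++bIx)
            for (unsigned ty = 0; ty < L.by; ++ty)
                for (unsigned tx = 0; tx < L.bx; ++tx) {
                    unsigned sh = tx + L.bx * ty;   // flattened thread index in the block
                    unsigned long t = tidOf(L, bIx, bIy, sh);
                    if (t < total && !seen[t]) { seen[t] = true; ++count; }
                }
    return count;
}
```

For a grid of 4×3 blocks of 8×2 threads (192 threads), the corrected formula reaches all 192 indices, while the original one only reaches indices 0 to 71, which matches the symptom of the last elements never being summed.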

And there is no point in using a 2D block when you only ever use the flattened 1D `tid`. You might as well use a 1D block from the start.
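A sketch of the 1D-block variant suggested here: the index only needs `blockIdx.x`, `blockIdx.y`, `gridDim.x`, `blockDim.x` and `threadIdx.x`, and the (i, j) computation is the original kernel's formula rewritten with signed 64-bit arithmetic. The `__host__`/`__device__` shim lets the two helpers be compiled and tested with a plain C++ compiler, while the kernel itself is only compiled by nvcc. `flatTid`, `pairFromTid`, `sumVect1D` and the `QVECT` layout are assumptions for illustration.

```cuda
#include <cassert>
#include <math.h>

#ifndef __CUDACC__
// Plain C++ compilers don't know these qualifiers; nvcc defines __CUDACC__.
#define __host__
#define __device__
#endif

struct QVECT { float x, y, z, Ene; };   // assumed layout (only these fields are used)

// Global thread index in a 2D grid of 1D blocks: blocks are numbered row by
// row (blockIdx.x fastest), threads are contiguous inside each block.
__host__ __device__ inline unsigned long long
flatTid(unsigned bIdxX, unsigned bIdxY, unsigned gDimX, unsigned bDimX, unsigned tIdxX) {
    return (unsigned long long)bDimX * (bIdxX + (unsigned long long)gDimX * bIdxY) + tIdxX;
}

// Maps tid in [0, N2), N2 = N*(N-1)/2, to the pair (i, j), i < j, using the
// formula from the original kernel.
__host__ __device__ inline void
pairFromTid(long long N, long long N2, long long tid, long long* pi, long long* pj) {
    long long linIdx = N2 - tid;
    long long i = (long long)(N - 0.5 - sqrt(0.25 - 2.0 * (1 - linIdx)));
    long long z = (2 * N - 1 - i) * i;     // twice the index of row i's first pair
    long long j = tid - z / 2 + 1 + i;
    if (i == j) { i = i - 1; j = N - 1; }  // last pair of a row is computed one row too far
    *pi = i;
    *pj = j;
}

#ifdef __CUDACC__
// Launch with a 1D block, e.g. sumVect1D<<<dim3(gx, gy, 1), 256>>>(Vett, VettRis, N, N2).
__global__ void sumVect1D(const QVECT* Vett, QVECT* VettRis,
                          unsigned long N, unsigned long N2) {
    unsigned long long tid = flatTid(blockIdx.x, blockIdx.y, gridDim.x,
                                     blockDim.x, threadIdx.x);
    if (tid >= N2) return;                 // surplus threads in the last blocks do nothing
    long long i, j;
    pairFromTid((long long)N, (long long)N2, (long long)tid, &i, &j);
    VettRis[tid].x   = Vett[i].x   + Vett[j].x;
    VettRis[tid].y   = Vett[i].y   + Vett[j].y;
    VettRis[tid].z   = Vett[i].z   + Vett[j].z;
    VettRis[tid].Ene = Vett[i].Ene + Vett[j].Ene;
}
#endif
```

Using 64-bit signed types for the index math avoids the mix of `int` and `unsigned long` in the original, where an intermediate like `tid - z/2` can briefly go negative.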

Thanks a lot, tera, the problem was the brackets! THANK YOU, THANK YOU, THANK YOU!!! :D