 # Problem with 2D Grid

Hi all,
I'm implementing the following code, which calculates the parameter `d_epsilon` 4 times (`blockDim.y`) and repeats the same calculation 200 times (`gridDim.y`).

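```cuda
// d_complex_skal_mult, d_complex_mult and d_complex_exp are the author's own
// __device__ helpers (by their names: real * complex, complex * complex and
// complex exponential); their definitions are not shown in the post.
__global__ void makeEpsilonKernel(float *d_f, float *d_phi, float *d_thickness,
                                  float2 *d_nComplex, float2 *d_epsilon,
                                  int numData, int numLayer)
{
    const float2 cImagOne = make_float2(0.0f, 1.0f);  // imaginary unit i
    const float pi = 4.0f * atanf(1.0f);
    const float cFactor = 299.789f;

    int ty, index, ind2, ind3, ind4;
    float term1;
    float2 term2;

    index = (blockIdx.x * blockDim.x) + threadIdx.x;  // data point
    ty = threadIdx.y;                                 // layer (blockDim.y == numLayer)

    if (index < numData)
    {
        ind2 = ty;                                    // thickness of layer ty
        ind3 = index + (numData * ty);                // (layer, data point) element
        ind4 = index + (numData * ty) + blockIdx.y * blockDim.y * numData;  // + repeat offset
        term1 = (2.0f * pi * d_thickness[ind2] * cosf(d_phi[ind3])) / cFactor;
        term2 = d_complex_skal_mult(term1 * d_f[index], cImagOne);
        d_epsilon[ind4] = d_complex_exp(d_complex_mult(term2, d_nComplex[ind3]));
    }
}
```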
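
The output index `ind4` in the kernel gives every (repeat, layer, data point) combination its own slot in `d_epsilon`. A minimal sketch of that mapping as a `__host__ __device__` helper, assuming `blockDim.y == numLayer` and one repeat per `blockIdx.y`; the name `epsilonIndex` is hypothetical:

```cuda
// Hypothetical helper mirroring the kernel's ind4 formula.
// Assumes blockDim.y == numLayer, so each repeat (blockIdx.y) owns a
// contiguous chunk of numLayer * numData outputs.
__host__ __device__ inline int epsilonIndex(int repeat, int layer, int index,
                                            int numLayer, int numData)
{
    return index + numData * layer + repeat * numLayer * numData;
}
```

With this layout, `d_epsilon` has to hold `gridDim.y * numLayer * numData` `float2` values, i.e. `200 * 4 * numData` for the sizes in the post.
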
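```cuda
extern "C" void makeEpsilon(int blockSize, float *d_f, float *d_phi,
                            float *d_thickness, float2 *d_nComplex,
                            float2 *d_epsilon, int numData, int numLayer)
{
    // Launch line was garbled in the post; assumed: x covers numData,
    // y has numLayer threads per block and 200 repeat blocks.
    dim3 block(blockSize, numLayer);
    dim3 grid((numData + blockSize - 1) / blockSize, 200);
    makeEpsilonKernel<<<grid, block>>>(d_f, d_phi, d_thickness,
                                       d_nComplex, d_epsilon, numData, numLayer);
    gpuErrchk( cudaPeekAtLastError() );  // gpuErrchk: user-defined check macro
}
```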

The first calculation is correct, but the following ones give wrong results:

- `blockIdx.y = 0`: correct results
- `blockIdx.y = 1`: wrong results
- `blockIdx.y = 2`: wrong results
- `blockIdx.y = 3`: wrong results
- `blockIdx.y = 4`: correct results

The same pattern repeats across all 200 calculations: every fourth result is correct.
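
One way to check the indexing without the GPU is to replay it on the host and count how often each output slot is written. The sketch below (hypothetical function `countWrites`, plain host code) assumes the launch shape described in the post: `numLayer` threads in y per block and `numRepeats` blocks in y. In the kernel, `ty` is the layer index; if it does not take the value `threadIdx.y` (for example, if it is read without being initialized and happens to hold 0), only the first `numData` slots of every `numLayer * numData` chunk get written, which would look like "every fourth chunk is correct".

```cuda
#include <vector>

// Replays the kernel's ind4 arithmetic on the host and counts how many
// threads write each slot of d_epsilon.
// tyFromThreadIdx == true  : ty = threadIdx.y (intended behavior)
// tyFromThreadIdx == false : ty stuck at 0 for every thread
std::vector<int> countWrites(int numData, int numLayer, int numRepeats,
                             bool tyFromThreadIdx)
{
    std::vector<int> writes(numData * numLayer * numRepeats, 0);
    for (int by = 0; by < numRepeats; ++by)              // blockIdx.y
        for (int threadY = 0; threadY < numLayer; ++threadY)  // threadIdx.y
            for (int index = 0; index < numData; ++index)
            {
                int ty = tyFromThreadIdx ? threadY : 0;
                int ind4 = index + numData * ty + by * numLayer * numData;
                ++writes[ind4];
            }
    return writes;
}
```

With `ty = threadIdx.y`, every slot is written exactly once; with `ty` stuck at 0, each slot of the first layer is written `numLayer` times and the other three layers are never written.
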

Could this be a synchronization error?
I'm grateful for any help. Many thanks in advance.

Sorry for my English :-)

`__syncthreads()` synchronizes the threads within a block; it does not synchronize all threads running in a kernel.
I don't know whether your error is a synchronization error, but that instruction doesn't make sense in your kernel.
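
To illustrate the scope of `__syncthreads()`: it is a barrier for the threads of one block, typically needed when those threads exchange data through shared memory. A minimal sketch with a hypothetical kernel `reverseWithinBlock` that reverses each block-sized chunk of an array; the block size of 256 is an assumption. Ordering across blocks comes only from kernel completion, not from anything inside the kernel.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlock = 256;  // threads per block (assumed)

// Reverses each kBlock-sized chunk of d_data in place via shared memory.
__global__ void reverseWithinBlock(int *d_data)
{
    __shared__ int tile[kBlock];
    int t = threadIdx.x;
    int base = blockIdx.x * kBlock;

    tile[t] = d_data[base + t];
    // Barrier for the threads of THIS block only: all writes to tile are
    // visible before any thread reads another thread's element.
    __syncthreads();
    d_data[base + t] = tile[kBlock - 1 - t];
}

// Host wrapper; h.size() must be a multiple of kBlock.
std::vector<int> reverseChunks(std::vector<int> h)
{
    int *d = nullptr;
    size_t bytes = h.size() * sizeof(int);
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);
    unsigned numBlocks = static_cast<unsigned>(h.size() / kBlock);
    reverseWithinBlock<<<numBlocks, kBlock>>>(d);
    // Nothing inside a kernel waits for other blocks; the grid-wide ordering
    // point is kernel completion, which this blocking copy waits for.
    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}
```

The kernel in the question has no shared memory and no data exchange between threads, so there is nothing for a block-level barrier to protect.
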

I would say the error is likely related to your grid setup. I would try switching to a one-dimensional grid of blocks rather than a 2D grid. At the very least, that should make the index calculations simpler, and I cannot see anything in your kernel that requires a 2D block setup. Whenever I have something complicated like that, I try to decide whether I can simplify it down to a single 1D row of threads grouped into a 1D row of blocks.
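
Following that suggestion, here is a minimal sketch of a 1D version: one thread per element of `d_epsilon`, with the (repeat, layer, data point) triple recovered from a flat index. It assumes the output layout of the original `ind4` formula and a repeat count `numRepeats` (200 in the post); `Triple`, `decodeIndex` and `makeEpsilonKernel1D` are hypothetical names. Because the definitions of the poster's `d_complex_*` helpers are not shown, the complex arithmetic is written out inline under the assumption that they compute the usual products and exponential: for $\theta$ = `term1 * d_f[index]` and $n = a + ib$, $\exp(i\theta n) = e^{-\theta b}(\cos\theta a + i\sin\theta a)$.

```cuda
#include <cuda_runtime.h>

// Splits a flat output index into (repeat, layer, data point), matching
// ind4 = index + numData * layer + repeat * numLayer * numData.
struct Triple { int repeat, layer, index; };

__host__ __device__ inline Triple decodeIndex(int flat, int numData, int numLayer)
{
    Triple t;
    t.index  = flat % numData;
    t.layer  = (flat / numData) % numLayer;
    t.repeat = flat / (numData * numLayer);
    return t;
}

// One thread per element of d_epsilon; launch with a 1D grid, e.g.
// makeEpsilonKernel1D<<<(total + 255) / 256, 256>>>(..., numRepeats).
__global__ void makeEpsilonKernel1D(const float *d_f, const float *d_phi,
                                    const float *d_thickness,
                                    const float2 *d_nComplex, float2 *d_epsilon,
                                    int numData, int numLayer, int numRepeats)
{
    int flat  = blockIdx.x * blockDim.x + threadIdx.x;
    int total = numData * numLayer * numRepeats;
    if (flat >= total) return;

    // t.repeat only selects the output slot, and that slot is flat itself.
    Triple t = decodeIndex(flat, numData, numLayer);
    int ind3 = t.index + numData * t.layer;      // same as ind3 in the 2D kernel

    const float pi = 4.0f * atanf(1.0f);
    const float cFactor = 299.789f;
    float theta = (2.0f * pi * d_thickness[t.layer] * cosf(d_phi[ind3]))
                  / cFactor * d_f[t.index];

    // exp(i * theta * n), n = a + ib  ->  e^(-theta*b) * (cos(theta*a) + i sin(theta*a))
    float2 n = d_nComplex[ind3];
    float s, c;
    sincosf(theta * n.x, &s, &c);
    float mag = expf(-theta * n.y);
    d_epsilon[flat] = make_float2(mag * c, mag * s);
}
```

The design choice here is that the flat thread index *is* the output index, so there is no separate `ind4` to get wrong, and the layer index comes from arithmetic instead of from `threadIdx.y`.
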