changing value with shared memory size...

Hi all,

I am trying to write a simple program that calculates X-ray diffraction patterns using an old formula, which says:

I[q] = sum[ from i = 0 to N - 1 ] sum[ from j = 0 to N - 1 ] { fi[ q ] fj[ q ] sin[ q * rij ] / ( q * rij ) }

I have implemented a kernel that calculates:

F[q] = sum[ from i = 0 to N - 1 ] sum[ from j = 0 to N - 1 ] { sin[ q * rij ] / ( q * rij ) }

where q is a non-negative number, rij is the distance between atoms i and j, and N is the number of atoms.
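To check a GPU result, the double sum F[q] above can be evaluated directly on the CPU for a small number of atoms. The following is only a minimal sketch: the `Atom` struct and the function name `debyeSumReference` are hypothetical and are not part of the original program. Terms with q * rij = 0 use the limit sin(x)/x -> 1.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one atom position (not from the original post).
struct Atom { double x, y, z; };

// Direct O(N^2) evaluation of
//   F(q) = sum_i sum_j sin(q * r_ij) / (q * r_ij).
// When q * r_ij == 0 (i == j, coincident atoms, or q == 0) the term is
// replaced by its limit value 1.
double debyeSumReference(const std::vector<Atom>& atoms, double q)
{
    double total = 0.0;
    for (std::size_t i = 0; i < atoms.size(); ++i) {
        for (std::size_t j = 0; j < atoms.size(); ++j) {
            const double dx = atoms[i].x - atoms[j].x;
            const double dy = atoms[i].y - atoms[j].y;
            const double dz = atoms[i].z - atoms[j].z;
            const double x = q * std::sqrt(dx * dx + dy * dy + dz * dz);
            total += (x != 0.0) ? std::sin(x) / x : 1.0;
        }
    }
    return total;
}
```

For q = 0 the function returns N * N, which matches the limit mentioned further down, so it can serve as a quick sanity check against the kernel output on a few hundred atoms.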
So, here is my kernel:

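__global__ void sinc (int q, float4 *a_d, unsigned long int N, float *monoInt)
{
    // Global index of the atom handled by this thread.
    // Assumes N == gridDim.x * blockDim.x, so every load below is in bounds.
    int Idx = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;

    float4 bi = a_d[Idx];

    float inte = 0.0f;

    // Dynamic shared memory: its size comes from the third launch parameter,
    // which must be at least blockDim.x * sizeof(float4) bytes.
    extern __shared__ float4 SharedPos[];

    for (int j = 0; j < gridDim.x; j ++)
    {
        // Each thread loads one atom of tile j into shared memory.
        SharedPos[threadIdx.x] = a_d[blockDim.x * j + threadIdx.x];

        // Wait until the whole tile has been loaded.
        __syncthreads ();

        #pragma unroll 8
        for (int i = 0; i < blockDim.x; i++)
        {
            // Atom i of the current tile.
            float4 bj = SharedPos[i];

            float3 r = {bi.x - bj.x, bi.y - bj.y, bi.z - bj.z};
            float distSq = r.x * r.x + r.y * r.y + r.z * r.z;

            float dist = sqrtf(distSq);
            float step = q * Qstep * dist;

            if (step != 0.0f) {

                inte += sinf(step) / step;

            } else {

                // sin(x)/x -> 1 for x -> 0 (same atom, coincident atoms, or q == 0).
                inte += 1.0f;

            }
        }

        // Wait until every thread has finished with this tile
        // before the next iteration overwrites it.
        __syncthreads ();
    }

    // One partial sum per atom, written once after all tiles are processed.
    monoInt[Idx] = inte;
}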

I launch the kernel in this way:

dim3 dimGrid(TilesNumber, 1);
dim3 dimBlock(ThNumber, 1, 1);
sinc <<< dimGrid, dimBlock >>> (q, (float4*) a_d, N, monoInt);
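// Wait for the kernel to finish and pick up any launch or execution error.
cudaError_t kernelErrors = cudaThreadSynchronize();

if ( kernelErrors != cudaSuccess ) {

	printf(" KERNEL PANIC!! %s\n", cudaGetErrorString(kernelErrors));

}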

Then, on the host (I know that is not a good thing, but I am new to CUDA… :rolleyes: ), I reduce the array “monoInt” simply in this way:

for (int k = 0; k < N; k ++){

       temp += Hm[k];

}
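One thing to keep in mind with the host loop above: each partial sum can be as large as N, so the total can reach N * N (about 10**12 for N near 10**6), while a float accumulator only has a 24-bit significand. A minimal sketch of the same reduction with a double accumulator follows; the function name `sumPartials` is hypothetical, and `partial` is assumed to play the role of the host array Hm.

```cpp
#include <cstddef>

// Sum the per-atom partial results in double precision.
// `partial` is assumed to be a host-side array of n floats
// (the role played by Hm in the loop above).
double sumPartials(const float* partial, std::size_t n)
{
    double total = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        total += static_cast<double>(partial[k]);
    }
    return total;
}
```

With a float accumulator, adding 1.0f to 1.0e8f leaves the sum unchanged, which is the kind of loss this version avoids.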

All is fine for q -> 0, where F[q] -> N * N, which is right… but for some “q” values F[q] becomes negative!! This must not happen!

A strange thing I have verified is that if I launch the kernel in this way:

sinc <<< dimGrid, dimBlock, ShMem >>> (q, (float4*) a_d, N, monoInt);

changing the ShMem value changes the result…

Another problem is that for N of about 10**6 on a GTX 295 the kernel crashes…
Maybe the problem is in:

inte += sin(step) / step;

? Too large?
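On the "too large?" question: each term sin(x)/x stays between roughly -0.217 and 1 however large step becomes, so a single term cannot overflow. Individual terms can be negative, though. The small sketch below, with hypothetical helper names `sincTerm` and `minSincOnGrid`, scans a grid of arguments to illustrate that bound numerically.

```cpp
#include <cmath>

// sin(x)/x with the x -> 0 limit handled explicitly.
double sincTerm(double x)
{
    return (x != 0.0) ? std::sin(x) / x : 1.0;
}

// Smallest value of sincTerm on a uniform grid over (0, xMax].
double minSincOnGrid(double xMax, int samples)
{
    double lowest = 1.0;
    for (int k = 1; k <= samples; ++k) {
        const double value = sincTerm(xMax * k / samples);
        if (value < lowest) {
            lowest = value;
        }
    }
    return lowest;
}
```

The lowest value is reached near x = 4.49; for large arguments the magnitude of a term is at most 1/x.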

Thanks to anyone who can answer!

regards,

luca

:wub:

SOLVED (I hope)… ^_^

It seems that the problem was due to the wrong amount of shared memory being allocated…
Allocating the quantity:

blockDimension * sizeof(float4),

it seems to work…
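The reason this matters is that an `extern __shared__` array has no size at compile time: its size is whatever is passed as the third launch parameter, and with no third parameter it is zero bytes, so the tile writes land out of bounds. A minimal host-side sketch of the size computation is given here. `Float4` is a hypothetical stand-in with the same layout as CUDA's float4 so that the snippet compiles without the CUDA headers, and `tileSharedBytes` is likewise a made-up name.

```cpp
#include <cstddef>

// Hypothetical stand-in with the same layout as CUDA's float4
// (four floats, 16 bytes), used only so this compiles without CUDA headers.
struct Float4 { float x, y, z, w; };

// Bytes of dynamic shared memory needed for a tile holding
// one float4 per thread of the block.
std::size_t tileSharedBytes(unsigned int threadsPerBlock)
{
    return static_cast<std::size_t>(threadsPerBlock) * sizeof(Float4);
}

// In the CUDA host code the value would go into the third launch parameter:
//   size_t ShMem = tileSharedBytes(ThNumber);
//   sinc <<< dimGrid, dimBlock, ShMem >>> (q, (float4*) a_d, N, monoInt);
```

For example, a block of 256 threads needs 256 * 16 = 4096 bytes, well within the 16 KB of shared memory per multiprocessor on a GTX 295.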

Thanks to all!