Hi all,

I am trying to write a simple program to calculate X-ray diffraction patterns using an old formula, which says:

I[q] = sum[ from i = 0 to N - 1 ] sum[ from j = 0 to N - 1 ] { fi[ q ] fj[ q ] sin[ q * rij ] / ( q * rij ) }

I have implemented a kernel that calculates:

F[q] = sum[ from i = 0 to N - 1 ] sum[ from j = 0 to N - 1 ] { sin[ q * rij ] / ( q * rij ) }

where q is a non-negative number, rij is the distance between atoms i and j, and N is the number of atoms.

Here is my kernel:

```
__global__ void sinc (int q, float4 *a_d, unsigned long int N, float *monoInt)
{
    // Global index of the atom handled by this thread
    int Idx = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
    float4 bi = a_d[Idx];
    float inte = 0.0f;
    extern __shared__ float4 SharedPos[];
    #pragma unroll 8
    for (int j = 0; j < gridDim.x; j++)
    {
        // Each thread copies one atom of tile j into shared memory
        SharedPos[threadIdx.x] = a_d[blockDim.x * j + threadIdx.x];
        __syncthreads();
        // Partner atom, read once per tile and indexed by the tile counter j
        float4 bj;
        bj = SharedPos[j];
        for (int i = 0; i < blockDim.x; i++)
        {
            float3 r = {bi.x - bj.x, bi.y - bj.y, bi.z - bj.z};
            float distSq = r.x * r.x + r.y * r.y + r.z * r.z;
            if (distSq != 0) {
                float dist = sqrt(distSq);
                // Qstep is defined outside this snippet
                float step = q * Qstep * dist;
                inte += sin(step) / step;
            } else {
                inte++;
            }
            __syncthreads();
            // Running sum is stored on every inner iteration
            int Idx = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
            monoInt[Idx] = inte;
        }
        __syncthreads();
    }
}
```
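For comparison, here is a minimal sketch of how the tiled loop could be written. It is an illustration under stated assumptions, not a confirmed fix: the names `sincTiled` and `debyeSum` are hypothetical, `q` is passed as the physical float value (what the kernel above computes as `q * Qstep`), and the `w` component of each `float4` is used as a 0/1 padding flag, which the original code does not do. The differences from the kernel above are that the partner atom is read from the tile inside the inner loop, the result is written once after both loops, the atom list is padded to a whole number of tiles, and the shared-memory size is passed as the third launch parameter.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Tiled all-pairs sum: thread i accumulates sum_j sinc(q * r_ij).
// Assumption: real atoms carry w = 1, padding entries carry w = 0.
__global__ void sincTiled(float q, const float4 *atoms, float *rowSum)
{
    extern __shared__ float4 tile[];   // blockDim.x entries, sized at launch
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float4 bi = atoms[idx];
    float acc = 0.0f;

    for (int t = 0; t < gridDim.x; ++t) {
        // Cooperative load: each thread copies one atom of tile t
        tile[threadIdx.x] = atoms[t * blockDim.x + threadIdx.x];
        __syncthreads();               // tile fully loaded before it is read

        for (int k = 0; k < blockDim.x; ++k) {
            float4 bj = tile[k];       // a different partner atom each iteration
            float dx = bi.x - bj.x, dy = bi.y - bj.y, dz = bi.z - bj.z;
            float x = q * sqrtf(dx * dx + dy * dy + dz * dz);
            float term = (x > 1e-6f) ? sinf(x) / x : 1.0f;  // sinc(0) = 1
            acc += bj.w * term;        // padding entries contribute nothing
        }
        __syncthreads();               // all reads done before the tile is overwritten
    }
    rowSum[idx] = bi.w * acc;          // single write, after both loops
}

// Hypothetical host wrapper: pads the atom list, launches the kernel with the
// dynamic shared-memory size, and reduces the per-atom sums in double precision.
double debyeSum(const std::vector<float4> &atoms, float q, int threads = 128)
{
    assert(!atoms.empty() && threads > 0);
    int blocks = (int)((atoms.size() + threads - 1) / threads);
    size_t padded = (size_t)blocks * threads;

    std::vector<float4> h(padded, make_float4(0.f, 0.f, 0.f, 0.f));
    for (size_t i = 0; i < atoms.size(); ++i)
        h[i] = make_float4(atoms[i].x, atoms[i].y, atoms[i].z, 1.0f);

    float4 *dAtoms = nullptr;
    float *dSum = nullptr;
    cudaMalloc(&dAtoms, padded * sizeof(float4));
    cudaMalloc(&dSum, padded * sizeof(float));
    cudaMemcpy(dAtoms, h.data(), padded * sizeof(float4), cudaMemcpyHostToDevice);

    size_t shMem = threads * sizeof(float4);   // one float4 per thread in a block
    sincTiled<<<blocks, threads, shMem>>>(q, dAtoms, dSum);

    std::vector<float> hSum(padded);
    cudaError_t err = cudaMemcpy(hSum.data(), dSum, padded * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dAtoms);
    cudaFree(dSum);
    if (err != cudaSuccess) return NAN;        // signal failure to the caller

    double total = 0.0;
    for (float v : hSum) total += v;
    return total;
}
```

Called as `debyeSum(atoms, q)`, two atoms one unit apart should give roughly 4 as q -> 0, since each of the four pair terms tends to 1.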

I launch the kernel in this way:

```
dim3 dimGrid(TilesNumber, 1);
dim3 dimBlock(ThNumber, 1, 1);
sinc <<< dimGrid, dimBlock >>> (q, (float4*) a_d, N, monoInt);
int kernelErrors;
kernelErrors = cudaThreadSynchronize();
if ( kernelErrors != 0 ) {
    printf(" KERNEL PANIC!!\n");
}
```
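A slightly more informative check is possible, sketched here under assumptions: `noop` and `launchAndCheck` are hypothetical names used only for the demonstration, and `cudaDeviceSynchronize` is the newer replacement for the deprecated `cudaThreadSynchronize`, so it may not exist on an older toolkit. The idea is to look at launch-time errors and run-time errors separately and to print the error string instead of a fixed message.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void noop() {}   // placeholder kernel, only for the demonstration

// Hypothetical helper: launches `noop` and reports both kinds of failure.
cudaError_t launchAndCheck(int blocks, int threads)
{
    assert(blocks > 0 && threads > 0);
    noop<<<blocks, threads>>>();

    // Errors detected at launch (invalid grid or block size, too much shared memory)
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // Errors raised while the kernel runs (for example an out-of-bounds access)
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

With this kind of check, a block size larger than the device allows is reported at launch rather than showing up later as a generic failure.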

Then, on the host (I know this is not a good approach, but I am new to CUDA…), I reduce the array “monoInt” simply like this:

```
for (int k = 0; k < N; k++) {
    temp += Hm[k];
}
```

For q -> 0 everything is fine: F[q] -> N * N, which is correct. But for some values of q, F[q] becomes negative!! This must not happen!
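The reason F[q] cannot be negative can be sketched as follows, assuming (as is standard for this formula, though not stated in this post) that the sinc term arises from averaging the phase factor over all orientations Ω of the scattering vector q, with r_ij = r_i - r_j:

```latex
\left\langle e^{\,i\,\mathbf{q}\cdot\mathbf{r}_{ij}} \right\rangle_{\Omega}
  = \frac{\sin(q\,r_{ij})}{q\,r_{ij}}
\quad\Longrightarrow\quad
F(q) = \sum_{i}\sum_{j} \frac{\sin(q\,r_{ij})}{q\,r_{ij}}
     = \left\langle \Bigl|\sum_{j} e^{\,i\,\mathbf{q}\cdot\mathbf{r}_{j}}\Bigr|^{2} \right\rangle_{\Omega}
     \;\ge\; 0,
\qquad
\lim_{q \to 0} F(q) = N^{2}.
```

Individual terms can be negative, but the full double sum is an average of a squared modulus, so a negative total suggests that the summation is picking up wrong pairs or invalid data rather than a problem with the formula itself.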

One strange thing I have verified is that if I launch the kernel like this:

sinc <<< dimGrid, dimBlock, ShMem >>> (q, (float4*) a_d, N, monoInt);

then changing the value of ShMem changes the result…

Another problem is that for N of about 10**6, the kernel crashes on a GTX 295…

Maybe the problem is in this line:

inte += sin(step) / step;

Could the value be too large?

Thanks in advance to anyone who can answer!

regards,

luca
