Hi all,
I am trying to write a simple program that calculates X-ray diffraction patterns using an old formula (the Debye scattering equation):
I[q] = sum[ from i = 0 to N - 1 ] sum[ from j = 0 to N - 1 ] { fi[q] fj[q] sin[ q * rij ] / ( q * rij ) }
I have implemented a kernel that calculates:
F[q] = sum[ from i = 0 to N - 1 ] sum[ from j = 0 to N - 1 ] { sin[ q * rij ] / ( q * rij ) }
where q is a non-negative number, rij is the distance between atoms i and j, and N is the number of atoms.
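As a cross-check for any GPU result, F[q] can also be evaluated directly on the CPU. The sketch below is a minimal double-precision reference, not the code used in this post; the names `Atom` and `debyeSumReference` are hypothetical. It relies on the symmetry rij = rji, so it adds the N self terms (each equal to 1) and twice the sum over pairs with i < j.

```cuda
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side atom type; the kernel in this post uses float4.
struct Atom { double x, y, z; };

// Reference value of F(q) = sum_i sum_j sin(q * r_ij) / (q * r_ij),
// accumulated in double precision on the CPU.
double debyeSumReference(const std::vector<Atom>& atoms, double q)
{
    const std::size_t n = atoms.size();
    double total = static_cast<double>(n);      // the N terms with i == j
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const double dx = atoms[i].x - atoms[j].x;
            const double dy = atoms[i].y - atoms[j].y;
            const double dz = atoms[i].z - atoms[j].z;
            const double x = q * std::sqrt(dx * dx + dy * dy + dz * dz);
            // sin(x) / x tends to 1 as x tends to 0
            const double term = (x != 0.0) ? std::sin(x) / x : 1.0;
            total += 2.0 * term;                // pairs (i, j) and (j, i)
        }
    }
    return total;
}
```

Since F[q] is the orientational average of |sum_j exp(i q . r_j)|^2, its exact value can never be negative. A negative result therefore points to a bug or to rounding problems, which is what makes a reference like this useful for comparison.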
Here is my kernel:
__global__ void sinc (int q, float4 *a_d, unsigned long int N, float *monoInt)
{
    // Global index of the atom handled by this thread
    int Idx = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
    float4 bi = a_d[Idx];
    float inte = 0.0f;
    // Dynamically sized shared-memory tile of atom positions
    extern __shared__ float4 SharedPos[];
    #pragma unroll 8
    for (int j = 0; j < gridDim.x; j ++)
    {
        // Each thread loads one atom of tile j into shared memory
        SharedPos[threadIdx.x] = a_d[blockDim.x * j + threadIdx.x];
        __syncthreads ();
        // bj is read once per tile, at index j (the tile index)
        float4 bj;
        bj = SharedPos[j];
        for (int i = 0; i < blockDim.x; i++)
        {
            float3 r = {bi.x - bj.x, bi.y - bj.y, bi.z - bj.z};
            float distSq = r.x * r.x + r.y * r.y + r.z * r.z;
            if (distSq != 0) {
                float dist = sqrt(distSq);
                float step = q * Qstep * dist;   // Qstep is defined elsewhere (not shown)
                inte += sin(step) / step;
            } else {
                inte ++;                         // sin(x) / x -> 1 for x -> 0
            }
            __syncthreads ();
            // The running sum is written back on every inner iteration
            int Idx = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
            monoInt[Idx] = inte;
        }
        __syncthreads ();
    }
}
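For comparison, here is a hedged sketch of how a tiled all-pairs sum is often structured in CUDA. It is not a drop-in replacement for the kernel above: the name `sincTiled`, the float parameter `qStep` (standing in for q * Qstep), and the bounds checks are assumptions made for illustration. The points worth comparing are that `bj` is read from the shared tile inside the inner loop, one element per iteration, that the barriers sit only around the tile load, and that the result is written once at the end.

```cuda
#include <cuda_runtime.h>

// Sketch: out[i] = sum over all j of sin(x) / x with x = qStep * |r_i - r_j|.
// Assumes blockDim.x * sizeof(float4) bytes of dynamic shared memory are
// passed as the third launch parameter.
__global__ void sincTiled(const float4 *pos, unsigned int n, float qStep, float *out)
{
    extern __shared__ float4 tile[];

    const unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // Threads past the end still help to load tiles, so they do not return early.
    const float4 bi = (idx < n) ? pos[idx] : make_float4(0.0f, 0.0f, 0.0f, 0.0f);
    float acc = 0.0f;

    for (unsigned int base = 0; base < n; base += blockDim.x) {
        const unsigned int src = base + threadIdx.x;
        if (src < n) tile[threadIdx.x] = pos[src];
        __syncthreads();                        // tile is fully loaded

        const unsigned int count = min(blockDim.x, n - base);
        for (unsigned int k = 0; k < count; ++k) {
            const float4 bj = tile[k];          // a different atom on each iteration
            const float dx = bi.x - bj.x;
            const float dy = bi.y - bj.y;
            const float dz = bi.z - bj.z;
            const float x = qStep * sqrtf(dx * dx + dy * dy + dz * dz);
            acc += (x != 0.0f) ? sinf(x) / x : 1.0f;
        }
        __syncthreads();                        // finish reading before the next load
    }

    if (idx < n) out[idx] = acc;                // single write per thread
}
```

It would be launched with one float4 of dynamic shared memory per thread, for example `sincTiled<<<blocks, threads, threads * sizeof(float4)>>>(a_d, N, q * Qstep, monoInt)`.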
I launch the kernel like this:
dim3 dimGrid(TilesNumber, 1);
dim3 dimBlock(ThNumber, 1, 1);
sinc <<< dimGrid, dimBlock >>> (q, (float4*) a_d, N, monoInt);
// Wait for the kernel to finish and pick up any execution error
int kernelErrors;
kernelErrors = cudaThreadSynchronize();
if ( kernelErrors != 0 ) {
    printf(" KERNEL PANIC!!\n");
}
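The check above only reports that something failed. A small helper that prints the CUDA error string can make a failure easier to diagnose. This is a sketch: `cudaOk` is a hypothetical name, and `cudaDeviceSynchronize` is the replacement for `cudaThreadSynchronize` in CUDA 4.0 and later, so on an older toolkit the original call would stay.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns true on success, otherwise prints what failed.
inline bool cudaOk(cudaError_t err, const char *what)
{
    if (err == cudaSuccess) return true;
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return false;
}

// Possible usage right after a launch (kernel and arguments as in the post):
//   sinc<<<dimGrid, dimBlock>>>(q, (float4*) a_d, N, monoInt);
//   cudaOk(cudaGetLastError(), "sinc launch");          // invalid launch configuration
//   cudaOk(cudaDeviceSynchronize(), "sinc execution");  // errors raised while running
```

Checking `cudaGetLastError` separately from the synchronization call helps to tell a rejected launch apart from a kernel that started and then failed.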
Then I reduce the array "monoInt" on the host (I know that is not a good approach, but I am new to CUDA), simply like this:
for (int k = 0; k < N; k ++){
    temp += Hm[k];
}
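One detail in this host-side sum is the accumulator type, which the post does not show. If `temp` is a float, precision may suffer: for q → 0 the total approaches N * N, about 10^12 for N around 10^6, while a float carries only 24 bits of mantissa. A minimal sketch with a double accumulator, using the hypothetical name `sumPartials`:

```cuda
#include <cstddef>

// Sums n per-thread partial results, accumulating in double precision.
double sumPartials(const float *partials, std::size_t n)
{
    double total = 0.0;
    for (std::size_t k = 0; k < n; ++k) {
        total += partials[k];   // each float is widened to double before adding
    }
    return total;
}
```

With the names from the post, the call would be something like `double temp = sumPartials(Hm, N);`.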
All is fine for q → 0, where F[q] → N * N, which is correct. But for some values of q, F[q] becomes negative, and this must not happen!
One strange thing I have verified is that if I launch the kernel like this:
sinc <<< dimGrid, dimBlock, ShMem >>> (q, (float4*) a_d, N, monoInt);
then changing the value of ShMem changes the result.
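This dependence is what one would expect from the way `SharedPos` is declared: an `extern __shared__` array has no size of its own and gets its size, in bytes, from the third launch parameter. A launch without that parameter gives the array zero bytes. The sketch below computes the size for one float4 per thread; `tileBytes` is a hypothetical helper name.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Bytes of dynamic shared memory needed for one float4 per thread in a block.
inline std::size_t tileBytes(unsigned int threadsPerBlock)
{
    return static_cast<std::size_t>(threadsPerBlock) * sizeof(float4);
}

// Possible usage with the names from the post:
//   sinc<<<dimGrid, dimBlock, tileBytes(ThNumber)>>>(q, (float4*) a_d, N, monoInt);
```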
Another problem is that for N around 10^6 the kernel crashes on a GTX 295.
Maybe the problem is in this line:
inte += sin(step) / step;
Could the value be too large?
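On the question of size: each term satisfies |sin(x) / x| <= 1, so a large `step` cannot make a single term blow up. What can matter is the accuracy of the sine for large arguments. In CUDA, `sinf` is specified to stay accurate over the full argument range, while the fast intrinsic `__sinf` (which `-use_fast_math` substitutes for `sinf`) is only accurate for arguments roughly within [-pi, pi]. The helper below is a sketch with the hypothetical name `sincTerm`; whether fast math is enabled in the build discussed here is not known.

```cuda
#include <math.h>
#include <cuda_runtime.h>

// sin(x) / x with the x -> 0 limit handled explicitly; usable on host and device.
__host__ __device__ inline float sincTerm(float x)
{
    return (x == 0.0f) ? 1.0f : sinf(x) / x;
}
```

Inside a kernel the accumulation would then read `inte += sincTerm(step);`, which also removes the need for a separate branch on the squared distance.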
Thanks in advance to anyone who can help!
Regards,
luca