# Kernel result changes with shared memory size

Hi all,

I am trying to write a simple program that calculates X-ray diffraction patterns using an old formula, the Debye scattering equation:

$$I(q) = \sum_{i=1}^{N} \sum_{j=1}^{N} f_i(q)\, f_j(q)\, \frac{\sin(q\, r_{ij})}{q\, r_{ij}}$$

I have implemented a kernel that calculates:

$$F(q) = \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{\sin(q\, r_{ij})}{q\, r_{ij}}$$

where $q$ is a non-negative number, $r_{ij}$ is the distance between atoms $i$ and $j$, and $N$ is the number of atoms.
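A plain CPU reference in double precision can help when checking what the kernel returns for small test systems. The following is only a sketch under stated assumptions: `Atom` and `debyeSumReference` are hypothetical names that do not appear in the original program, and `q` here is the actual scattering value, not an integer index.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the atom positions (the kernel uses CUDA's float4).
struct Atom { double x, y, z; };

// Double-precision reference for F(q) = sum_i sum_j sin(q r_ij) / (q r_ij).
// Diagonal terms and coincident atoms use the limit sin(x) / x -> 1.
double debyeSumReference(const std::vector<Atom>& atoms, double q)
{
    assert(q >= 0.0);
    const std::size_t n = atoms.size();
    double total = static_cast<double>(n);   // the N terms with i == j, each equal to 1
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const double dx = atoms[i].x - atoms[j].x;
            const double dy = atoms[i].y - atoms[j].y;
            const double dz = atoms[i].z - atoms[j].z;
            const double x = q * std::sqrt(dx * dx + dy * dy + dz * dz);
            // r_ij == r_ji, so every off-diagonal pair contributes twice.
            total += 2.0 * (x > 0.0 ? std::sin(x) / x : 1.0);
        }
    }
    return total;
}
```

For two atoms one unit apart, this returns 4 at q = 0 and 2 + 4/π at q = π/2, which gives a quick hand-checkable test before comparing against the GPU output.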
Here is my kernel:

```
__global__ void sinc (int q, float4 *a_d, unsigned long int N, float *monoInt)
{
    int Idx = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;

    float4 bi = a_d[Idx];

    float inte = 0.0f;

    // Dynamically sized shared array; its size in bytes is set by the
    // third <<< >>> launch argument.
    extern __shared__ float4 SharedPos[];

    #pragma unroll 8
    for (int j = 0; j < gridDim.x; j ++)
    {
        float4 bj;
        bj = SharedPos[j];

        for (int i = 0; i < blockDim.x; i++)
        {
            float3 r = {bi.x - bj.x, bi.y - bj.y, bi.z - bj.z};
            float distSq = r.x * r.x + r.y * r.y + r.z * r.z;

            if (distSq != 0) {
                float dist = sqrt(distSq);
                float step = q * Qstep * dist;

                inte += sin(step) / step;
            } else {
                // Zero distance (same atom): sin(x) / x -> 1
                inte ++;
            }

            int Idx = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
            monoInt[Idx] = inte;
        }
    }
}
```

I launch the kernel in this way:

```
dim3 dimGrid(TilesNumber, 1);
dim3 dimBlock(ThNumber, 1, 1);
sinc <<< dimGrid, dimBlock >>> (q, (float4*) a_d, N, monoInt);

// Fetch the status of the launch before testing it.
cudaError_t kernelErrors = cudaGetLastError();

if ( kernelErrors != cudaSuccess ) {
    printf(" KERNEL PANIC!!\n");
}
```

Then, on the host (I know that is not a good thing, but I am new to CUDA), I reduce the array `monoInt` simply in this way:

```
for (int k = 0; k < N; k ++) {
    temp += Hm[k];
}
```

Everything is fine for q → 0, where F(q) → N²; that is right. But for some values of q, F(q) becomes negative! This must not happen.
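The reason a negative value cannot be physical is a standard identity: sin(qr)/(qr) is the average of a plane-wave phase factor over all orientations of the scattering vector, so the double sum is the orientational average of a squared modulus. In the sketch below, the position vectors $\mathbf{r}_i$ and the average $\langle\cdot\rangle_{\Omega}$ over directions of $\mathbf{q}$ are notation introduced here for illustration:

$$
F(q) = \sum_{i=1}^{N}\sum_{j=1}^{N} \frac{\sin(q\, r_{ij})}{q\, r_{ij}}
     = \left\langle \sum_{i=1}^{N}\sum_{j=1}^{N} e^{\,\mathrm{i}\,\mathbf{q}\cdot(\mathbf{r}_i-\mathbf{r}_j)} \right\rangle_{\Omega}
     = \left\langle \Bigl|\, \sum_{j=1}^{N} e^{\,\mathrm{i}\,\mathbf{q}\cdot\mathbf{r}_j} \Bigr|^{2} \right\rangle_{\Omega} \;\ge\; 0
$$

So a negative total for any q points to a bug or to accumulated rounding error, not to the formula itself.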

A strange thing I have noticed is that if I launch the kernel in this way:

```
sinc <<< dimGrid, dimBlock, ShMem >>> (q, (float4*) a_d, N, monoInt);
```

the result changes when I change the value of `ShMem`.

Another problem is that for N of about 10^6 the kernel crashes on a GTX 295.
Maybe the problem is in:

```
inte += sin(step) / step;
```

Could `step` be too large?
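The post does not show how `TilesNumber` is derived from N, so the following is an assumption to rule out and not a diagnosis. If N is not a multiple of `ThNumber`, the grid has to be rounded up and the extra threads must not touch `a_d` or `monoInt`; the grid's x dimension is also capped at 65535 blocks on compute capability 1.x devices such as the GTX 295. The helper names below are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Maximum grid size in x on compute capability 1.x devices.
const std::size_t kMaxGridDimX = 65535;

// Number of blocks (tiles) needed to cover n atoms, rounded up.
std::size_t tilesFor(std::size_t n, std::size_t threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// True when the requested number of blocks can be launched in one grid row.
bool fitsInGrid(std::size_t tiles)
{
    return tiles <= kMaxGridDimX;
}
```

With N = 1000000 and 256 threads per block this gives 3907 blocks, which fits; with 8 threads per block it gives 125000, which does not. Another candidate worth checking is the display driver's watchdog, which stops kernels that run for more than a few seconds on a GPU that also drives a display.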

Thanks in advance to anyone who can answer!

regards,

luca


SOLVED (I hope)… ^_^

It seems that the problem was due to the wrong size of allocated shared memory.
Allocating

blockDimension * sizeof(float4)

bytes seems to work.
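The arithmetic behind that size can be sketched on the host as follows. This is an illustration under assumptions: `Float4` is a hypothetical stand-in with the same layout as CUDA's `float4`, `dynamicSharedBytes` is a hypothetical helper, and the 16 KB figure is the per-block shared memory of compute capability 1.x devices (kernel arguments take a small part of it on those devices, so it is an upper bound).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in with the layout of CUDA's float4: four floats, 16 bytes.
struct Float4 { float x, y, z, w; };

// Upper bound on shared memory per block for compute capability 1.x (16 KB).
const std::size_t kSharedBytesPerBlock = 16 * 1024;

// Bytes of dynamic shared memory needed to hold one Float4 per thread of a block.
std::size_t dynamicSharedBytes(std::size_t threadsPerBlock)
{
    const std::size_t bytes = threadsPerBlock * sizeof(Float4);
    assert(bytes <= kSharedBytesPerBlock);
    return bytes;
}
```

For 256 threads per block this yields 4096 bytes. The value goes in as the third launch argument, as in `sinc <<< dimGrid, dimBlock, ShMem >>> (...)`; when that argument is omitted it defaults to 0, so an `extern __shared__` array has no storage behind it.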

Thanks to all!