Hi all,
I am writing a small program to calculate this function:
f[q] = sum[i from 0 to N-1] sum[j from 0 to N-1] { sin(q * rij) / (q * rij) }
where q is a non-negative number, rij is the distance between atoms i and j, and N is the number of atoms.
f[q] has to be positive for every “q”.
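For comparison with the GPU result, the double sum above can be sketched on the CPU. This is only a minimal reference under some assumptions: `Atom` is a hypothetical stand-in for `float4`, `debye_sum_reference` is an illustrative name, entries with `w <= 0` are treated as padding (as the kernel's `w > 0.0f` check suggests), and the `rij = 0` terms are taken as 1, the limit of `sin(x) / x`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's float4 so the sketch builds without CUDA headers.
struct Atom { float x, y, z, w; };

// sin(x)/x, with the x -> 0 limit (which is 1) handled explicitly.
inline double sinc(double x) {
    return (x == 0.0) ? 1.0 : std::sin(x) / x;
}

// Direct O(N^2) evaluation of f(q) over all ordered pairs (i, j), including i == j.
// Assumption: atoms with w <= 0 are padding and are skipped.
double debye_sum_reference(const std::vector<Atom>& atoms, double q) {
    assert(q >= 0.0);
    double total = 0.0;
    for (std::size_t i = 0; i < atoms.size(); ++i) {
        if (atoms[i].w <= 0.0f) continue;
        for (std::size_t j = 0; j < atoms.size(); ++j) {
            if (atoms[j].w <= 0.0f) continue;
            const double dx = static_cast<double>(atoms[i].x) - atoms[j].x;
            const double dy = static_cast<double>(atoms[i].y) - atoms[j].y;
            const double dz = static_cast<double>(atoms[i].z) - atoms[j].z;
            total += sinc(q * std::sqrt(dx * dx + dy * dy + dz * dz));
        }
    }
    return total;
}
```

For a small N this gives a value to check each f[q] from the kernel against; at q = 0 it should return the square of the number of non-padding atoms.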
But something goes wrong!
This is my kernel:
[codebox]{
    int Idx = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
    float4 bi = a_d[Idx];
    extern __shared__ float4 SharedPos[];
    float inte = 0.0f;
    #pragma unroll 8
    for (int j = 0; j < gridDim.x; j ++)
    {
        // load body positions into shared memory
        SharedPos[threadIdx.x] = a_d[blockDim.x * j + threadIdx.x];
        // wait until all the positions are loaded
        __syncthreads ();
        float4 bj;
        bj = SharedPos[j];
        if ((bi.w > 0.0f) && (bj.w > 0.0f))
        {
            for (int i = 0; i < blockDim.x; i++)
            {
                float3 r = {bi.x - bj.x, bi.y - bj.y, bi.z - bj.z};
                float distSq = r.x * r.x + r.y * r.y + r.z * r.z;
                if (distSq != 0) {
                    float dist = sqrt(distSq);
                    float step = q * Qstep * dist;
                    inte += sin(step) / step;
                    __syncthreads ();
                } else {
                    inte ++;
                }
                // let threads write an array in order to keep all results
                int Idx = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
                monoInt[Idx] = inte;
            }
            // wait till threads have finished calculating before writing new position
            __syncthreads ();
        }
    }
}[/codebox]
I store each thread's result in the “monoInt” array and then reduce it on the host (so one “monoInt” array for every q), because so far I have not managed to reduce it on the device… :rolleyes:
This way I obtain f[q], for a given “q”, as:
[codebox] for (int k = 0; k < N; k ++){
f[q] += Hm[k];
}
[/codebox]
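The host-side reduction can be sketched as a small helper. This is only an illustration, assuming `Hm` is the host copy of `monoInt` and that f[q] starts at zero for each q; `reduce_partials` is a hypothetical name. It accumulates in `double`, since adding many `float` partial sums into a `float` can lose low-order digits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums the first n per-thread partial results copied back from the device.
// `partial` plays the role of Hm in the post (assumption).
double reduce_partials(const std::vector<float>& partial, std::size_t n) {
    assert(n <= partial.size());
    double sum = 0.0;  // double accumulator to limit rounding error
    for (std::size_t k = 0; k < n; ++k) {
        sum += partial[k];
    }
    return sum;
}
```

Used per q as `f[q] = reduce_partials(Hm, N);`, this replaces the `+=` loop and does not depend on f[q] having been cleared beforehand.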
The strange thing (to me) is that the value changes with the “ShMem” size when I launch the kernel this way:
[codebox] dim3 dimGrid(TilesNumber, 1); // number of blocks in grid
dim3 dimBlock(ThNumber, 1, 1); // number of threads in each block
unsigned int ShMem = 16320;
sinc <<< dimGrid, dimBlock, ShMem >>> (q, (float4*) a_d, N, intensity, monoInt, TileModule, step);[/codebox]
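The kernel stores one `float4` per thread in `SharedPos[threadIdx.x]`, so the third launch parameter can be derived from the block size instead of being hard-coded. A minimal sketch of that arithmetic, where `Float4` is a hypothetical stand-in with the same 16-byte layout as CUDA's `float4` and both helper names are illustrative:

```cpp
#include <cassert>
#include <cstddef>

// Stand-in with the same size as CUDA's float4 (four 32-bit floats).
struct Float4 { float x, y, z, w; };

// Bytes of dynamic shared memory needed when each thread of a block
// stores exactly one float4 in the tile.
std::size_t tile_shared_bytes(std::size_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return threadsPerBlock * sizeof(Float4);
}

// Number of float4 slots that fit in a given dynamic shared memory size.
std::size_t tile_capacity(std::size_t sharedBytes) {
    return sharedBytes / sizeof(Float4);
}
```

With these, `ShMem = 16320` corresponds to 1020 slots, while a block of 256 threads only fills 256 of them (4096 bytes). The kernel as posted also reads `SharedPos[j]` with `j` running up to `gridDim.x - 1`, so whenever `TilesNumber` exceeds `ThNumber` it reads slots that no thread in the block has written.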
Any ideas? Thanks to anyone who can give me some suggestions! :">