changing value with shared memory size...

CaLu · June 11, 2009, 4:18pm

Hi to all,

I am trying to write a simple program to calculate X-ray diffraction patterns using an old formula, which says:

I[q] = sum[ from i = 0 to N - 1 ] sum[ from j = 0 to N - 1 ] { fi[ q ] fj[ q ] sin[ ( q * rij ) ] / ( q * rij ) }

I have implemented a kernel that calculates:

F[q] = sum[ from i = 0 to N - 1 ] sum[ from j = 0 to N - 1 ] { sin[ ( q * rij ) ] / ( q * rij ) }

where q is a non-negative number, rij is the distance between atoms i and j, and N is the number of atoms.
Here is my kernel:

__global__ void sinc (int q, float4 *a_d, unsigned long int N, float *monoInt)
{

int Idx = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;

float4 bi = a_d[Idx];

float inte = 0.0f;

extern __shared__ float4 SharedPos[];

#pragma unroll 8
for (int j = 0; j < gridDim.x; j ++)
{
    SharedPos[threadIdx.x] = a_d[blockDim.x * j + threadIdx.x];

    __syncthreads ();
   
    float4 bj;
    bj = SharedPos[j];
   
    for (int i = 0; i < blockDim.x; i++)
    {

         float3 r = {bi.x - bj.x, bi.y - bj.y, bi.z - bj.z};
         float distSq = r.x * r.x + r.y * r.y + r.z * r.z;

         if (distSq != 0) {

             float dist = sqrt(distSq);
             float step = q * Qstep * dist;
               
             inte += sin(step) / step;
   
         } else {
   
             inte ++;
   
         }
           
         __syncthreads ();

            int Idx = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
            monoInt[Idx] = inte;
       
        }

        __syncthreads ();
       
}

}

I launch the kernel in this way:

dim3 dimGrid(TilesNumber, 1);
dim3 dimBlock(ThNumber, 1, 1);
sinc <<< dimGrid, dimBlock >>> (q, (float4*) a_d, N, monoInt);
int kernelErrors;
kernelErrors = cudaThreadSynchronize();

if ( kernelErrors != 0 ) {

	printf(" KERNEL PANIC!!\n");

}
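A note on the check above: it only reports that something failed, not what. A minimal sketch of a more informative check follows, assuming a current CUDA toolkit, where `cudaThreadSynchronize` is deprecated in favor of `cudaDeviceSynchronize`. The names `cudaOk`, `dummyKernel` and `launchAndCheck` are hypothetical and exist only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the CUDA error string and returns false on failure.
static bool cudaOk(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Placeholder kernel (assumption) standing in for the real one.
__global__ void dummyKernel(float *out)
{
    out[threadIdx.x] = 1.0f;
}

// Checks the launch itself first, then the asynchronous execution.
static bool launchAndCheck(float *out_d, int threads)
{
    dummyKernel<<<1, threads>>>(out_d);
    return cudaOk(cudaGetLastError(), "kernel launch")
        && cudaOk(cudaDeviceSynchronize(), "kernel execution");
}
```

Calling `launchAndCheck(out_d, 32)` on a device buffer of at least 32 floats returns true on success; on failure, the printed message names the error reported by the runtime.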

Then, on the host (I know this is not a good approach, but I am new to CUDA…), I reduce the array “monoInt” simply like this:

for (int k = 0; k < N; k ++){

       temp += Hm[k];

}

All is fine for q → 0, where F[q] → N * N, which is right… but for some “q” values F[q] becomes negative!! This must not happen!
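For reference, a short sketch of why both statements hold. The vector notation (atom positions r_j, and an average over the directions of a vector q of length q) is introduced here for illustration and is not taken from the post.

```latex
F(q) = \sum_{i}\sum_{j} \frac{\sin(q\,r_{ij})}{q\,r_{ij}}, \qquad
\lim_{x \to 0} \frac{\sin x}{x} = 1
\;\Rightarrow\;
\lim_{q \to 0} F(q) = N^2

\frac{\sin(q\,r)}{q\,r} = \frac{1}{4\pi}\int e^{\,i\,\mathbf{q}\cdot\mathbf{r}}\, d\Omega
\;\Rightarrow\;
F(q) = \frac{1}{4\pi}\int \Bigl|\sum_{j} e^{\,i\,\mathbf{q}\cdot\mathbf{r}_j}\Bigr|^{2} d\Omega \;\ge\; 0
```

So a negative F[q] cannot come from the formula itself; it has to come from the computation.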

One strange thing I have noticed is that if I launch the kernel like this:

sinc <<< dimGrid, dimBlock, ShMem >>> (q, (float4*) a_d, N, monoInt);

the result changes when I change the ShMem value…

Another problem is that for N of about 10**6 the kernel crashes on a GTX 295…
Maybe the problem is in:

inte += sin(step) / step;

Could the value be too large?

Thanks in advance to anyone who answers!

regards,

luca


CaLu · June 14, 2009, 9:09am

SOLVED (I hope)… ^_^

It seems that the problem was caused by allocating the wrong shared memory size at launch…
Allocating

blockDimension * sizeof(float4)

bytes seems to work…

Thanks to all!
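The fix can be sketched as follows. An `extern __shared__` array gets its size only from the third launch parameter, so omitting it leaves the array with zero bytes and every write to it is out of bounds. This is a hedged sketch rather than the poster's final code: the names `sincTiled` and `launchSinc` are hypothetical, `qStep` is passed as an argument because the original `Qstep` is not declared in the post, and the tile loop is restructured according to my reading of the intent (read every atom of the tile, write the result once).

```cuda
#include <cuda_runtime.h>

// One thread per atom i; atoms j are staged tile by tile in shared memory.
__global__ void sincTiled(int q, float qStep, const float4 *a_d,
                          unsigned int N, float *monoInt)
{
    extern __shared__ float4 SharedPos[];   // sized by the 3rd launch parameter

    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float4 bi = (idx < N) ? a_d[idx] : make_float4(0.f, 0.f, 0.f, 0.f);
    float inte = 0.0f;

    for (unsigned int first = 0; first < N; first += blockDim.x) {
        unsigned int src = first + threadIdx.x;
        if (src < N) SharedPos[threadIdx.x] = a_d[src];
        __syncthreads();                    // tile is fully loaded

        unsigned int left  = N - first;
        unsigned int count = (left < blockDim.x) ? left : blockDim.x;
        if (idx < N) {
            for (unsigned int j = 0; j < count; ++j) {
                float4 bj = SharedPos[j];
                float dx = bi.x - bj.x, dy = bi.y - bj.y, dz = bi.z - bj.z;
                float x = q * qStep * sqrtf(dx * dx + dy * dy + dz * dz);
                // sin(x)/x -> 1 for x -> 0 (same atom, or q == 0)
                inte += (x > 0.0f) ? sinf(x) / x : 1.0f;
            }
        }
        __syncthreads();                    // finish reading before the next overwrite
    }
    if (idx < N) monoInt[idx] = inte;       // single write per atom
}

// Launch with one float4 of dynamic shared memory per thread in the block.
cudaError_t launchSinc(int q, float qStep, const float4 *a_d, unsigned int N,
                       float *monoInt_d, unsigned int threads)
{
    unsigned int blocks = (N + threads - 1) / threads;
    size_t shMem = threads * sizeof(float4);
    sincTiled<<<blocks, threads, shMem>>>(q, qStep, a_d, N, monoInt_d);
    cudaError_t err = cudaGetLastError();
    return (err != cudaSuccess) ? err : cudaDeviceSynchronize();
}
```

With q = 0 every term equals 1, so each entry of `monoInt` should come back as N and the host-side sum as N * N, which gives a quick sanity check before trying other q values.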
