values depending on shared memory size

CaLu · June 11, 2009, 12:49pm

Hi to all,

I am writing a small program to calculate this function:

f[q] = sum[ i from 0 to N-1 ] sum[ j from 0 to N-1 ] { sin( q * rij ) / ( q * rij ) }

where q is a non-negative number, rij is the distance between atoms i and j, and N is the number of atoms.

f[q] has to be positive for every “q”.

But something goes wrong!

This is my kernel:

[codebox]{
int Idx = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
float4 bi = a_d[Idx];
extern __shared__ float4 SharedPos[];
float inte = 0.0f;

#pragma unroll 8
for (int j = 0; j < gridDim.x; j ++)
{
	// load body positions into shared memory
	SharedPos[threadIdx.x] = a_d[blockDim.x * j + threadIdx.x];
	// wait until all the positions are loaded
	__syncthreads ();

	float4 bj;
	bj = SharedPos[j];

	if ((bi.w > 0.0f) && (bj.w > 0.0f))
	{
	for (int i = 0; i < blockDim.x; i++)
	{
		float3 r = {bi.x - bj.x, bi.y - bj.y, bi.z - bj.z};
		float distSq = r.x * r.x + r.y * r.y + r.z * r.z;
		if (distSq != 0) {
			float dist = sqrt(distSq);
			float step = q * Qstep * dist;
			inte += sin(step) / step;
			__syncthreads ();
		} else {
			inte ++;
		}

		// let threads write an array in order to keep all results
		int Idx = __mul24(blockIdx.x, blockDim.x) + threadIdx.x;
		monoInt[Idx] = inte;
	}

	// wait till threads have finished calculating before writing new position
	__syncthreads ();
	}
}
}[/codebox]

I store each thread's result in the “monoInt” array and then reduce it on the host (so there is one “monoInt” array for every q), because so far I have not been able to reduce it on the device…

In this way I obtain f[q], for a given “q”, as:

[codebox]for (int k = 0; k < N; k ++){
	f[q] += Hm[k];
}
[/codebox]
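One possible way to move most of that host loop onto the device is a per-block tree reduction in shared memory. The sketch below is only an illustration: the kernel name `blockSum`, its parameters, and the array of ones standing in for the “monoInt” values are all hypothetical, and it assumes the block size is a power of two. Each block writes one partial sum, so the host only has to add `blocks` values instead of N.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: sums n floats from `in` into one partial sum per block.
// Assumes blockDim.x is a power of two.
__global__ void blockSum(const float *in, float *blockOut, int n)
{
    extern __shared__ float buf[];             // blockDim.x floats
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    buf[threadIdx.x] = (idx < n) ? in[idx] : 0.0f;
    __syncthreads();

    // Halve the number of adding threads each step.
    // The barrier sits outside the if, so every thread reaches it.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if ((int)threadIdx.x < s)
            buf[threadIdx.x] += buf[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        blockOut[blockIdx.x] = buf[0];
}

int main()
{
    const int n = 1024, threads = 128, blocks = n / threads;
    float h[n];
    for (int k = 0; k < n; ++k) h[k] = 1.0f;   // stand-in for the monoInt values

    float *in_d = NULL, *out_d = NULL;
    cudaMalloc(&in_d, n * sizeof(float));
    cudaMalloc(&out_d, blocks * sizeof(float));
    cudaMemcpy(in_d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // Dynamic shared memory: one float per thread of the block.
    blockSum<<<blocks, threads, threads * sizeof(float)>>>(in_d, out_d, n);

    float part[blocks];
    cudaMemcpy(part, out_d, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    float total = 0.0f;
    for (int b = 0; b < blocks; ++b) total += part[b];  // only `blocks` values left
    printf("sum = %f\n", total);

    cudaFree(in_d);
    cudaFree(out_d);
    return 0;
}
```

A second launch of the same kernel over the per-block results would finish the reduction on the device; the final host loop is kept here only to keep the sketch short.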

The strange (to me) thing is that the value changes with the “ShMem” size when I launch the kernel in this way:

[codebox]dim3 dimGrid(TilesNumber, 1);		// number of blocks in grid
dim3 dimBlock(ThNumber, 1, 1);		// number of threads in each block
unsigned int ShMem = 16320;
sinc <<< dimGrid, dimBlock, ShMem >>> (q, (float4*) a_d, N, intensity, monoInt, TileModule, step);[/codebox]

Any ideas? Thanks to anyone who can offer a suggestion!

seibert · June 11, 2009, 12:53pm

[quote]
The strange (to me) thing is that the value changes with the “ShMem” size when I launch the kernel in this way:

[codebox]dim3 dimGrid(TilesNumber, 1);		// number of blocks in grid
dim3 dimBlock(ThNumber, 1, 1);		// number of threads in each block
unsigned int ShMem = 16320;
sinc <<< dimGrid, dimBlock, ShMem >>> (q, (float4*) a_d, N, intensity, monoInt, TileModule, step);[/codebox]

Any ideas? Thanks to anyone who can offer a suggestion!
[/quote]

The values change because sinc<<<>>> is possibly returning an error. The size of shared memory is 16384 bytes, but you can’t allocate all of it because it is used for passing parameters and other values. Check for errors after this call to verify this. (An easy way to do this is to call cudaThreadSynchronize() after your kernel and check the return value from that.)
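A minimal sketch of that check, under some assumptions: `touchShared` and `launchAndCheck` are hypothetical names, not part of the original program, and the second launch deliberately asks for more dynamic shared memory than the device reports as available per block. Current CUDA toolkits deprecate `cudaThreadSynchronize()` in favour of `cudaDeviceSynchronize()`, so the sketch uses the latter. A return value of `cudaSuccess` (numerically 0) means no error was reported.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel that touches the dynamic shared array.
__global__ void touchShared(float *out)
{
    extern __shared__ float buf[];   // size comes from the third launch argument
    buf[threadIdx.x] = 1.0f;
    __syncthreads();
    out[threadIdx.x] = buf[threadIdx.x];
}

// Hypothetical helper: launches with shMem bytes and prints both error stages.
static void launchAndCheck(float *out_d, size_t shMem)
{
    touchShared<<<1, 64, shMem>>>(out_d);
    cudaError_t launchErr = cudaGetLastError();     // bad launch configuration shows up here
    cudaError_t syncErr = cudaDeviceSynchronize();  // faults during execution show up here
    printf("shMem=%zu launch: %s, sync: %s\n", shMem,
           cudaGetErrorString(launchErr), cudaGetErrorString(syncErr));
}

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    float *out_d = NULL;
    cudaMalloc(&out_d, 64 * sizeof(float));

    launchAndCheck(out_d, 64 * sizeof(float));          // one float per thread: fits
    launchAndCheck(out_d, prop.sharedMemPerBlock + 1);  // over the per-block limit, expected to be rejected

    cudaFree(out_d);
    return 0;
}
```

Printing the message from `cudaGetErrorString()` rather than the raw integer makes the result easier to interpret.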

eyalhir74 · June 11, 2009, 12:59pm

[quote name=‘CaLu@debian’ date=‘Jun 11 2009, 03:49 PM’ post=‘551350’]

[codebox]{
		if (distSq != 0) {
			float dist = sqrt(distSq);
			float step = q * Qstep * dist;
			inte += sin(step) / step;
			__syncthreads ();
		} else {
			inte ++;
		}
}[/codebox]
[/quote]

Please follow what seibert wrote.

Also, are you running this in emulation mode? As far as I remember, that __syncthreads() inside an if should have deadlocked your application. You can't call it inside an if statement…

eyal
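To make the point concrete: `__syncthreads()` is a barrier for the whole block, so it is only safe in conditional code when every thread of the block takes the same path. With a data-dependent condition, the usual pattern is to keep the divergent work inside the branch and the barrier outside it. The sketch below is illustrative only; the kernel `positiveTileSum` and the test data are made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread with a positive input adds up the
// positive entries of its block's tile.
__global__ void positiveTileSum(const float *in, float *out)
{
    extern __shared__ float tile[];            // blockDim.x floats
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    tile[threadIdx.x] = in[idx];
    __syncthreads();                           // every thread of the block arrives here

    float sum = 0.0f;
    if (in[idx] > 0.0f) {                      // data-dependent branch: threads may diverge
        for (int k = 0; k < (int)blockDim.x; ++k)
            if (tile[k] > 0.0f)
                sum += tile[k];
        // No __syncthreads() in here: threads that skip the branch would never reach it.
    }
    __syncthreads();                           // barrier after the branch; this is where it
                                               // belongs if the tile gets overwritten in a loop
    out[idx] = sum;
}

int main()
{
    const int n = 256, threads = 64, blocks = n / threads;
    float h[n];
    for (int k = 0; k < n; ++k) h[k] = (k % 2 == 0) ? 1.0f : -1.0f;  // made-up data

    float *in_d = NULL, *out_d = NULL;
    cudaMalloc(&in_d, n * sizeof(float));
    cudaMalloc(&out_d, n * sizeof(float));
    cudaMemcpy(in_d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    positiveTileSum<<<blocks, threads, threads * sizeof(float)>>>(in_d, out_d);
    printf("status: %s\n", cudaGetErrorString(cudaDeviceSynchronize()));

    cudaMemcpy(h, out_d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[0] = %f, out[1] = %f\n", h[0], h[1]);

    cudaFree(in_d);
    cudaFree(out_d);
    return 0;
}
```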

CaLu · June 11, 2009, 1:46pm

Hi seibert and eyalhir74, thank you for your replies.

No, I am not running my program in emu mode.

As you suggested, I tried this:

[codebox]sinc <<< dimGrid, dimBlock, ShMem >>> (q, (float4*) a_d, N, intensity, monoInt, TileModule, step);
int cts;
cts = cudaThreadSynchronize();
printf("cts %d", cts);[/codebox]

The result is “cts 0”.

But I don’t know what that means… Is everything fine?

I have also tried letting nvcc decide the ShMem size (<<< dimGrid, dimBlock >>>), but negative values appear for some “q” values.

Could the problem be in this kernel operation?

[codebox]inte += sin(step) / step;[/codebox]

Maybe too many computations?

Thanks to both of you!!

CaLu · June 11, 2009, 1:59pm

I also moved __syncthreads() outside of the “if” statement, but nothing changed.

CaLu · June 11, 2009, 3:29pm

I forgot to mention…

For very small values of q, the results are correct: for q → 0, f[q] has to be N * N, and it is. The strange behaviour appears only for some points…

CaLu · June 14, 2009, 9:10am

SOLVED (I hope)… ^_^

It seems that the problem was due to the wrong size of allocated shared memory…
Allocating the quantity:

blockDimension * sizeof(float4),

seems to work…

Thanks to all!
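For readers who land here with the same symptom, here is a rough sketch of what a tiled version with correctly sized shared memory might look like. It is not the original poster's final code: the kernel name `sincTiled`, its parameter list, and the test positions are assumptions made for illustration, and it assumes n is a multiple of the block size. The dynamic shared array holds one float4 per thread, so the third launch argument is the block size times sizeof(float4), and both barriers sit outside any data-dependent branch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel name and parameters. Assumes n is a multiple of
// blockDim.x and that w > 0 marks a valid atom, as in the kernel posted above.
__global__ void sincTiled(const float4 *pos, float *partial, float q, int n)
{
    extern __shared__ float4 tile[];           // needs blockDim.x * sizeof(float4) bytes
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float4 bi = pos[idx];
    float sum = 0.0f;

    int tiles = n / (int)blockDim.x;
    for (int t = 0; t < tiles; ++t) {
        tile[threadIdx.x] = pos[t * blockDim.x + threadIdx.x];
        __syncthreads();                       // tile fully loaded

        for (int k = 0; k < (int)blockDim.x; ++k) {
            float4 bj = tile[k];
            if (bi.w > 0.0f && bj.w > 0.0f) {
                float dx = bi.x - bj.x, dy = bi.y - bj.y, dz = bi.z - bj.z;
                float d2 = dx * dx + dy * dy + dz * dz;
                if (d2 != 0.0f) {
                    float x = q * sqrtf(d2);
                    sum += sinf(x) / x;
                } else {
                    sum += 1.0f;               // sin(x)/x tends to 1 as x tends to 0
                }
            }
        }
        __syncthreads();                       // all reads done before the next load
    }
    partial[idx] = sum;                        // one partial result per atom
}

int main()
{
    const int n = 256, threads = 64, blocks = n / threads;
    float4 h[n];
    for (int k = 0; k < n; ++k)                // made-up positions on a line
        h[k] = make_float4(0.1f * k, 0.0f, 0.0f, 1.0f);

    float4 *pos_d = NULL;
    float *partial_d = NULL;
    cudaMalloc(&pos_d, n * sizeof(float4));
    cudaMalloc(&partial_d, n * sizeof(float));
    cudaMemcpy(pos_d, h, n * sizeof(float4), cudaMemcpyHostToDevice);

    size_t shMem = threads * sizeof(float4);   // one float4 per thread in the block
    sincTiled<<<blocks, threads, shMem>>>(pos_d, partial_d, 1.0e-4f, n);
    printf("status: %s\n", cudaGetErrorString(cudaDeviceSynchronize()));

    float part[n];
    cudaMemcpy(part, partial_d, n * sizeof(float), cudaMemcpyDeviceToHost);
    double f = 0.0;
    for (int k = 0; k < n; ++k) f += part[k];  // host-side reduction, as in the thread
    printf("f = %f (n * n = %d)\n", f, n * n);

    cudaFree(pos_d);
    cudaFree(partial_d);
    return 0;
}
```

With the small q used here, the printed f should come out close to n * n, which matches the q → 0 sanity check mentioned earlier in the thread. Note that omitting the third launch argument does not let the compiler pick a size; it requests zero bytes of dynamic shared memory.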
