shared memory wrong allocation?

CaLu · July 29, 2009, 7:29am

Hi,

I am writing a kernel that evaluates the interaction between all pairs of bodies (as in Mark Harris’ “Fast nBody Simulation”) and does some work for every pair computed. For every interaction, the kernel sums N*N values for every value of a parameter. To avoid N*N local-memory writes for every parameter value (there can be a few thousand of them), I have allocated an array in shared memory, like this:

[codebox]

extern __shared__ float4 sharedMemory[];

float4 *sharedPosition = sharedMemory; // size: numberOfthreads * sizeof( float4 )

float *kernelIntensity = ( float * ) ( sharedMemory ) + sharedMemoryIdx;

with sharedMemory size:

int ShMem = 2 * threadsNumber * sizeof( float4 );

and:

unsigned int sharedMemoryIdx = 4 * thNumber;

[/codebox]

where sharedPosition is an array that stores the atomic positions (as in “Fast nBody Simulation”) and kernelIntensity is an array in which N*N values are summed at each position. I mean:

[codebox]

for i…

for j…

(first two loops for calculating body interaction)

for ( int q = qLim.x; q < qLim.y; q ++ )

 {

        kernelIntensity[ q + sharedMemoryIdx ] += f( qrij );

 }

[/codebox]

Because of space constraints, I then have to partially reduce kernelIntensity on the device (outside the i and j loops) before copying it to the host with memcpy, so I need the kernelIntensity array.

Now, my problem is that, maybe because of the float cast (?) or a wrong access index in the calculation, I cannot find a value for sharedMemoryIdx that gives me the right results…

Any idea?

Please note that if kernelIntensity is declared in local memory, the results are right.

Thanks to anyone who can give me a suggestion,

cheers,

luca
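One possible reading of the snippets in this post is that the kernelIntensity base pointer already includes sharedMemoryIdx, so indexing it with q + sharedMemoryIdx would apply the offset twice and run past the second region. The thread does not confirm that this was the actual cause. The sketch below only illustrates splitting one dynamic shared-memory allocation into a float4 region and a float region and indexing the second region from zero. The names layoutDemo, intensityOffset and sharedBytes are hypothetical, and the block size of 64 is an arbitrary assumption.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Offset, in floats, where the intensity region starts: the position region
// holds one float4 (4 floats) per thread. Hypothetical helper.
__host__ __device__ inline unsigned int intensityOffset(unsigned int threads)
{
    return 4u * threads;
}

// Bytes of dynamic shared memory for both regions (two float4 per thread).
inline size_t sharedBytes(unsigned int threads)
{
    return 2u * threads * sizeof(float4);
}

__global__ void layoutDemo(float *out)
{
    extern __shared__ float4 sharedMemory[];

    // First region: one float4 per thread.
    float4 *sharedPosition = sharedMemory;
    // Second region: the base pointer already carries the offset.
    float *kernelIntensity =
        reinterpret_cast<float *>(sharedMemory) + intensityOffset(blockDim.x);

    unsigned int t = threadIdx.x;
    sharedPosition[t] = make_float4(1.0f, 2.0f, 3.0f, 4.0f);

    // Index the second region from 0. Adding the offset again here would
    // write beyond the 2 * blockDim.x float4 values that were allocated.
    for (unsigned int k = 0; k < 4; ++k)
        kernelIntensity[4 * t + k] = 10.0f;
    __syncthreads();

    // If the regions overlapped, the position values would be corrupted.
    float4 p = sharedPosition[t];
    out[t] = p.x + p.y + p.z + p.w + kernelIntensity[4 * t];
}

int main()
{
    const unsigned int threads = 64;   // assumed block size
    float hOut[threads] = {};
    float *dOut = nullptr;

    cudaMalloc(&dOut, threads * sizeof(float));
    // The third launch parameter is the dynamic shared-memory size in bytes.
    layoutDemo<<<1, threads, sharedBytes(threads)>>>(dOut);
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    cudaFree(dOut);

    bool ok = true;
    for (unsigned int i = 0; i < threads; ++i)
        ok = ok && (hOut[i] == 20.0f);   // 1 + 2 + 3 + 4 + 10
    printf(ok ? "layout ok\n" : "layout corrupted\n");
    return ok ? 0 : 1;
}
```

Keeping the offset in exactly one place, either in the base pointer or in the index, makes this kind of mistake easier to spot.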

CaLu · July 29, 2009, 10:17am

News:

The results for some q points seem to be about right, but with a lot of “noise” in the output. Maybe I am overwriting the same memory locations?

I have tried adding two __syncthreads() calls, one before the write and one after, but nothing seems to change:

				for ( int q = qLim.x; q < qLim.y; q ++ )
				{
					qrij = ( q + qFirstPoint ) * stepDistance;
					__syncthreads();
					kernelIntensity[ q + sharedMemoryIdx ] += f( qrij );
					__syncthreads();
				}

Another strange thing is that the results change dramatically when I use fast math. Maybe I am exceeding the available resources (I am currently using a GTX 260, compute capability 1.3)?
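On the question of overwriting the same locations: __syncthreads() only separates phases of execution. It does not make a += performed by several threads on the same shared address safe, and floating-point atomicAdd is not available on compute capability 1.x devices. The thread does not show how q maps to threads, so the following is only a sketch under the assumption that many threads of a block contribute to one sum. Each thread writes its own slot, and a tree reduction then combines the slots. The names blockSum and contribution are hypothetical, with contribution standing in for f( qrij ).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the per-pair term f( qrij ).
__host__ __device__ inline float contribution(unsigned int i)
{
    return static_cast<float>(i % 4);
}

__global__ void blockSum(float *blockTotals)
{
    // One slot per thread, so no two threads share a write target.
    extern __shared__ float partial[];

    unsigned int t = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + t;
    partial[t] = contribution(i);
    __syncthreads();

    // Tree reduction; assumes blockDim.x is a power of two.
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (t < stride)
            partial[t] += partial[t + stride];
        __syncthreads();   // outside the if, so every thread reaches it
    }

    // Thread 0 publishes the partially reduced value for this block.
    if (t == 0)
        blockTotals[blockIdx.x] = partial[0];
}

int main()
{
    const unsigned int threads = 128;  // assumed block size (power of two)
    const unsigned int blocks = 2;     // assumed grid size
    float hTotals[blocks] = {};
    float *dTotals = nullptr;

    cudaMalloc(&dTotals, blocks * sizeof(float));
    blockSum<<<blocks, threads, threads * sizeof(float)>>>(dTotals);
    cudaMemcpy(hTotals, dTotals, sizeof(hTotals), cudaMemcpyDeviceToHost);
    cudaFree(dTotals);

    // Each block sums 32 repetitions of 0 + 1 + 2 + 3, which is 192.
    bool ok = (hTotals[0] == 192.0f) && (hTotals[1] == 192.0f);
    printf(ok ? "reduction ok\n" : "reduction mismatch\n");
    return ok ? 0 : 1;
}
```

Only one value per block is copied back to the host here, which matches the stated goal of partially reducing on the device before the memcpy.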

CaLu · July 29, 2009, 2:15pm

SOLVED ( I hope … )

Wrong indexing! :-D

bye!
