In the scalarProd example, more specifically in the scalarProd_kernel.cu file, the original kernel uses a reduction, as shown in the following:
#define ACCUM_N 1024
…
for(int vec = blockIdx.x; vec < VECTOR_N; vec += gridDim.x)
{
    for(int stride = ACCUM_N / 2; stride > 0; stride >>= 1)
    {
        __syncthreads();
        for(int iAccum = threadIdx.x; iAccum < stride; iAccum += blockDim.x)
            accumResult[iAccum] += accumResult[stride + iAccum];
    }

    if(threadIdx.x == 0) d_C[vec] = accumResult[0];
}
My understanding of this part is that it sums up accumResult[0..1023], leaves the total in accumResult[0], and then thread 0 stores it to d_C[vec].
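To double-check that understanding, here is a stripped-down, self-contained sketch I wrote of the same stride-halving pattern on a tiny power-of-two array; the size (8), the names, and the launch configuration are my own and not from the SDK:

#include <cstdio>
#include <cuda_runtime.h>

#define N 8  // tiny power-of-two size, for illustration only

__global__ void reduceSketch(const float *d_in, float *d_out)
{
    __shared__ float accum[N];

    // Each thread loads one element into shared memory.
    accum[threadIdx.x] = d_in[threadIdx.x];
    __syncthreads();

    // Same pattern as the SDK kernel: every pass folds the upper half
    // of the live region onto the lower half, then halves the stride.
    for(int stride = N / 2; stride > 0; stride >>= 1)
    {
        __syncthreads();
        for(int i = threadIdx.x; i < stride; i += blockDim.x)
            accum[i] += accum[stride + i];
    }

    // After the last pass the whole sum sits in accum[0].
    if(threadIdx.x == 0) *d_out = accum[0];
}

int main()
{
    float h_in[N] = {1, 2, 3, 4, 5, 6, 7, 8}, h_out = 0;
    float *d_in, *d_out;
    cudaMalloc((void**)&d_in, N * sizeof(float));
    cudaMalloc((void**)&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, N * sizeof(float), cudaMemcpyHostToDevice);
    reduceSketch<<<1, N>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f (expected 36)\n", h_out);  // 1+2+...+8 = 36
    return 0;
}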
Now, I am playing around with this code, and I have replaced the summing part with the following:
float temp = 0;
for(int it = 0; it < 1024; ++it)
{
    temp += accumResult[it];
    __syncthreads();
}
d_C[vec] = temp;
I did this because I am in a situation where I cannot make the shared-memory size a power of 2, so I cannot use the reduction method as written.
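As an aside, I also wondered whether the reduction itself could be guarded so that it no longer assumes a power of two. The sketch below is what I had in mind; it is entirely my own guess (the rounding-up split point in particular) and not anything from the SDK, so it may well be wrong:

// Fold the live region [0, n) in halves, rounding the split point up
// so that odd sizes still shrink; my own guess, not SDK code.
for(int n = ACCUM_N; n > 1; )
{
    int half = (n + 1) / 2;
    __syncthreads();
    for(int i = threadIdx.x; i < n - half; i += blockDim.x)
        accumResult[i] += accumResult[half + i];
    n = half;
}
// The total should end up in accumResult[0], as before.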
It seems like my logic for using shared memory is not right in my sequential-sum version. I compared the results from the two routines, and they are different, so I assume I am not doing it right.
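In case it clarifies what I am asking, this is what I think a corrected version of my sequential sum would have to look like, with a barrier before the shared array is read and only thread 0 writing the result; the extra __syncthreads() calls are my own addition and I have not verified this against the SDK output:

// Make sure every thread's partial results in accumResult[] are visible
// before they are read back.
__syncthreads();

if(threadIdx.x == 0)
{
    // One thread accumulates the whole shared array; no synchronization
    // is needed inside this loop since only thread 0 is reading.
    float temp = 0;
    for(int it = 0; it < ACCUM_N; ++it)
        temp += accumResult[it];
    d_C[vec] = temp;
}

// Keep accumResult[] intact until the read above has finished, so the
// next vec iteration does not overwrite it too early.
__syncthreads();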
Any comments on the summation part?
Many thanks in advance for your valuable comments and advice…