A "simple" question

ir3074 · October 30, 2007, 8:39pm

Hello all. I have a question about CUDA programming.

Suppose I have an array with 10k elements and I want to compute the following:
a[3] = a[5] + a[4]
a[4] = a[6] + a[5]
a[5] = a[7] + a[6]
a[6] = a[8] + a[7]
.
.
.
a[9998] = a[10000] + a[9999]

How do I arrange the threads and blocks to compute the above expressions in parallel?

Thanks a lot. This concept is very important for my work.

paulius · October 30, 2007, 8:56pm

1. Partition the output among threadblocks.
2. Read all the data needed by the threadblock into shared memory.
3. Each thread computes its sum (fetching from shared memory) and saves it to global memory.

Make sure that step 2 is coalesced. This shouldn't be difficult: your access pattern is regular, so every threadblock needs a contiguous region of memory. You may need to pad to ensure proper alignment.

Make sure that step 3 saves directly to global memory (gmem). That way threads read from shared memory (the unmodified input) and store to gmem, so no thread reads data that another thread has already updated, which would be incorrect behavior in your case.

Paulius
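
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the names `shifted_sum`, `run_check`, `first`, `last`, and the tile size of 256 are hypothetical, the array is assumed to have 10001 elements so that a[10000] is readable, and the result goes to a separate output array (an assumption made here so that one block cannot overwrite elements another block still has to load). Alignment padding for fully coalesced loads is not handled.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // hypothetical tile size

// Computes out[i] = in[i+1] + in[i+2] for first <= i <= last.
// The caller must guarantee last + 2 < n.
__global__ void shifted_sum(const double *in, double *out,
                            int first, int last, int n)
{
    // Step 1: each block owns BLOCK_SIZE consecutive outputs.
    // The tile holds those elements plus a 2-element halo on the right.
    __shared__ double tile[BLOCK_SIZE + 2];

    int base = first + blockIdx.x * BLOCK_SIZE;  // first output index of this block
    int i = base + threadIdx.x;                  // output index of this thread

    // Step 2: cooperative load; consecutive threads read consecutive addresses.
    if (i < n) tile[threadIdx.x] = in[i];
    if (threadIdx.x < 2) {
        int h = base + BLOCK_SIZE + threadIdx.x;  // halo element index
        if (h < n) tile[BLOCK_SIZE + threadIdx.x] = in[h];
    }
    __syncthreads();  // every thread reaches this; there is no early return

    // Step 3: read the unmodified input from shared memory, store to global memory.
    if (i <= last) out[i] = tile[threadIdx.x + 1] + tile[threadIdx.x + 2];
}

// Runs the kernel on n elements and returns the number of mismatches
// against a sequential CPU loop.
int run_check(int n)
{
    const int first = 3, last = n - 3;
    std::vector<double> h_in(n), h_out(n, 0.0);
    for (int i = 0; i < n; ++i) h_in[i] = i;

    double *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(double));
    cudaMalloc(&d_out, n * sizeof(double));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    // Elements outside [first, last] keep their original values.
    cudaMemcpy(d_out, d_in, n * sizeof(double), cudaMemcpyDeviceToDevice);

    int count = last - first + 1;
    int blocks = (count + BLOCK_SIZE - 1) / BLOCK_SIZE;
    shifted_sum<<<blocks, BLOCK_SIZE>>>(d_in, d_out, first, last, n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(double), cudaMemcpyDeviceToHost);

    int errors = 0;
    for (int i = first; i <= last; ++i)
        if (h_out[i] != h_in[i + 1] + h_in[i + 2]) ++errors;

    cudaFree(d_in);
    cudaFree(d_out);
    return errors;
}

int main()
{
    printf("mismatches: %d\n", run_check(10001));
    return 0;
}
```

Writing to a second array is a design choice of this sketch: the last threads of each block need two elements that fall in the next block's output range, and nothing orders the blocks relative to one another.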

ir3074 · October 30, 2007, 10:47pm

Thank you very much for your help. I am studying your answer. In the meantime, I tried to write the kernel, which is:

    __global__ void compute(double *Six)
    {
        int natom = 10000;
        int aBegin = blockIdx.x * BLOCK_SIZE * natom;
        int aEnd = aBegin + natom - 1;
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        int astep = BLOCK_SIZE;

        for (int i = aBegin; i <= aEnd; i += astep)
        {
            __shared__ double As[BLOCK_SIZE];

            As[threadIdx.x] = Six[i + threadIdx.x];

            __syncthreads();

            for (int k = 0; k < BLOCK_SIZE; ++k)
            {
                As[threadIdx.x] = As[threadIdx.x + 2] + As[threadIdx.x + 1];

                __syncthreads();
            }
        }
    }

Does this work? Any comments would be appreciated.
