Parallel addition: how can I serialize parts of a kernel?

Hi,

How can I do a parallel addition, or run a serial section, inside a kernel?

// pseudocode
// int* t: array of integers, one element per block

__global__ void mykernel(int* t)
{
    // huge parallel task:
    // each thread calculates one value

    __syncthreads();

    // now the values of all threads must be summed, but how?

    // t[blockIdx.x] = sum over all threads in the block

    // I think
    // t[blockIdx.x] += value of this thread
    // doesn't work (only the last thread's value is saved)
}

int main()
{
    // cudaMalloc…
    mykernel<<<5, 8>>>(t);
}

I hope someone can help :)

greetings,
l.

You definitely don’t want to serialize this.
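To make the problem with `t[blockIdx.x] += value` concrete: the `+=` is an unsynchronized read-modify-write, so threads of the same block overwrite each other's updates. A minimal sketch of the simplest correct variant uses `atomicAdd`. It gives the right result, but updates to the same address are carried out one after another, which is exactly the serialization to avoid for large blocks. The names `atomicSumKernel` and `atomicBlockSums` and the stand-in value `threadIdx.x + 1` are assumptions for illustration, error checking is omitted, and the device is assumed to support integer atomics on global memory.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: every thread adds its value to its block's slot.
__global__ void atomicSumKernel(int* t)
{
    // Stand-in for the real per-thread computation (assumption).
    int value = threadIdx.x + 1;

    // Atomic read-modify-write: no update is lost, but updates to the
    // same address are applied one at a time.
    atomicAdd(&t[blockIdx.x], value);
}

// Hypothetical host helper: returns one sum per block.
std::vector<int> atomicBlockSums(int numBlocks, int threadsPerBlock)
{
    std::vector<int> result(numBlocks, 0);
    int* d_t = nullptr;
    size_t bytes = numBlocks * sizeof(int);

    cudaMalloc((void**)&d_t, bytes);
    cudaMemset(d_t, 0, bytes);  // the sums must start at zero

    atomicSumKernel<<<numBlocks, threadsPerBlock>>>(d_t);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(result.data(), d_t, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_t);
    return result;
}
```

Called as `atomicBlockSums(5, 8)`, matching the launch in the question, every entry of the result should be 1 + 2 + ... + 8 = 36.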

You need a parallel reduction, see the corresponding SDK sample.

http://developer.download.nvidia.com/compu…c/reduction.pdf

More recommended reading here: http://www.cs.cmu.edu/~blelloch/papers/Ble90.pdf
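For orientation, here is a minimal sketch of a per-block parallel reduction in shared memory, in the spirit of the material linked above but not copied from it. It assumes the block size is a power of two and no larger than `MAX_THREADS`. The names `blockSumKernel`, `blockSums`, and `MAX_THREADS`, and the stand-in value `threadIdx.x + 1`, are assumptions for illustration. Error checking is omitted, and none of the optimizations from the SDK sample are applied.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define MAX_THREADS 256  // assumed upper bound on threads per block

// Hypothetical kernel: sums one value per thread into t[blockIdx.x].
// Assumes blockDim.x is a power of two and <= MAX_THREADS.
__global__ void blockSumKernel(int* t)
{
    __shared__ int partial[MAX_THREADS];

    // Stand-in for the real per-thread computation (assumption).
    int value = threadIdx.x + 1;
    partial[threadIdx.x] = value;
    __syncthreads();  // all values are stored before anyone reads them

    // Tree reduction: each step halves the number of active threads.
    for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();  // outside the if, so every thread reaches it
    }

    // Thread 0 holds the block total and writes it out once.
    if (threadIdx.x == 0) {
        t[blockIdx.x] = partial[0];
    }
}

// Hypothetical host helper: returns one sum per block.
std::vector<int> blockSums(int numBlocks, int threadsPerBlock)
{
    std::vector<int> result(numBlocks, 0);
    int* d_t = nullptr;
    size_t bytes = numBlocks * sizeof(int);

    cudaMalloc((void**)&d_t, bytes);
    blockSumKernel<<<numBlocks, threadsPerBlock>>>(d_t);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(result.data(), d_t, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_t);
    return result;
}
```

With the launch configuration from the question, `blockSums(5, 8)`, each block needs three reduction steps (strides 4, 2, 1) and every entry of the result should be 36.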

Christian

Thanks for these papers, very interesting. I will read them first.

If I have any problems, I will ask again :)

Thx!

How did you find this document?

Do you have more papers like this? ;)

Greetings,

l.

If they weren’t available for free already, I would start selling them to you. ;)

Look here:

http://developer.download.nvidia.com/compu…Algorithms.html

Christian