I want to sum all the values in a volume of size 128 x 128 x 128, how do I do that in the most efficient way in CUDA?
I’ve tried launching one thread per (x, y) position, where each thread sums over the 128 z-positions, but it’s really slow since every thread has to read global memory 128 times.
Any ideas how to speed this up?
Just “forget” that it is a volume for that kernel and run 128 × 128 × 128 threads in a standard reduction pattern.
I’m using global memory, so I do the indexing myself (x + y * DATA_W + z * DATA_W * DATA_H) to treat the flat array as a volume. What do you mean by standard reduction pattern?
There is a reduction sample in the CUDA SDK that efficiently adds all values in an array.
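For reference, a minimal sketch of the pattern that sample uses: each block loads a chunk into shared memory, does a tree reduction, and writes one partial sum; you then reduce the partial sums with a second launch (or on the host). Names, the fixed block size of 256, and the first-add-during-load detail are my choices here, not copied from the SDK code:

```cuda
// Sketch of a standard shared-memory sum reduction (block size assumed 256).
__global__ void reduceSum(const float* in, float* out, int n)
{
    __shared__ float sdata[256];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x * 2 + threadIdx.x;

    // Each thread loads up to two elements and adds them on the way in,
    // halving the number of blocks needed.
    float sum = 0.0f;
    if (i < n)              sum  = in[i];
    if (i + blockDim.x < n) sum += in[i + blockDim.x];
    sdata[tid] = sum;
    __syncthreads();

    // Tree reduction in shared memory: active threads halve each step.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // Thread 0 writes this block's partial sum; reduce the partials
    // with another kernel launch or on the host to get the total.
    if (tid == 0)
        out[blockIdx.x] = sdata[0];
}
```

For 128 × 128 × 128 floats this needs only a couple of launches, and every global-memory value is read exactly once per pass, which is why it is so much faster than 128 serial reads per thread.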