I have been working on similar problems for the last few months.
I have programmed a few “hacks” that work for atomic computations at the block level.
However, the streaming architecture of the GPU is not designed for such constructs, which can lead to deadlocks!
I have already posted a way of doing this; it is achieved with spin-loops + global writes:
http://forums.nvidia.com/index.php?showtopic=44144
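As a rough illustration of the spin-loop + global-write idea, here is a minimal sketch of a device-side "global barrier" (the names `g_arrived` and `global_barrier` are my own, not from the linked thread). Note the caveat from above: if the grid has more blocks than can be resident on the GPU at once, the spinning blocks starve the ones that never got scheduled, and this deadlocks.

```cuda
// Hypothetical block-level synchronization via a spin-loop on a
// global counter. Each block announces arrival with a global write
// (atomicAdd), then thread 0 spins until every block has arrived.
__device__ volatile int g_arrived = 0;

__device__ void global_barrier(int numBlocks)
{
    __syncthreads();                       // settle threads within the block
    if (threadIdx.x == 0) {
        atomicAdd((int *)&g_arrived, 1);   // global write: "this block arrived"
        while (g_arrived < numBlocks)      // spin-loop until all blocks arrive
            ;
    }
    __syncthreads();                       // release the whole block together
}
```

Again, treat this as a workaround sketch, not a supported primitive: it only works when all blocks are simultaneously resident.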
1. Read from memory
2. Work in parallel
3. Reduce in parallel (for threads within a single MP)
4. Reduce serially using the modified programming constructs.
Reduction + Block level synchronization + Memory optimization = very high performance gains
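To make the four steps concrete, here is a sketch of the pattern I mean, in the style of the "last block finishes the job" trick: each block reduces its slice in shared memory (steps 1–3), publishes a partial sum with a global write, and the last block to arrive, detected via an atomic counter, reduces the partials serially (step 4). All names here (`g_done`, `reduce_sum`, `partial`) are illustrative, not from the thread above.

```cuda
__device__ unsigned int g_done = 0;  // counts blocks that have finished

__global__ void reduce_sum(const float *in, float *partial, float *out, int n)
{
    extern __shared__ float s[];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    s[tid] = (i < n) ? in[i] : 0.0f;              // step 1: read from memory
    __syncthreads();

    // steps 2-3: tree reduction among threads of a single MP
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = s[0];     // publish the block's result
    __threadfence();                               // make the write visible globally

    // step 4: the last block to finish reduces the partials serially
    __shared__ bool amLast;
    if (tid == 0)
        amLast = (atomicInc(&g_done, gridDim.x) == gridDim.x - 1);
    __syncthreads();

    if (amLast && tid == 0) {
        float sum = 0.0f;
        for (int b = 0; b < gridDim.x; ++b)
            sum += partial[b];
        *out = sum;
    }
}
```

The memory-optimization part of the equation is the shared-memory tree reduction; the block-level synchronization part is the `atomicInc` + `__threadfence` handoff, which avoids a second kernel launch.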
In short, use these constructs as tools for getting around the problem, not as a concrete reference!
I hope this helps.
Cheers,
Neeraj