Counting objects into an array

randlem · July 29, 2010, 2:06am

I’m trying to solve a problem using CUDA. I’ve got an algorithm that solves for a set of parameters and I need to increment certain values in an array based on those parameters. The arrays are conceptually 2D but organized in memory as 1D. Many of the threads will produce similar results and I cannot section the results the threads produce. Basically it’s a number of threads all needing to increment the same areas in memory.

I’ve tried two approaches using the atomic operations provided in CUDA. In the first approach, all threads write to the global array using atomic ops. In the second approach, only one thread per block writes all the data for the block. The second approach is marginally faster, but neither is very fast.

Is there an accepted solution to this problem? Can you think of a better way to do it?

Thanks!

seibert · July 29, 2010, 2:17am

How big is the array? A hardware solution to this problem might be getting a GTX 470 or GTX 480. Atomic operations are 20x faster thanks to the 768kB of L2 cache.

From the software side, this sounds like a form of the histogram problem, so searching on how efficient histogramming implementations are written for CUDA could be helpful.
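As a rough illustration of the usual histogram pattern, here is a minimal sketch in which each block counts into a private histogram in shared memory and then merges it into the global array. The names `NUM_BINS`, `histogramKernel` and `countIntoBins` are made up for this example, the launch configuration is arbitrary, and error checking is omitted. It assumes every input value is already a valid bin index in `[0, NUM_BINS)` and that all bins fit in shared memory; a much larger array would need tiling or a different method. Shared-memory atomics also need a GPU that supports them.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Assumption for this sketch: few enough bins to fit in shared memory.
constexpr int NUM_BINS = 256;

__global__ void histogramKernel(const int* bins, int n, unsigned int* globalHist)
{
    __shared__ unsigned int localHist[NUM_BINS];

    // Zero the block-private histogram.
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
        localHist[i] = 0;
    __syncthreads();

    // Grid-stride loop: count into shared memory, where atomics are cheap.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&localHist[bins[i]], 1u);
    __syncthreads();

    // Merge: at most one global atomic per non-empty bin per block.
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
        if (localHist[i] != 0)
            atomicAdd(&globalHist[i], localHist[i]);
}

// Hypothetical host helper: copies the bin indices to the GPU, runs the
// kernel, and returns the counts. Inputs must lie in [0, NUM_BINS).
std::vector<unsigned int> countIntoBins(const std::vector<int>& bins)
{
    std::vector<unsigned int> hist(NUM_BINS, 0);
    if (bins.empty())
        return hist;

    int* dBins = nullptr;
    unsigned int* dHist = nullptr;
    cudaMalloc(&dBins, bins.size() * sizeof(int));
    cudaMalloc(&dHist, NUM_BINS * sizeof(unsigned int));
    cudaMemcpy(dBins, bins.data(), bins.size() * sizeof(int),
               cudaMemcpyHostToDevice);
    cudaMemset(dHist, 0, NUM_BINS * sizeof(unsigned int));

    histogramKernel<<<64, 256>>>(dBins, static_cast<int>(bins.size()), dHist);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(hist.data(), dHist, NUM_BINS * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(dBins);
    cudaFree(dHist);
    return hist;
}
```

The point of the design is that contended increments hit shared memory, and global memory sees only one atomic per non-empty bin per block.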

jan.heckman · July 29, 2010, 2:40pm

Have a look at http://developer.download.nvidia.com/compu…c/histogram.pdf and http://forums.nvidia.com/index.php?showtopic=66717.

randlem · July 29, 2010, 3:25pm

The arrays could be larger than 32MB. It’s basically a sparse histogram problem: 99% of the cells in the array will be zero, while a few in very specific areas have high counts. I’m thinking some of the previous fast histogram methods are worth a try.

seibert · July 29, 2010, 3:59pm

Ah, in that case, I wonder if it would be effective to work in a few steps:

1. Read input elements and write bin numbers to an output array
2. Sort output array
3. Merge and count (not exactly sure how this will work)
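One possible way to do the sort and the merge-and-count steps is with Thrust: sort the bin numbers, then use `thrust::reduce_by_key` with a constant value of 1 so that each run of equal bin numbers collapses into a single (bin, count) pair. This is only a sketch under assumptions: `sparseCounts` is a hypothetical helper name, the bin numbers are assumed to have been computed already (step 1), and the result is a compact list of the non-empty bins rather than the full dense array.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns (bin index, count) for every non-empty bin,
// ordered by bin index.
std::vector<std::pair<int, int>> sparseCounts(const std::vector<int>& binIds)
{
    // Step 2: copy the bin numbers to the device and sort them so that
    // equal bins become adjacent.
    thrust::device_vector<int> keys(binIds.begin(), binIds.end());
    thrust::sort(keys.begin(), keys.end());

    // Step 3: sum a constant 1 over each run of equal keys.
    thrust::device_vector<int> uniqueBins(keys.size());
    thrust::device_vector<int> counts(keys.size());
    auto ends = thrust::reduce_by_key(keys.begin(), keys.end(),
                                      thrust::constant_iterator<int>(1),
                                      uniqueBins.begin(), counts.begin());
    std::size_t numUnique = ends.first - uniqueBins.begin();

    // Copy only the non-empty bins back to the host.
    thrust::host_vector<int> hBins(uniqueBins.begin(),
                                   uniqueBins.begin() + numUnique);
    thrust::host_vector<int> hCounts(counts.begin(),
                                     counts.begin() + numUnique);

    std::vector<std::pair<int, int>> result;
    for (std::size_t i = 0; i < numUnique; ++i)
        result.emplace_back(hBins[i], hCounts[i]);
    return result;
}
```

Since no atomics are involved, heavily contended bins cost no more than any others. The price is a full sort of the bin numbers, so whether this beats the atomic versions would have to be measured.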
