Cuda generating combinations.

Matchstix · November 7, 2007, 10:04am

Hi,

Could CUDA be capable of calculating every combination of a set of numbers that add up to a given number then performing a calculation on each one? I.e:

Target num: 20
Num Buckets: 5

0, 0, 0, 0, 20
0, 0, 0, 1, 19
0, 0, 0, 2, 18
etc… up to:
19, 1, 0, 0, 0
20, 0, 0, 0, 0

I’m imaginging each thread somehow generates a row, but I’m unsure how I could generate each row in parallel?

Thanks alot.

MisterAnderson42 · November 7, 2007, 2:19pm

Well, there is the brute force method. What you have there is a 5-dimensional space with coordinates ranging from 0 to 20. With a little imagination, you should be able to figure out a way to map each thread to a single point in that space. Each thread would essentially “check” to see if the sum totals the proper value and decides a 1 or a 0 and writes that out to global memory. If you need a list of JUST the rows that match sequentially, you can perform a stream compaction on that.

I just woke up, so I don’t currently see any options other than brute force. How would you do this efficiently on the CPU? That sometimes gives insights on what to do on the GPU, though sometimes the mapping to make it data-parallel is challenging.

wildcat4096 · November 7, 2007, 3:17pm

Can you find a way to enumerate elements of your set? If possible you may be able to then map from an index value to the matching row configuration.

MisterAnderson42 · November 7, 2007, 4:14pm

Perhaps it might be more efficient to have a single thread handle a single point in N-1 dimensional space. It could then recompute the sum of those values and find with the value of the Nth dimension is, since only one (if it exists) can actually complete the sum to the desired target.

This smells of a dynamic programming problem, though I don’t see the solution (I think that is the right term, the one that uses a table to turn calculating binomial coefficients from O(2^N) to O(N^2)). Since that is an iterative process, it probably will not map well to the GPU.

dahu · November 7, 2007, 5:21pm

Why not something like :

n0 = 20
r0 = tid % n0
t1 = tid/n0

n1 = 20 - r0
r1 = t1%n1
t2 = t1/n1

n2 = 20-r0-r1
r2 = t2%n2
t3 = t2/n2

n3 = 20-r0-r1-r2
r3 = t3%n3

r4 = 20-r0-r1-r2-r3

r0…r4 are the numbers you are looking for.

dahu · November 8, 2007, 8:28am

Ooops, I think I made some errors :

first replace 20 by 21 to get the ri in the range 0…20

And even then , it does not work : you have duplicates (take tid = 20 and tid = 41, you get 20,0,0,0,0 both times.

The answer is more complex :

let P(i, j) be the number of permutation of i elements in j bins (you are looking for all the P(21, 5) permutations in your example)

let S(k, j) be the sum P(0, j)+P(1,j) +… + P(k, j) (and with S(-1, j) = 0 by convention)

r0 is the number that verifies S(r0-1, 5) <= tid < S(r0, 5)

then let t1 = tid-S(r0-1, 5) , and choose r1 : the number that verifies

S(r1-1, 4) <= t1 < S(r1, 4)

and so on with t2 = t1-S(r1-1, 4) …

I think this should work this time, correct me if I am still wrong.

PS : to fasten your computation, you can precompute all the P(i, j) and S(i, j), you have 5*21 = 105 of them.

Matchstix · November 8, 2007, 11:12am

Hmm, many interesting ideas… I wasn’t sure if it was doable, but perhaps working on a few of these ideas will help. Thanks for your time. :)

Topic		Replies	Views
CUDA Generating Combinations Mapping threads to combinations CUDA Programming and Performance	2	3376	June 20, 2008
Algorithm query... CUDA Programming and Performance	3	453	March 17, 2011
Can CUDA do sequential processing? CUDA Programming and Performance	7	6593	August 24, 2011
CUDA Matrix Multiplication: One thread computes multiple elements CUDA Programming and Performance	4	4985	December 28, 2014
Why is this problem not well suited to GPU compute? Brute-forcing chess magic CUDA Programming and Performance cuda	9	74	January 7, 2025
How to parallel a seirial code CUDA Programming and Performance	4	728	March 16, 2018
CUDA - calculation of a sum CUDA Programming and Performance	7	5536	April 30, 2010
Can I Control Thread ID? CUDA Programming and Performance	3	4361	June 9, 2008
scatter and gather with CUDA? CUDA Programming and Performance	3	10102	March 9, 2009
CUDA Program Issue CUDA Programming and Performance cuda	19	151	September 20, 2024

Cuda generating combinations.

Related topics