CUDA Generating Combinations Mapping threads to combinations

I am trying to use CUDA to solve a problem that involves a very large number of combinations. I have an array of integers numbered 0-47 that I pass from the CPU to GPU.

On the GPU, from that array of 48 integers I want to select every possible combination of 5 unique numbers. Each thread on the GPU should map to one specific combination. That is 1,712,304 combinations (or 1,712,304 threads). The order of the numbers is irrelevant, so I do not want duplicates. In other words, if I use the combination 0 1 2 3 4 5, then 0 1 2 3 5 4 should not be used because we already hit that case (see http://en.wikipedia.org/wiki/Combination).

To avoid repetition, I have visualized the combinations as strings of increasing numbers. Every number in the string must be greater than the previous number. So, for example, here are example combinations from the beginning to the end:

0 1 2 3 4 5
0 1 2 3 4 6
0 1 2 3 4 7

0 1 2 3 4 47
0 1 2 3 5 6
0 1 2 3 5 7
0 1 2 3 5 8

0 1 2 3 5 47
0 1 2 3 6 7
0 1 2 3 6 8

43 44 45 46 45
43 44 45 46 46
43 44 45 46 47

I have a GeForce 8800, so I want to use the maximum 512 threads per block to maximize performance. This would equate to roughly 3,344 blocks.

This is my first CUDA program. I know that every thread has a unique block ID and unique thread ID. The question I have is, how can I use the block and thread ID for a given thread to uniquely identify the specific combination that thread is supposed to use? In other words if a given thread is using combination 4 5 32 41 48, then no other thread should be using that combination and every single combination should be being used by a thread.

I have spent all night trying to tackle this problem. I came up with one solution, but I do not believe it is optimal. Any ideas? :)

it’s easy

pre-compute this sequence of all random combinations without permutations sequentially on CPU, download all this sequence to GPU (it will take only a few megabytes).

then run 1 712 304 threads and each thread will take it’s combination from memory for processing.

If you want your code to be generic, precompute indexes combinations rather than integer values itself (suppose you have 48 integers, they can be put to array), the large array would contain non-repeating numbers from 1 to 48 that indicate which integer to take (ensure you array of 48 integers only has unique values)

Lottery simulation? :)

http://en.wikipedia.org/wiki/Combinadic should have what you need

http://msdn.microsoft.com/en-us/magazine/cc163957.aspx may offer some insight as well.