k-out-of-n problem in CUDA


I’m right beginner in CUDA Programming. I want to make a Texas Hold’em Poker program in which I want to create an array of combinations. My problem is that the “2-out-of-52” is still fast on CPU but the “5-out-of-52” and the “7-out-of-52” has too many combinations to solve it fast on CPU. I want to work only with numbers from 0 to 51 and combine them.
Does anyone have an idea how could I solve it? My two main problem is that I don’t know how should I use the threadIdx and my GPU’s compute capability is only 1.1.

Thanks in advance!
Zoli :)

First off, what algorithm are you using? What’s it look like?

Second, threadIdx.x is the thread ID of the current block the kernel is executing. In CUDA, you execute in grids which are made of blocks which are made of threads.

For a single block, threadIdx.x gives you the thread ID of that thread in the block.

For multiple blocks, you index the thread like this :

const int tid = threadIdx.x + blockIdx.x * blockDim.x

You do this in the same way you would on a CPU;

There are 52 choose 7 possible hands= 133,784,560 combinations

Generate that many integers

for each unique number of the combination, derive the actual 7 card combination by decomposition of that number

This is the same general idea I used in my permutation code, and GPUs are particularly good at these type of problems

compute 1.1 is not going to cut it, spend the $150 and get a GTX 750ti.

My algorithm for the CPU used the next_permutation() function, but that’s what I cannot use in a kernel function… :(
It sounds interesting…I’m gonna try it out. Thanks!

Not the money is my problem, but these cards are very rare in Hungary…
It’s not so easy to get it because I live in a small town and I don’t have much time to solve this problem… :)

Thanks again!

Keep in mind that there is a difference between a permutation and a combination. With a permutation order matters, while with a combination order does not matter.

In your case order does not matter in a given hand, so you would not need to use next_permutation,as it would be overkill.

I was saying that the implementation of algorithm would use the decomposition process to get the distinct members of the combination.

You also use a 64-bit unsigned long long type, go through all numbers from 0 to 2^52-1, count the bits and see if there is 7 bits set, and if there is then determine which hand that represents. That approach would be excessive though and take longer than the approach I first mentioned.