I am programming a connect four AI for a small university project. It works fine on my core 2 duo, but I wondered if CUDA might speed it up.
I tried to find an implementation of minimax algorithm in CUDA, but I could not find any, and therefore I wonder if it is possible to do that in CUDA.
I know that there is no recursive function in CUDA, but that is not the real problem.
I read that the 32 threads of warp have to do the same instruction, is that true?
If I have a card with 200 stream processors, will each one of them be able to run a kernel searching a part of the tree, or will this be impossible because of SIMT?
You might be able to some kind of a breadth-first search by doing a “reverse reduction” (look at the reduction SDK sample, and then think about it in reverse). Then, when you hit whatever depth you want to search for, you can do a kind of forward reduction, saving the path taken to each end-node. Or, if your tree is symmetric (every node has the same number of children, at least in the sense that you allocate memory for the same number of children for each node), you will know the memory locations of each end-node, so you can copy the results of the reverse-reduction back to the host, loop through the check the values of all end-nodes, then work backwards to the current state from the optimal end-node in order to decide which move to take.
If you use what I said, don’t think of your kernel as searching part of the tree…instead think of your kernel (at least, the “reverse-reduction”) kernel as building your tree, which can be “searched” either by standard reduction on the GPU or by the “memory offset” method on the host.