Hi all,

Quite new in cuda, I need you advices.
My personal project is to compute all solutions for a knight on a chessboard without going twice on the same place. A solution is when the knight went in all chessboard places.
I have a C program working on CPU.
My approche is based on paths described by sucessives coordonates from the knight starting (1,1), then (3.2) etc…
So a path look like 1132…
I’ve stored in db all path for 6 moves. Then I took the 1rst path of 6 moves and compute all paths for 12 moves, etc up to 24 moves.
Then computation start from a chessboard filled with first 24 moves and try all path (like a tree parse) to find solutions.

I start using cuda to accomplish this computation with more perfomances than cpu.

The global algorythm is quite simple:
label go_deeper
while next place is free, make the move forward
go to next move sideward
check if this is a solution
if all possible moves are done then go backward, then sideward
go to go_deeper

My cuda approach is to make one chessboard per thread.
Have a lot of threads
Inside 1 kernel
loop on forward moves
check solution
make sideward move ans or backward move

My Questions:

  • Does this approach fit CUDA spirit ?
  • Is their a more accurate approach
  • a lot of threads means howmuch depending on GPU device ?
  • how to compte number of blocs and threads bases on GPU device caracteritics ?
  • is it a good idea to compute the same amount of chessboard than de nb of cudacores from the deviec ?

Thanks to have read until their and for your advices!