I am currently using CUDA to generate long streams of uniform random numbers. The generation is done on a single card, and I think I have reached the limits of that setup. I want to be able to generate the random set on multiple cards across multiple computers. The algorithm for doing that should somehow use as many seeds as there are cards, and guarantee that the subset generated on each card does not overlap or correlate with the subsets generated on the other cards.
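
To make the requirement concrete, here is a rough host-side sketch of the kind of partitioning I have in mind. It is only an illustration, not what I currently run: it assumes a simple 64-bit LCG (the multiplier and increment are the ones from Knuth's MMIX), and the names `lcg_skip` and `card_block` are made up for this example. Each card jumps ahead to its own offset in one global sequence, so the blocks cannot overlap. An LCG is probably too weak for real use; I am only using it because its skip-ahead is easy to write down.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative 64-bit LCG: x_{n+1} = kMul * x_n + kInc (mod 2^64).
// Constants are those of Knuth's MMIX generator.
constexpr std::uint64_t kMul = 6364136223846793005ULL;
constexpr std::uint64_t kInc = 1442695040888963407ULL;

inline std::uint64_t lcg_next(std::uint64_t x) { return kMul * x + kInc; }

// Advance the state by n steps in O(log n) by repeatedly squaring the
// affine map x -> a*x + c (unsigned overflow gives the mod 2^64 for free).
std::uint64_t lcg_skip(std::uint64_t x, std::uint64_t n) {
    std::uint64_t acc_mul = 1, acc_inc = 0;        // identity map so far
    std::uint64_t cur_mul = kMul, cur_inc = kInc;  // map for 2^k steps
    while (n > 0) {
        if (n & 1) {
            // Compose the current 2^k-step map onto the accumulated map.
            acc_inc = acc_inc * cur_mul + cur_inc;
            acc_mul *= cur_mul;
        }
        // Square the current map: 2^k steps -> 2^(k+1) steps.
        cur_inc = (cur_mul + 1) * cur_inc;
        cur_mul *= cur_mul;
        n >>= 1;
    }
    return acc_mul * x + acc_inc;
}

// Map the top 53 bits of the state to a double in [0, 1).
inline double to_uniform(std::uint64_t x) {
    return static_cast<double>(x >> 11) * (1.0 / 9007199254740992.0);
}

// Hypothetical per-card routine: card k produces elements
// [k * block_len, (k + 1) * block_len) of the single global stream.
std::vector<double> card_block(std::uint64_t seed, std::uint64_t card,
                               std::size_t block_len) {
    std::uint64_t state = lcg_skip(seed, card * block_len);
    std::vector<double> out;
    out.reserve(block_len);
    for (std::size_t i = 0; i < block_len; ++i) {
        state = lcg_next(state);
        out.push_back(to_uniform(state));
    }
    return out;
}
```

With this layout every card shares one seed and differs only in its card index, so concatenating the blocks from cards 0, 1, 2, ... should reproduce exactly what a single card would have generated. What I do not know is whether this, or one independent seed per card, is the right way to do it with a generator of decent quality.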

Has anyone solved this problem before? Any ideas, code, or references?

Thank you in advance.