I’m still working on my selection algorithm. I’m trying to choose 16 random samples from an array and swap them with the last 16 elements in the array. There’re 2 ways I see how to do it. But each has its challenge.

- Choose 16 unique random #s from 0 to N - 17. Then thread k swaps a random element with element N - 1 - k. None of the swaps share a memory location, hence the swaps don’t need to be atomic. This might result in a biased sample since the last 16 elements have 0 probabilty of being exchanged with another of the last 16.

The main issue is how to generate M unique random ints from 0 to N - 1 without doing rejection sampling. I know 2 unique random ints can be generated efficienty by this:

e0 = rand() % N

e1 = (e0 + 1 + rand() % (N - 1)) % N

This method can be visualized easily, but I can’t think how to extend it to more than 2 random #s.

- Choose 16 random #s (not necessarily unique) from 0 to N - 1. Then thread k swaps its random element with element N - 1 - k. Due to overlaps, the swaps have to be atomic.

I don’t know how to do atomic swaps of memory locations. I was thinking of something like:

temp_a = atomicExc(a, garbage);

temp_b = atomicExch(b, temp_a);

garbage = atomicExch(a, temp_b);

Despite other threads reading garbage, it miraculously works in the end, at least for simple cases. Anyone know how to swap memory atomically without mutual exclusion?