Monte Carlo simulations on GPU: how to accept the moves?


I am currently working on a Monte Carlo code. The algorithm is as follows:

  1. get the new positions for the N particles
  2. calculate the new N*N particle-particle interactions
  3. calculate the total change of energy
  4. accept or reject the move based on a random number
  5. after many moves, compute the histograms
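
For reference, the five steps can be sketched on the CPU as below. This is only a minimal sketch under several assumptions that are not part of my actual code: the positions are one-dimensional, `pair_energy` is a hypothetical soft-repulsion potential, the acceptance rule is the Metropolis criterion, and the histogram bins the positions folded into a box of length `box`. All function and parameter names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical pair potential (soft repulsion); a stand-in for the real interaction.
double pair_energy(double xi, double xj) {
    double r = std::fabs(xi - xj);
    return 1.0 / (1.0 + r * r);
}

// Steps 2 and 3: evaluate every unordered pair once and sum the result.
double total_energy(const std::vector<double>& x) {
    double e = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = i + 1; j < x.size(); ++j)
            e += pair_energy(x[i], x[j]);
    return e;
}

// Step 4: Metropolis criterion (an assumption); u is uniform in [0, 1).
bool accept_move(double delta_e, double beta, double u) {
    return delta_e <= 0.0 || u < std::exp(-beta * delta_e);
}

// Histogram bin of a position folded into [0, box), clamped against round-off.
int bin_index(double xi, double box, int n_bins) {
    double t = xi - box * std::floor(xi / box);
    int b = static_cast<int>(t / box * n_bins);
    return std::max(0, std::min(n_bins - 1, b));
}

// Steps 1 to 5 for n_moves trial moves; returns the accumulated histogram.
std::vector<int> run_mc(std::vector<double> x, int n_moves, double beta,
                        double step, double box, int n_bins, unsigned seed) {
    assert(n_bins > 0 && box > 0.0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::vector<int> hist(n_bins, 0);
    std::vector<double> x_new(x.size());
    double e_old = total_energy(x);

    for (int m = 0; m < n_moves; ++m) {
        // Step 1: propose new positions for all N particles.
        for (std::size_t i = 0; i < x.size(); ++i)
            x_new[i] = x[i] + step * (2.0 * uni(rng) - 1.0);

        // Steps 2 and 3: energy of the trial configuration and its change.
        double e_new = total_energy(x_new);

        // Step 4: accept or reject the whole move.
        if (accept_move(e_new - e_old, beta, uni(rng))) {
            x = x_new;      // plain copy here; this is the copy I want to avoid on the GPU
            e_old = e_new;
        }

        // Step 5: accumulate the histogram of the current configuration.
        for (double xi : x) ++hist[bin_index(xi, box, n_bins)];
    }
    return hist;
}
```

A call such as `run_mc({0.1, 0.5, 0.9}, 100, 1.0, 0.1, 1.0, 10, 42)` would return a 10-bin histogram whose entries sum to 300 (3 particles times 100 moves).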

I want to do this with as few transfers as possible. I have an idea of how to do the first and second steps, but I am not sure about steps 3, 4 and 5.
Step 3 involves a sum. Can I do it without a GPU --> CPU transfer?
What about step 4? At this step I have to draw a random number and then decide whether the move is accepted. If it is accepted, I should copy all the new positions over the old positions, but I would like to avoid this device --> device copy and instead just pass a different pointer as the argument.
At the end I would like to compute the instantaneous histogram and pass the result to the CPU. Is it possible to transfer the histogram while the GPU continues to work?
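
The pointer idea for step 4 can be sketched on the host side as below. This is a minimal sketch, assuming two equally sized position buffers and that every move overwrites all N trial positions, so nothing stale is left in the buffer that becomes the new trial one. `PositionBuffers` and `commit_if_accepted` are hypothetical names; on the GPU the two pointers would be device pointers passed as arguments at each kernel launch, and in this form the accept/reject decision would still have to be known on the host.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Two buffers of equal size: the accepted configuration and the trial one.
struct PositionBuffers {
    double* current;  // last accepted positions
    double* trial;    // positions proposed by the latest move
};

// On acceptance the buffers exchange roles; no element is copied.
// On rejection nothing changes and the trial buffer is simply overwritten next move.
void commit_if_accepted(PositionBuffers& b, bool accepted) {
    if (accepted) std::swap(b.current, b.trial);
}
```

After an accepted move, `b.current` points at what used to be the trial buffer, so the next launch only has to receive the two pointers in their new order.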

Also, is it possible to save the results using the CPU while the GPU continues with the MC moves?