Hello, I am new to CUDA and am working through the tutorials. My goal is to write a CUDA implementation of parallel tempering to minimize a cost function in my research. Does such code already exist? If not, I am more than willing, maybe even excited, to write one. (I am a physical chemistry graduate and I see many uses for GPGPU.)
In parallel tempering, one runs M replicas of the system in parallel, each at a different temperature. After every N Monte Carlo steps, one attempts to swap the states of two replicas that are adjacent in temperature, accepts or rejects the swap, and then continues. As the number of degrees of freedom in the system increases, more replicas are needed so that the swap acceptance ratio stays appreciably above zero. This problem seems well suited to GPGPU: one can map each replica to a thread and thus run thousands of replicas. Am I being naive?
My pseudocode for parallel tempering follows:
=> Read in Data on Host
=> Transfer data from Host to Device
=> On Device
=> Initialize State
=> Calculate Cost Function
=> Make MC moves
=> Recalculate Cost Function
=> Accept/Reject new state based on the standard Metropolis criterion
=> After N MC moves, try to swap replicas/threads (states)
=> Sync Threads
=> Place Cost and State descriptor in shared memory
=> On one of the threads, say index 0, determine which replicas (states) to swap
=> Accept/Reject swap based on the MC criterion
=> Sync Threads
=> Propagate each thread for another N MC steps and continue as above
Any general and technical advice from the community on how to maximize the efficiency of the code?
Thanks in advance.