I am writing here because asking around locally has given me very conflicting information. I don’t have any GPU coding experience so far, and we are thinking of following this path, so I think this is the best place to get feedback from experienced users. I have written a Fortran 90 MDS code which I use to simulate a thermodynamic system. The code took years to develop, and we have realised it has more potential to be explored over the next year or two. We initially ran it on a large CPU cluster with more than 3k cores, but that is no longer accessible. The code is not RAM-heavy on a CPU (only a couple of MB) and the run time is relatively short (a couple of hours; the most demanding cases take at most a couple of days).
I need a high-throughput platform (i.e. several tens of thousands of different runs) in order to collect enough statistics and extrapolate the macroscopic behaviour of my simulated system. The runs need to finish in a relatively short interval so that we can adjust the study according to the results obtained (running for months at a time before being able to see any results is not realistically possible). I need to run the same small MDS code many times, with no real parallelism inside the code itself, and we therefore thought that running it on a GPU platform might be an easy and cost-effective way to do the job, given that we would not be getting into the specifics of real parallelism (timing the loops, distributing memory, etc.), which we do not currently have the time to explore. Each instance of the code needs to run self-contained and then return a file of the order of 7-10 MB in which all of the results are recorded. Under the funding agreement we have in place, there is a possibility to obtain 6 x Tesla T80 GPU cards. The equivalent funds would only get us something of the order of 100 CPU cores, which would not do the job (we estimate that we need the equivalent of at least 3-4k CPU cores to get something useful out of the code, but we do not have the resources for that).
If you have any experience with these types of simulations, can you please advise whether this is something that could be done on a GPU platform, using PGI to compile the Fortran 90 code? I understand that we would still need to make modifications to the current code, but these should be small for what we need it to do. Is it possible to run a self-contained code many times, one copy per GPU core (or the equivalent of a core), on the Tesla system and collect the results? Memory-wise, I calculate that there will be enough memory to utilise at least half of the GPU cores available, which is still orders of magnitude more simultaneous runs than the 100 CPU cores we could buy with the given funds.
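To make the intended usage concrete, here is a rough sketch of what I imagine the "small modification" would look like: the independent runs expressed as iterations of an OpenACC parallel loop, which as far as I understand is what the PGI Fortran compiler supports. This is my assumption of the shape of the code, not something I have tested; `run_one_case`, `nruns`, and the single-`real` result are placeholders for the actual MD routine and its 7-10 MB output.

```fortran
! Hypothetical sketch: each loop iteration is one independent,
! self-contained run. run_one_case stands in for the real MD code.
program batch_runs
  implicit none
  integer, parameter :: nruns = 10000
  real :: results(nruns)
  integer :: i

  ! The compiler maps iterations of this loop onto GPU threads; my
  ! assumption is that each run must fit in per-thread resources,
  ! which would be the practical limit on simultaneous runs rather
  ! than the nominal count of GPU "cores".
  !$acc parallel loop copyout(results)
  do i = 1, nruns
     results(i) = run_one_case(i)
  end do

  print *, 'first result: ', results(1)

contains

  function run_one_case(seed) result(r)
    !$acc routine seq
    integer, intent(in) :: seed
    real :: r
    ! Placeholder body: the real code would integrate the equations
    ! of motion here and accumulate the quantities of interest.
    r = real(seed)
  end function run_one_case

end program batch_runs
```

Is this roughly the right model of how one run per "core" would be expressed, or is the GPU execution model fundamentally different from what I am picturing here?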
I look forward to your responses.