I have an optimization code (written in FORTRAN) that implements a genetic algorithm. Given input data, the code defines a “population” of models, each with a vector of fitting parameters. The “fitness” of each model is computed, and models are ranked according to fitness. Based on this ranking, new models are “bred”. The fitness of the new population is computed, and the process repeats.
The loop that computes the fitness of each model can be run in parallel, and I have done so using OpenMP.
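For concreteness, the structure of such a loop is roughly as follows. This is only a minimal sketch, not the actual code: the module name, `chi2_of_model`, the array names, `nparams`, and the output format are all hypothetical stand-ins, and the real chi^2 calculation is far more involved.

```fortran
module ga_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical stand-in for the real chi^2 calculation of one model.
  pure function chi2_of_model(p) result(chi2)
    real(dp), intent(in) :: p(:)
    real(dp) :: chi2
    chi2 = 1.0_dp + sum((p - 0.5_dp)**2)
  end function chi2_of_model
end module ga_sketch

program fitness_loop
  use ga_sketch
  implicit none
  integer, parameter :: nmodels = 100   ! population size used in this run
  integer, parameter :: nparams = 5     ! assumed number of fitting parameters
  real(dp) :: params(nparams, nmodels), chi2(nmodels), fitness(nmodels)
  integer :: i

  call random_number(params)            ! placeholder population

  ! Each iteration reads and writes only its own model,
  ! so the iterations are independent of one another.
  !$omp parallel do private(i) shared(params, chi2, fitness)
  do i = 1, nmodels
    chi2(i) = chi2_of_model(params(:, i))
    fitness(i) = 1.0d0 / chi2(i)
  end do
  !$omp end parallel do

  ! Assumed output format, chosen only for illustration.
  print '(a, 2es23.15)', 'model 1: ', chi2(1), fitness(1)
end program fitness_loop
```

Without the OpenMP flag (-mp for PGI), the `!$omp` lines are treated as ordinary comments and the loop runs serially, so the same source serves for both builds.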
I have discovered that the results are not identical between the serial build (compiled in the normal way and run on a single CPU) and the OpenMP build (compiled using the -mp flag and run on multiple cores).
After each generation, the various arrays are written to a file, using this format:
The last digit is probably flopping in the breeze a bit given the limits of double precision. After the first generation, the output files are nearly the same. The ndiff utility shows:
    210c210
    < 5.369572146596790E+06 1.862345774856187E-07
    --- field 1 relative error 2.23e-15
    > 5.369572146596778E+06 1.862345774856192E-07
    265c265
    < 2.415612679943479E+07 4.139736507855217E-08
    --- field 1 relative error 1.24e-15
    > 2.415612679943482E+07 4.139736507855212E-08
    299c299
    < 4.228869572689122E+05 2.364698136963595E-06
    --- field 1 relative error 1.18e-14
    > 4.228869572689172E+05 2.364698136963567E-06
    ### Maximum relative error in matching lines = 9.81e-16 at line 245 field 1
The first column is the chi^2, and the second column is the fitness, which is 1.0d0/chi^2. In this run I have 100 models, so three of them differ between the OpenMP version and the serial version. In two cases, the relative differences are a few times 1E-15, which is about what one can expect from double precision. However, the third one differs by a little over 1E-14. So somehow a digit was lost?
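To put those relative errors in context, they can be expressed in units of machine epsilon, which is about 2.2E-16 for double precision. The sketch below does this for the three chi^2 pairs quoted above. The formula used for the relative difference is an assumption and may differ in detail from what ndiff computes, and the variable names are made up for illustration.

```fortran
program relerr_in_eps
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: lhs(3), rhs(3), rel(3)
  integer :: k

  ! chi^2 values copied from the first-generation ndiff output:
  ! lhs holds the "<" lines, rhs holds the ">" lines.
  lhs = [5.369572146596790e6_dp, 2.415612679943479e7_dp, 4.228869572689122e5_dp]
  rhs = [5.369572146596778e6_dp, 2.415612679943482e7_dp, 4.228869572689172e5_dp]

  ! Assumed definition of the relative difference.
  rel = abs(lhs - rhs) / min(abs(lhs), abs(rhs))

  do k = 1, 3
    print '(a, i0, a, es9.2, a, f6.1, a)', 'pair ', k, ': rel = ', rel(k), &
          ' = ', rel(k) / epsilon(1.0_dp), ' eps'
  end do
end program relerr_in_eps
```

The first two pairs come out at roughly 5 to 10 times epsilon, while the third is around 50 times epsilon, which is the "lost digit" in question.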
A generation or two later the results really start to diverge:
    202c202
    < 4.228869572689122E+05 2.364698136963595E-06
    --- field 1 relative error 1.18e-14
    > 4.228869572689172E+05 2.364698136963567E-06
    226c226
    < 9.801858231124967E+06 1.020214714822731E-07
    --- field 1 relative error 1.93e-15
    > 9.801858231124986E+06 1.020214714822729E-07
    230c230
    < 9.391013732425959E+03 1.064847766697571E-04
    --- field 1 relative error 1.17e-15
    > 9.391013732425970E+03 1.064847766697569E-04
    232c232
    < 9.697123939406827E+06 1.031233596939228E-07
    --- field 1 relative error 1.95e-15
    > 9.697123939406846E+06 1.031233596939226E-07
    236c236
    < 1.610902609306161E+07 6.207699920671891E-08
    --- field 1 relative error 1.24e-15
    > 1.610902609306163E+07 6.207699920671884E-08
    247c247
    < 5.369578966088783E+06 1.862343409633107E-07
    --- field 1 relative error 1.67e-15
    > 5.369578966088774E+06 1.862343409633111E-07
    257c257
    < 4.118674389812605E+05 2.427965664082270E-06
    --- field 1 relative error 1.89e-14
    > 4.118674389812527E+05 2.427965664082316E-06
    269c269
    < 2.414748317281700E+07 4.141218332541204E-08
    --- field 1 relative error 1.24e-15
    > 2.414748317281697E+07 4.141218332541209E-08
    285c285
    < 1.863263591365901E+07 5.366927173556431E-08
    --- field 1 relative error 1.07e-15
    > 1.863263591365903E+07 5.366927173556424E-08
    294c294
    < 3.648716487121342E+07 2.740689783735296E-08
    --- field 1 relative error 1.09e-15
    > 3.648716487121346E+07 2.740689783735293E-08
    307c307
    < 33 31
    --- field 2 relative error 3.33e-02
    > 33 30
    370c370
    < 7 13
    --- field 1 relative error 1.32e+01
    > 100 13
    371c371
    < 100 56
    --- field 1 relative error 1.32e+01
    > 7 56
    400c400
    < 1 30
    --- field 2 relative error 3.33e-02
    > 1 31
    ### Maximum relative error in matching lines = 9.16e-16 at line 225 field 1
There are now more fitness values that differ in the last digits. In addition, the rankings differ. The two integer columns are the model index and its rank. In the multicore run, the model at index 33 was ranked #31, but the same model in the serial run was ranked #30. It turns out that these models have very similar chi^2 values, so a difference in the last one or two digits is enough to flip their rankings. Once the rankings flip, the genetic algorithm proceeds in a different way, and the differences grow rapidly after that.
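The ranking flip is easy to reproduce in isolation. In the sketch below, the two chi^2 values are hypothetical: the first is taken from the output above, and the second is constructed to sit a few epsilon above it. Perturbing the first by a relative 1.9E-14 (about the size of the largest difference reported above) is then enough to change which model ranks first.

```fortran
program rank_flip
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: chi2_ref(2), chi2_pert(2)
  integer :: best_ref, best_pert

  ! Two hypothetical models whose chi^2 values agree to about 15 digits.
  chi2_ref(1) = 4.118674389812605e5_dp
  chi2_ref(2) = chi2_ref(1) * (1.0_dp + 8.0_dp * epsilon(1.0_dp))

  ! The same two models, with model 1 perturbed by a relative 1.9e-14.
  chi2_pert = chi2_ref
  chi2_pert(1) = chi2_ref(1) * (1.0_dp + 1.9e-14_dp)

  ! Rank by chi^2: the smallest chi^2 is the fittest model.
  best_ref  = minloc(chi2_ref, dim=1)
  best_pert = minloc(chi2_pert, dim=1)

  print '(a, i0)', 'best model (reference): ', best_ref
  print '(a, i0)', 'best model (perturbed): ', best_pert
end program rank_flip
```

Model 1 ranks first in the reference case and model 2 ranks first after the perturbation, even though no chi^2 value changed before the 14th significant digit.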
Note that the codes were compiled in exactly the same way, except that the -mp flag was added to enable OpenMP. I have tried the -O2, -O3, and -fast optimizer flags, and I consistently see slight differences in the fitness values. The same files also contain the parameter arrays, and those are identical in the earlier generations.
As I understand it, variables can be placed either on the “heap” or on the “stack”, depending on whether OpenMP is used. Why should this matter for the precision of the results? I am using PGI version 16.5, and also version 18.4 on a different machine (both Intel Xeon of some sort). Is there a compiler flag or two that I can try to minimize or eliminate this behavior?