NVFORTRAN and CUDA - Inconsistent results from run to run

Hi,

I’m experiencing a strange problem. I have ported a legacy Fortran code to run on GPUs using CUDA Fortran, with device global variables declared in a module and kernel loop directives. The results are great in terms of speed compared to the CPU version, and they are reproducible, i.e., the results do not change from one run to another when using the same set of inputs.

To add more capabilities, I subsequently declared some additional device global variables. However, just adding new variables to the shared module makes the results unreproducible: using the same set of inputs, the results differ from the original code and differ from one run to another. Commenting out the new declarations, or moving them to a different position in the module (after some of the other declarations), gives me the same results as the original code.

While the issue has the symptoms of an out-of-bounds memory access or a race condition, compute-sanitizer with memcheck and racecheck reports zero errors. In the short term I have a way forward (moving the variable declarations to a different position in the module), but I am afraid this issue may pop up again in some other context. I would be grateful if anyone has suggestions on debugging or ideas on what the issue might be. Thanks!
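
For reference, the structure is roughly like the sketch below. The names and the marked position are placeholders, not my actual code; the comments indicate the kind of declaration whose placement changes the results.

! Sketch of the module layout described above (hypothetical names).
module device_globals
  implicit none
  integer, parameter :: n = 1000000
  real(8), device :: a_d(n), b_d(n)   ! original device globals
  ! real(8), device :: extra_d(n)     ! newly added declaration here makes
                                      ! the results non-reproducible
  real(8), device :: c_d(n)           ! moving the new declaration after
                                      ! this one restores the old results
end module device_globals

subroutine axpy_step(alpha)
  use device_globals
  implicit none
  real(8), intent(in) :: alpha
  integer :: i
  ! Kernel loop directive; the loop runs on the device over the module globals.
  !$cuf kernel do(1) <<<*,*>>>
  do i = 1, n
    c_d(i) = alpha*a_d(i) + b_d(i)
  end do
end subroutine axpy_step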

Hmm, it does sound like some type of memory issue, but you’ve checked the obvious, so I’m not sure. I’d start by pinning down exactly which placement of the new variable makes the errors appear. Then track the preceding variable’s values via print statements to determine where the divergent answers start occurring. Hopefully this will give more clues to the issue.
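
Something along the lines of the sketch below is what I have in mind (the module and variable names are placeholders, reusing the ones from your sketch): after each kernel, synchronize, copy the variable that sits just before the new declaration back to the host, and print a checksum, then compare the output of two identical runs to see where they first separate.

! Hypothetical checkpoint routine, reusing the placeholder names above.
subroutine check_device_var(label)
  use cudafor
  use device_globals
  implicit none
  character(len=*), intent(in) :: label
  real(8) :: host_copy(n)
  integer :: istat
  istat = cudaDeviceSynchronize()   ! make sure preceding kernels have finished
  host_copy = c_d                   ! implicit device-to-host copy
  print '(a,1x,es24.16)', trim(label), sum(host_copy)
end subroutine check_device_var

Calling it between kernels, e.g. call check_device_var('after axpy_step'), on two runs with the same inputs should narrow down the first point of divergence.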

Thanks for the suggestion. To confirm: you are saying to nail down the exact placement of the new variables that triggers the divergent results, on the assumption that the divergence originates in the preceding variable? I can give that a try.