Nvfortran produces incorrect behaviour in a kernel with -O but works correctly with -O1

I have written a CUDA Fortran kernel for my biogeochemistry model.
At the beginning of the kernel, I load many constants from an array, iterating manually over its elements with a running index n:

  critical_stress = constants(m,n)
  cya0            = constants(m,n)
  din_min_lpp     = constants(m,n)
  din_min_spp     = constants(m,n)
  dip_min_cya     = constants(m,n)
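
For readers unfamiliar with this pattern, here is a minimal host-side sketch of the manual iteration. It assumes that the column index n starts at 1 and is advanced with n = n + 1 after every load (the increment statements are not shown in the excerpt above). The module and routine names and the reduced set of five constants are hypothetical; the real kernel reads 117 values in device code.

```fortran
module bgc_constants_sketch
  ! Hypothetical sketch: load scalar parameters from row m of a
  ! constants array by walking a running column index n.
  implicit none
  private
  public :: load_running_index
contains
  subroutine load_running_index(constants, m, critical_stress, cya0, &
                                din_min_lpp, din_min_spp, dip_min_cya)
    real(8), intent(in)  :: constants(:,:)
    integer, intent(in)  :: m
    real(8), intent(out) :: critical_stress, cya0, din_min_lpp, &
                            din_min_spp, dip_min_cya
    integer :: n

    n = 1                                        ! assumed: first parameter is in column 1
    critical_stress = constants(m,n); n = n + 1  ! read column n, then advance
    cya0            = constants(m,n); n = n + 1
    din_min_lpp     = constants(m,n); n = n + 1
    din_min_spp     = constants(m,n); n = n + 1
    dip_min_cya     = constants(m,n); n = n + 1
  end subroutine load_running_index
end module bgc_constants_sketch
```

Each assignment reads the current value of n before it is incremented, so critical_stress gets column 1, cya0 column 2, and so on. If the increments were removed or reordered, every later parameter would land in the wrong column, which would be consistent with the symptom described in this post.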

When I compile the kernel with -O1, it behaves as expected. With -O or higher, it gives wrong results: the scalar variables are assigned the wrong elements of my “constants” array.
I cannot provide a short “minimal example”, because if I shorten the kernel by deleting some lines at the end, it suddenly works fine. I have attached the full kernel as a zip file. My compilation command is:

nvfortran -mcmodel=medium -O -i8 -r8 -gpu=cc80,keep -I/scratch/usr/mvkradtk/gpu/netcdf/include -Minfo=all -c …/src/bgc_kernel_WAT.CUF

Please let me know if I can give further assistance, e.g. by providing access to my full project.
bgc_kernel_WAT.zip (17.5 KB)

I’ll need access to the full project in order to investigate.

Given that it works at -O1 but not at -O, this could be a compiler issue, but it could also be a program error.

Dear Mat,

thank you for your willingness to look into this.

I have prepared a .tar.gz file which should contain everything you need to build and run the model, including a few large input files. You can download it from

ftp.io-warnemuende.de (login anonymous)
It is 22 GB, so it may take a while to download.
Please see the README.txt file inside for instructions on how to build and run it. This should be easy.

If you need more background information on what the program is doing (or should be doing when it is finished), please see the documentation on

phy_drcs/roboslave: Rapid Ocean Biogeochemistry Offline Simulation by Lagrangian Advection of VEctors - roboslave - git.io-warnemuende.de

If you need anything else please let me know.

Thank you!

Thanks Hagen. It took a while to download and to find time to investigate, but I think I have an idea of what’s going on.

It looks to me like all the “n=n+1” statements are being optimized out, so “var = constants(m,n)” returns the wrong value. The workaround would be to replace “n” with the literal value:

      ! ... 1-39
      r_ips_ero       = constants(m,40)
      r_ips_liber     = constants(m,41)
      r_lpp_assim     = constants(m,42)
      r_lpp_resp      = constants(m,43)
      r_nh4_nitrif    = constants(m,44)
      ! ... 45-117
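
The literal-index workaround can be sketched as a host-side Fortran routine. This is a hypothetical, reduced version covering only the five parameters shown (columns 40 to 44); the module and routine names are invented for illustration, and the real kernel would use the same form inside device code for all 117 columns.

```fortran
module bgc_literal_sketch
  ! Hypothetical sketch of the workaround: every parameter is read with a
  ! literal column index instead of a running counter n.
  implicit none
  private
  public :: load_literal
contains
  subroutine load_literal(constants, m, r_ips_ero, r_ips_liber, &
                          r_lpp_assim, r_lpp_resp, r_nh4_nitrif)
    real(8), intent(in)  :: constants(:,:)
    integer, intent(in)  :: m
    real(8), intent(out) :: r_ips_ero, r_ips_liber, r_lpp_assim, &
                            r_lpp_resp, r_nh4_nitrif

    ! Column numbers are compile-time constants, so there is no chain of
    ! index updates for the optimizer to remove or reorder.
    r_ips_ero    = constants(m,40)
    r_ips_liber  = constants(m,41)
    r_lpp_assim  = constants(m,42)
    r_lpp_resp   = constants(m,43)
    r_nh4_nitrif = constants(m,44)
  end subroutine load_literal
end module bgc_literal_sketch
```

Calling it with a test array whose column j holds the value j is a cheap host-side way to confirm that each scalar receives its intended column.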

I’m going to try to extract this portion to see if I can create a smaller reproducer and then report it to engineering, but it might not be until later this week.


Dear Mat,
thank you for looking into this.
I can easily replace the “n”s as you showed, or simply keep compiling with -O1. Neither is a problem for me, since the runtime of this kernel is small compared to that of the other kernels. But if it is a compiler issue, it could cause harm in other applications and might be important to fix, which is why I posted it here.
I also thought of an alternative explanation: another kernel in my project could be accidentally writing (e.g. through out-of-range errors) into the memory where this kernel resides and modifying some of its bytes. Would that be possible, or would compute-sanitizer detect it? (compute-sanitizer reports no errors for this code.)
Cheers, Hagen