Runtime trouble moving legacy code from CUDA 6.5 to 8.0

Hi all,
For my sins, I am trying to get a massive chunk of old CUDA Fortran code, originally compiled with CUDA 6.5 and PGI 14, to work with CUDA 8.0 and PGI 16.7 so that it runs on more modern hardware (more recent versions of CUDA/PGI cause quite a few problems). It compiles fine and parts of it work perfectly, but some pieces produce very wrong output despite running without complaint. The code is basically just doing a lot of arithmetic crunching.
Compiling for compute capability 3.5 (cc35), using MKL and LLVM, as part of a larger code package. Running on a K40m.

Are there any immediate issues that come to mind that could be caused by the CUDA 6.5->8.0 transition? I know without actual code it’s hard for anyone to suggest anything specific, but I’d appreciate any general hints or ideas of what to try or what to think about.

Since you are inviting speculation: Possibly a latent bug in your code that got exposed through the change in toolchain and hardware. Maybe a race condition, an uninitialized variable, or access out of bounds.

Running the code through cuda-memcheck might give some clues, specifically the racecheck tool (e.g. cuda-memcheck --tool racecheck ./your_app).

https://docs.nvidia.com/cuda/cuda-memcheck/index.html

@njuffa Thanks for the speculations. I fear that it is some latent bug, but I am still holding out for some kind of simple memory access that has changed behaviour somehow.

@cbuchner1 Yes, a good reminder to go back to that, thanks. I had trouble with it earlier which I blamed on dealing with input scripts, but now I realise that actually I get a surprising error with cuda-memcheck [which maybe ought to be a different post, but oh well]:

========= Error: process didn’t terminate successfully
========= The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= Internal error (20)

I have seen suggestions elsewhere that this kind of error corresponds to segfaults or memory problems on the host. However, I find it strange to have no errors when running normally, but to have one crop up when I use cuda-memcheck. Does cuda-memcheck demand greater rigour from host memory management in a way that could cause a crash like this? It is probably worth checking the allocation and deallocation of memory on the host regardless.

The first thing to check is the application return code: what return code is your application providing?

For example, if you have a main routine, the return code is whatever is being returned from main:

int main(void){

  /* ... application work ... */

  return 0;  // return code
}

You can also query the return code from bash, via the special variable $?.

If your app returns anything other than zero, you should make it return 0 to use these tools. And if your app is returning a specific non-zero value for a reason, you should probably investigate that reason.
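For instance, from bash the exit status of the most recent command is available in $?. A minimal sketch (using true and sh -c 'exit 3' as stand-ins for the application, since the actual binary name isn't given in the thread):

```shell
# Stand-in for a successful run of the application (replace with ./your_app)
true
echo "exit status: $?"        # prints: exit status: 0

# Stand-in for a run that fails with a specific non-zero code
sh -c 'exit 3'
echo "exit status: $?"        # prints: exit status: 3
```

Note that $? is overwritten by every command, so capture it immediately (e.g. status=$?) if you need it later.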

Thanks for pointing out how to query the return code, it’s handy and I hadn’t seen it before.

If I run the program without cuda-memcheck, we get a 0 return as expected (everything seems to run fine, the numbers are just wrong).

When I run the program with cuda-memcheck, it just returns 1, consistent with the non-committal error message. Although it seems to me that this might be the return code from cuda-memcheck itself?

What puzzles me further is that some debugging reveals that the code runs up until the first device memory allocation in a given subroutine (not the first device allocation overall), then freezes for a bit before cuda-memcheck throws its error. Without cuda-memcheck we go straight through this allocation with no problems. So it appears that cuda-memcheck is somehow affecting this allocation. Is that a plausible interpretation?

Are you doing rigorous, proper CUDA error checking?
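For reference, a minimal sketch of what such checking can look like in CUDA Fortran. The status-checking pattern (check API return values, query cudaGetLastError after a launch, then check cudaDeviceSynchronize) is the standard one; all the names here (mykernel, d_a, h_a, n, grid, block) are hypothetical placeholders, not taken from the thread:

```fortran
! Sketch only: standard status-checking pattern for the CUDA Fortran runtime API.
! All identifiers (mykernel, d_a, h_a, n, grid, block) are hypothetical.
istat = cudaMemcpy(d_a, h_a, n)             ! runtime API calls return an integer status
if (istat /= cudaSuccess) then
   write(*,*) 'cudaMemcpy failed: ', cudaGetErrorString(istat)
   stop 1
end if

call mykernel<<<grid, block>>>(d_a, n)      ! kernel launches return no status...
istat = cudaGetLastError()                  ! ...so query launch errors explicitly
if (istat /= cudaSuccess) then
   write(*,*) 'kernel launch failed: ', cudaGetErrorString(istat)
   stop 1
end if

istat = cudaDeviceSynchronize()             ! catches errors during kernel execution
if (istat /= cudaSuccess) then
   write(*,*) 'kernel execution failed: ', cudaGetErrorString(istat)
   stop 1
end if
```

Checking after cudaDeviceSynchronize matters because kernel execution is asynchronous: an error raised mid-kernel will not be visible at the launch site.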

Normally cuda-memcheck has its primary impact in terms of device code execution, but it evidently also hooks calls into the CUDA libraries as well. Beyond that I couldn’t tell you precisely what it is doing.

Just to close this off: in the end I did identify a race condition that, somehow, presumably never caused trouble on the older hardware/software. It did require compiling with CUDA 10 and running cuda-memcheck there (although the code has more problems under CUDA 10, it runs far enough for racecheck to reach the relevant errors). I do not know why cuda-memcheck broke with CUDA 8 but not CUDA 10, but there we go. Thanks for the help!

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.