Private Variables and Illegal address

eric_t · December 3, 2018, 6:12pm

I’m working on a complex Fortran model where the main loop calls multiple nested subroutines. The loop works and produces correct values when compiled with target=mp. However, when I compile and run on a Tesla device, I get an illegal address exception inside my first subroutine, at the point where it initializes a private real scalar. I have a handful of scalars that are uninitialized at loop start but are passed between device routines. I’ve found through trial and error that they must be declared private, or I get inconsistent results. Yet if I don’t declare the scalars private, the illegal address error goes away on the Tesla device: the loop runs to completion, but with incorrect output.

I’m in the process of putting together a sample code snippet that recreates the problem, but I’ve been stuck on this for a few days now. Does this sound familiar to anyone?

Thanks,

Eric

eric_t · December 4, 2018, 3:46am

As I expected, my compact test program works properly, which means that not declaring the scalar private simply masks some other problem. I suspect something in the kernel is missing or corrupt, since passing even one private variable to the subroutine results in “call to cuStreamSynchronize returned error 700: Illegal address during kernel execution”.

cuda-gdb output:

Program received signal CUDA_EXCEPTION_6, Warp Misaligned Address.
[Switching focus to CUDA kernel 0, grid 9, block (64,0,0), thread (0,0,0), device 0, sm 0, warp 17, lane 0]
0x00000000011985c0 in sintgrl_t_ ()

where “sintgrl_t” does nothing but initialize a single private scalar; I’ve eliminated all other code.

I’ll have to go back through the other private arrays one by one to see if I can narrow down the error. I thought perhaps the implicit copy slices were incorrect, since some show a strange format like “Generating implicit copyin(rdy_a(:,:,z_b_208))”.

cuda-memcheck does not like my program for some reason. I immediately get:

Error: process didn’t terminate successfully
========= The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= Internal error (20)

However, running under either cuda-gdb or pgdbg gets past this point with no issue.

And -Mbounds is no help with the GPU target.

MatColgrove · December 4, 2018, 7:33pm

Hi eric_t,

Scalars are private by default. However, when a scalar is passed by reference to a subroutine (the default in Fortran), it “escapes” the compiler’s visibility. Since the scalar’s address could then be taken by some global variable, the compiler must assume that this is happening, so it has to make the scalar a shared variable. This is why you’re getting wrong answers.
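To make the escape concrete, here is a minimal sketch written as a C analog (C is an assumption; the thread concerns Fortran, where every scalar argument is passed by reference, which in C corresponds to passing a pointer). The names `scale_into` and `compute` are hypothetical. Passing `&tmp` exposes the scalar's address, so the compiler can no longer treat it as implicitly private; listing it in `private(tmp)` gives each iteration its own copy. Without an OpenACC compiler the pragmas are ignored and the loop simply runs serially.

```c
/* Hypothetical device routine: writes its result through a pointer,
   the C counterpart of a Fortran dummy argument passed by reference. */
#pragma acc routine seq
void scale_into(double x, double *tmp)
{
    *tmp = 2.0 * x;
}

/* Hypothetical driver: tmp is uninitialized at loop start and its
   address escapes into scale_into(), so it is listed in private()
   to give every iteration its own copy instead of one shared one. */
void compute(const double *in, double *out, int n)
{
    double tmp;
#pragma acc parallel loop private(tmp) copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; ++i) {
        scale_into(in[i], &tmp);
        out[i] = tmp + 1.0;
    }
}
```

Dropping `private(tmp)` here would leave the compiler free to treat `tmp` as shared, which is the wrong-answer scenario described above.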

There are two fixes: either explicitly add the scalar to a private clause (which you have done), or, if the scalar is read-only in the subroutine, add the “value” attribute to its declaration in the subroutine (and in the interface, if there’s an explicit one). With pass by value, the scalar will again be implicitly private.
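The second fix can be sketched the same way, again as a hypothetical C analog: in Fortran it is the `value` attribute on the dummy argument, while in C the read-only scalar `coef` is simply taken as a plain `double` parameter instead of a pointer, so no address escapes and no `private` clause is written. The names `apply_coef` and `compute_by_value` are illustrative only. In C, declaring `coef` inside the loop body would make it private by scoping alone; it is declared outside the loop here to mirror Fortran, where locals live at routine scope, so its privatization relies on the compiler's implicit handling of scalars described above.

```c
/* Hypothetical device routine: coef is read-only, so it is taken by
   value (the C counterpart of Fortran's VALUE attribute). No address
   escapes, so the compiler can keep coef private without a clause. */
#pragma acc routine seq
static double apply_coef(double x, double coef)
{
    return coef * x;
}

/* Hypothetical driver: coef is declared at function scope, as a
   Fortran local would be, and is only ever passed by value. */
void compute_by_value(const double *in, double *out, int n)
{
    double coef;
#pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; ++i) {
        coef = (i % 2 == 0) ? 0.5 : 3.0;
        out[i] = apply_coef(in[i], coef);
    }
}
```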

As for the misaligned address error, is the subroutine a vector or seq routine?

“Generating implicit copyin(rdy_a(:,:,z_b_208))”

What data type is “rdy_a”, and how many dimensions does it have? If an array is not explicitly added to a data clause, the compiler must implicitly copy it to the device. In doing so, it attempts to copy the smallest amount of data, so it infers the bounds by looking at the array’s use within the compute region. “z_b_208” looks to be a compiler-generated temporary variable, though I don’t have enough info to say why it’s picking this.
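One way to take bound inference out of the picture is to put the array in an explicit data clause with its full extent. The sketch below is a hypothetical C analog: it assumes `rdy_a` is a 3-D array of doubles stored column-major as a flat buffer of `nx*ny*nz` elements, and the function name `sum_over_k` and the dimensions are illustrative. In Fortran the equivalent is to name the whole array, e.g. `copyin(rdy_a)`, in the data clause.

```c
#include <stddef.h>

/* Hypothetical kernel: sums a 3-D field stored as a flat array of
   nx*ny*nz doubles (column-major, like Fortran's rdy_a(i,j,k)).
   The explicit copyin gives the full extent, so the compiler does not
   have to infer a slice from how rdy_a is indexed in the loop. */
void sum_over_k(const double *rdy_a, double *out, int nx, int ny, int nz)
{
    size_t plane = (size_t)nx * ny;
    size_t total = plane * nz;
#pragma acc data copyin(rdy_a[0:total]) copyout(out[0:plane])
    {
#pragma acc parallel loop collapse(2)
        for (int j = 0; j < ny; ++j)
            for (int i = 0; i < nx; ++i) {
                double s = 0.0; /* block-scoped, so private per (i,j) */
#pragma acc loop seq
                for (int k = 0; k < nz; ++k)
                    s += rdy_a[i + (size_t)nx * j + plane * k];
                out[i + (size_t)nx * j] = s;
            }
    }
}
```

Checking the compiler feedback after such a change should show the array copied with the bounds you gave rather than an inferred slice.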

“As I expected, my compact test program works properly, which means not declaring the scalar as private simply masks some other problem”

Feel free to send the full code that reproduces the error to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me. I’ll take a look and see what I can determine.

-Mat