Precision Error setting array to zero

khea_actua1 · March 1, 2012, 8:36pm

Something strange is happening, in my code, I have

DOUBLE PRECISION, DIMENSION(ilev, isize) :: aerop1_d
double precision, parameter :: zerodp = 0.0d0
...
    aerop1_d = zerodp

After that line has executed, printing the values in the debugger gives output like:

pgdbg> print aerop1_d
(21:22,11):  0                         0                        
(23:24,11):  0                         2.8275281606231457e-315
(25:26,11):  0                         0                        
(27:28,11):  0                         0

Why aren’t they all zero? And if that number is simply the machine representation of zero (closest approximation), why aren’t they all like that, rather than most being zero?
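For reference, zero is exactly representable in IEEE 754 double precision (all 64 bits clear), so a correctly zeroed array has no "closest approximation" values in it. A value like 2.8e-315 lies below the smallest normal double (about 2.2e-308), which makes it a subnormal with a distinct, nonzero bit pattern. The following is a minimal sketch, assuming `double precision` maps to `real64` and using the (28,12) shape reported later in the thread; the program and variable names are illustrative only.

```fortran
program zero_bits
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  ! Assumed shape, taken from the dimensions reported later in the thread.
  integer, parameter :: nlev = 28, nsize = 12
  real(real64)   :: a(nlev, nsize)
  integer(int64) :: bits(nlev * nsize)

  a = 0.0_real64

  ! Reinterpret each element's 64 bits as an integer (no numeric conversion).
  bits = transfer(a, bits)

  print '(a, l1)',     'all bit patterns zero: ', all(bits == 0_int64)
  ! Column-major linear index of element (24,11).
  print '(a, z16.16)', 'bits of a(24,11):      ', bits(24 + (11 - 1) * nlev)
  ! Anything nonzero with magnitude below this is a subnormal.
  print '(a, es12.4)', 'smallest normal value: ', tiny(1.0_real64)
end program zero_bits
```

If the compiled code reports all-zero bit patterns while the debugger shows a nonzero element, the two are not reading the same memory or the debugger is misinterpreting it.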

The code is built with pgfortran 12.1 on CentOS 5.7:

pgfortran -tp=istanbul  -mp -g -O0 -gopt -traceback -Mpreprocess -Minform=inform

I just noticed that I was using the 11.10 debugger, but that shouldn’t make a difference.

MatColgrove · March 2, 2012, 12:22am

Hi khea_actua1,

You are correct that the value should be zero. The question is whether this is just how the debugger is presenting the data (something in the print) or whether the memory really is off by a bit.

As an experiment, can you run the debugger with the same program, but before using the “print” command, use the “hex” command to get the hex values of elements of the array? Is element (24,11) the same as the rest of the array?

After the “hex” command, use the “print” command again to double-check that it is the same element with the bogus value.

Mat

khea_actua1 · March 2, 2012, 5:11pm

Hi,

This time it was element (25,11).

The listing shows that I’m doing the prints right after the assignment to zero. aerop1 is a real; aerop1_d is a double.

pgdbg> list
 #381:         aerop1 = 0.0
 #382:         aerop1_d = zerodp
 #383:==>>     DO k=1,icob
...
pgdbg> print zerodp
0
pgdbg> hex aerop1_d(24:25,11)

aerop1_d(24:25,11):  0                         1.5481471191334162e-103  

pgdbg> hex aerop1_d(25,11)
0x2A963113C6025A0D 

pgdbg> print aerop1_d(24:25,11)

aerop1_d(24:25,11):  0                         1.5481471191334162e-103

So even with hex, it’s not actually zero.
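One way to cross-check the two debugger outputs is to decode the reported hex pattern by hand. The sketch below assumes `double precision` is an IEEE 754 64-bit value (`real64`); the program name is made up for illustration. It reinterprets 0x2A963113C6025A0D as a double, which should come out close to the 1.548e-103 that `print` showed, and extracts the biased exponent field to show that this is a normal number rather than a zero or subnormal.

```fortran
program decode_hex
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer(int64) :: bits
  real(real64)   :: x

  ! Bit pattern reported by "hex aerop1_d(25,11)" in the post above.
  bits = int(z'2A963113C6025A0D', int64)

  ! Reinterpret the 64 bits as a double (no numeric conversion).
  x = transfer(bits, x)

  print '(a, z16.16)',    'bits : 0x', bits
  print '(a, es24.16e3)', 'value: ', x

  ! The biased exponent occupies bits 52..62; a field of 0 would mean
  ! zero or a subnormal, anything from 1 to 2046 is a normal number.
  print '(a, i0)', 'biased exponent field: ', ibits(bits, 52, 11)
end program decode_hex
```

Agreement between the decoded value and the `print` output would suggest the debugger's formatting is consistent, and that the 64 bits it fetches for that element are themselves nonzero.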

Oh, and this time I was using the same version of the debugger as the code was compiled with (12.2), though I get the same result when I use the 11.10 and 12.1 compilers and debuggers.

One other thing: I notice that when I look at the stack, the values make no sense. I have the code outputting values, and those values (say, the input integer variable intent(IN) icob=12) are right, but when I look at the same variable in the debugger, it’s a huge random number.

=> #0  coagd_d line: "coagd_d.f90"@171 address: 0x6019A0 
     ilg = 150, il1 = 1, il2 = 150, ilev = 28, throw = 0x10C836D0, isize = 12, roarow = 0x7FFF5E464FA0, rtcoa = 0x7FFF5E464FA0, .... ntp = 1581668736 ...  jlat = 1581668736, icob = 1581668736, ....

Meanwhile, the print statement (in the code itself, printing to STDOUT) at the top of this file outputs:

 Just to double check against the compiler..
 icob=           12
 jlat=           19
 ntp=            8

MatColgrove · March 5, 2012, 6:50pm

Hi khea_actua1,

Unfortunately, we don’t know what’s wrong. It could be the compiler, debugger, a system issue, or a problem with your program. Can you send us a reproducible example (trs@pgroup.com)? If we can recreate it here, then we should be able to determine the problem.

Thanks,
Mat

khea_actua1 · March 5, 2012, 11:03pm

Hi,

I could send an executable, but it won’t run without ~10 gigs of associated data that I’m not allowed to share (it’s a meteorological model written by the government.)

Would it be possible to try to work this out over the phone? Or maybe even with a multi-user screen session via ssh?

MatColgrove · March 7, 2012, 12:18am

Hi khea_actua1,

I’ve been talking with our lead Tools Engineer (Don) and Compiler Architect (Steven) about this. We’re not sure a multi-user screen session would help. Possibly a phone call, but let’s first try a few things.

Don wants you to try the following:

(1) Modify the source code to print the contents of the array at the source line immediately after the line where the array is initialized, then rebuild and run. See if the array is all zeroes.

(2) Run the same executable under PGDBG. Set a breakpoint on the source line immediately following the line where the array is initialized. Print the array using the pgdbg ‘print’ command. See if the array is all zeroes when viewed this way, and/or if it is the same as in (1).

** If the array is not all zeroes in (1), it may be a compiler bug, and I’d pass it off to Steven.
** If the array is all zeroes in (1) but not in (2), it is likely a debugger bug, and I’d provide a new set of instructions.
** If the array is all zeroes in both (1) and (2), continue to (3).

(3) re-run the original example under the debugger that shows the non-zero element.

(4) re-run the original example that shows the non-zero element again, but stop at the initialization step and set a hardware watchpoint on the element

pgdbg> hwatch aerop1_d(25,11)
pgdbg> cont

If the array element is being clobbered, the hardware watchpoint should catch it. If it does, then capture a stack traceback (‘stack’ command) to find the offending code.
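Step (1) above could be done with a small helper along these lines. This is only a sketch: `zero_check` and `report_nonzero` are hypothetical names, not part of the poster's code or of any PGI library, and it assumes `double precision` corresponds to `real64`. It compares bit patterns rather than values, so it would also flag a negative zero.

```fortran
module zero_check
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Hypothetical helper: lists every element whose 64 bits are not all zero.
  subroutine report_nonzero(label, a)
    character(*), intent(in) :: label
    real(real64), intent(in) :: a(:, :)
    integer :: i, j, nbad

    nbad = 0
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        if (transfer(a(i, j), 0_int64) /= 0_int64) then
          nbad = nbad + 1
          ! Print indices, decimal value, and raw bit pattern.
          print '(a, "(", i0, ",", i0, ") = ", es24.16e3, "  0x", z16.16)', &
                label, i, j, a(i, j), transfer(a(i, j), 0_int64)
        end if
      end do
    end do
    print '(a, ": ", i0, " element(s) with nonzero bits")', label, nbad
  end subroutine report_nonzero
end module zero_check

program demo
  use zero_check
  implicit none
  real(real64) :: aerop1_d(28, 12)   ! shape assumed for this demo

  aerop1_d = 0.0_real64
  ! Call placed on the line right after the initialization, as in step (1).
  call report_nonzero('aerop1_d', aerop1_d)
end program demo
```

Printing the raw bits alongside the value makes the output directly comparable with the debugger's `hex` command in step (2).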

Steven also wanted to know the dimensions of aerop1_d.

Thanks,
Mat

khea_actua1 · March 7, 2012, 3:46pm

Sounds good. I’ll do that now (and update this posting)

Results:
1-2: Done by writing these variables to a file. The executable was run from the shell, and I attached the debugger to it in order to read these values.

Code: aerop1_d( 24,  11)        = 0.000000000000000
Debugger: aerop1_d( 24,  11)    = 2.827528121097894e-315

Same goes for other values, e.g., my input variable icob has a messed up value in the stack, but still prints out the proper value when printed from the code.

The hwatch never triggers a break (even when looping over the aerop1_d = zerodp assignment).

Though, sometimes I get this:

pgdbg> hwatch aerop1_d(24,11)
Unable to set dr0 to addr b0f8cb992e17cd4c.
ERROR: Unable to set hardware watchpoint.

I notice though, if I set element (24,11) to another element that does display 0, it then stays as zero.

The dimensions of this array are (28,12)

About the stack, that’s also an issue I brought up in these messages: the values in this stack seem way off. But about your comment, won’t the stack just show me all the interface variables up to this routine? I find it doesn’t show me any code inside this routine…

khea_actua1 · March 9, 2012, 3:32pm

Hi, any news on this?

khea_actua1 · March 12, 2012, 10:00pm

Any news yet? I sort of paid The Portland Group thousands of dollars for this debugger…

MatColgrove · March 12, 2012, 10:29pm

Without a reproducing example, there’s not much more we can do via the User Forum. I gave Don your email address and he’ll contact you as soon as he can.

Mat

donb · March 12, 2012, 10:52pm

This appears to be a debugger issue. We have filed a problem report for this and are formulating a plan to work with the poster to resolve the issue.

Once a resolution is available, we will post that information here.

–Don

khea_actua1 · March 14, 2012, 4:21pm

Without the -gopt flag, everything seems to work. (Advice from PGI)

tull · January 25, 2014, 12:41am

Hello,

A problem you reported about DWARF information in 2012 has been corrected in the current 14.1 release.

thanks,
dave