How to detect NaN at runtime by using compiler flags

Jerryleo · April 22, 2010, 6:40am

Hi,

My code produces NaNs at runtime when compiled with the PGI compiler. I see the same issue with both PGI 8.0-4 and PGI 10.3. I tried the debug options ‘-g -C -Mchkfpstk -Mchkptr -Mchkstk’, but there was no warning or complaint at runtime.

It works fine when compiled with gfortran or the Intel compiler. I have no idea where the problem is. Is it possible to detect NaNs at runtime using compiler flags?

Thanks

MatColgrove · April 22, 2010, 4:20pm

Hi Jerryleo,

Unfortunately, there isn’t a compiler flag that detects NaNs directly. However, you can try adding “-Ktrap=fp” to detect floating-point exceptions, which may be the cause of the NaNs. Also, try adding “-Mbounds” in case it’s an array out-of-bounds problem. Next, I’d run the program through Valgrind (www.valgrind.org) to see if you have any UMRs (uninitialized memory reads).

If these don’t work, then you’ll need to debug the code and isolate the cause.
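
If it comes to isolating the cause by hand, explicit checks at suspect points in the source can narrow things down. Below is a minimal sketch, assuming your compiler release provides the Fortran 2003 ieee_arithmetic intrinsic module (not every release does). The module name nan_check, the routine check_nan, and the demo program are hypothetical and not taken from the code discussed here.

```fortran
! Hypothetical helper: stop at the first NaN found in an array.
! Assumes the Fortran 2003 ieee_arithmetic intrinsic module is available.
module nan_check
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine check_nan(a, label)
    real(dp), intent(in) :: a(:)
    character(len=*), intent(in) :: label
    integer :: i
    do i = 1, size(a)
      if (ieee_is_nan(a(i))) then
        write(*,*) 'NaN found in ', label, ' at index ', i
        stop 1
      end if
    end do
  end subroutine check_nan
end module nan_check

program nan_check_demo
  use nan_check
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  real(dp) :: v(3)

  v = (/ 1.0_dp, 2.0_dp, 3.0_dp /)
  call check_nan(v, 'v before')      ! passes silently

  ! Plant a quiet NaN to show what a failing check looks like.
  v(2) = ieee_value(v(2), ieee_quiet_nan)
  call check_nan(v, 'v after')       ! reports index 2 and stops
end program nan_check_demo
```

Calling such a routine after each major computation step brackets the region where the first NaN appears. Where ieee_arithmetic is unavailable, the comparison a(i) /= a(i) is a common substitute, though aggressive optimization without -Kieee may remove it.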

Mat

TheMatt · April 22, 2010, 6:19pm

Mat,

Are there options to -Kieee? The man page for pgfortran doesn’t mention them, and I do have to use -Kieee often.

MatColgrove · April 22, 2010, 6:45pm

Oops, I meant “-Ktrap=fp”, though adding “-Kieee” wouldn’t hurt. I’ll fix my post above.

Mat

Jerryleo · April 26, 2010, 12:02am

Thanks for all the replies.

I tried the following options:

-O2 -fast -w -Mfree -r8 -i4 -fast -Kieee -Ktrap=fp
-O2 -w -Mfree -r8 -i4 -fast -Kieee
-w -Mfree -r8 -i4 -fast -Kieee
-w -Mfree -r8 -i4 -Kieee
-w -Mfree -r8 -i4 -Kieee -Ktrap=fp

All of them caused a “Floating exception” at runtime.

If I compile the code with the following options, I get NaNs instead of a “Floating exception” at runtime:

-O2 -fast -w -Mfree -r8 -i4

I compared the floating-point output of PGI and Intel, and it seems that PGI uses more bits than Intel for floating-point values.

Output of PGI 
 hessian_local=    11.11111111111111      tb_error(k,n)=
   0.3000000000000000

Output of Intel 
 hessian_local=   11.1111111111111      tb_error(k,n)=  0.300000000000000

I compiled the code with the Intel compiler using the following options, and it worked without any problem:

-w -ftz -align all -fno-alias -fp-model precise -r8 -i4

Are there equivalent options for PGI?

Thanks

MatColgrove · April 26, 2010, 8:38pm

Hi Jerry,

I would try compiling with “-w -Mfree -r8 -i4 -Kieee -Ktrap=fp -g” and then running the program in the debugger to see why the floating point exception occurs.
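
To see what the trap looks like on a small case first, here is a sketch of a program that performs an invalid operation (0/0) at runtime. The program name trap_demo is hypothetical, and the sketch assumes the compiler supports the Fortran 2003 command_argument_count intrinsic, which is used only so that the zero is not known at compile time. Built without trapping, it should print NaN; built with “-Ktrap=fp”, it should stop with a floating-point exception at the division, which is the line the debugger would then point to.

```fortran
program trap_demo
  implicit none
  double precision :: num, den, q

  ! Zero when the program is run with no arguments. Because this is a
  ! runtime value, the compiler cannot fold 0/0 away at compile time.
  den = dble(command_argument_count())
  num = den

  ! Invalid operation: 0/0 yields NaN, or raises a trap when
  ! invalid-operation trapping is enabled.
  q = num / den

  print *, 'q = ', q
end program trap_demo
```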

“I compared the floating-point output of PGI and Intel, and it seems that PGI uses more bits than Intel for floating-point values.”

No, PGI is just printing one more decimal place. This shouldn’t have anything to do with the NaNs.
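
As a quick way to confirm this, the sketch below prints one value twice: once with list-directed output, whose width each compiler chooses for itself, and once with an explicit edit descriptor, which gives the same layout everywhere. The program name and the test value (picked to resemble the hessian_local output) are hypothetical.

```fortran
program print_digits
  implicit none
  double precision :: h

  h = 100.0d0 / 9.0d0

  ! List-directed output: the number of digits shown is chosen by the
  ! compiler's runtime library, so it can differ between compilers.
  print *, 'list-directed: ', h

  ! Explicit format: 16 digits after the decimal point with any compiler.
  print '(A, ES24.16)', 'explicit     : ', h
end program print_digits
```

If the explicitly formatted values match between the two compilers, the stored precision is the same and only the default print width differs.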

“Are there equivalent options for PGI?”

-w == -w
-ftz == -Mflushz
-align all == No equivalent
-fno-alias == No equivalent
-fp-model precise == -Kieee
-r8 == -r8
-i4 == -i4

Mat

mmarc1 · May 7, 2010, 11:33am

Hi,

An interesting point came up here that is relevant to my case: the absence of -fno-alias -fno-fnalias in the PGI compilers. With these options turned off, both the pgf90 and ifort builds of our program run in the same 44 seconds. But with -fno-alias -fno-fnalias turned on, the ifort build runs in 23 seconds. Are you sure there is no equivalent for them? An almost 2x performance gain is significant.

Dima.

MatColgrove · May 7, 2010, 5:11pm

Hi Dima,

I’m assuming your code uses F90 pointers? If so, then what’s happening is that “-fno-alias” is a user assertion to the compiler that the pointers point to contiguous, non-overlapping memory and are therefore safe to vectorize or parallelize. Without this assertion, the compiler must be conservative and not vectorize.

We are aware of this deficiency; however, it’s been a fairly low-priority item since we haven’t found many customers that use F90 pointers. That said, we are very customer driven, so please send a performance bug report to PGI Customer Service (trs@pgroup.com). The more user requests, the higher the priority a feature is given.

Note, you might try using IPA (-Mipa=fast). Using whole program analysis, IPA may be able to determine that your pointers are safe to vectorize. Also, if you are able, consider using allocatable arrays instead of pointers since the code will be more performance portable.
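
As a rough illustration of the allocatable suggestion, here is a minimal sketch. The program name, array names, and loop are hypothetical and not taken from your code. The pointer declaration is left in a comment for comparison.

```fortran
program alloc_demo
  implicit none
  integer, parameter :: n = 1000
  integer :: i

  ! Pointer form, for comparison. Pointers may refer to overlapping or
  ! non-contiguous memory, so the compiler has to be conservative:
  !   double precision, pointer :: x(:), y(:)

  ! Allocatable form. Allocatable arrays are contiguous and cannot
  ! alias one another, so no user assertion is needed.
  double precision, allocatable :: x(:), y(:)

  allocate(x(n), y(n))
  x = 1.0d0
  y = 2.0d0

  ! A simple loop that is a candidate for vectorization.
  do i = 1, n
    y(i) = y(i) + 3.0d0 * x(i)
  end do

  print *, y(1), y(n)
  deallocate(x, y)
end program alloc_demo
```

Whether a given loop actually vectorizes still depends on the compiler and optimization level, so it is worth checking the compiler’s optimization report.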

Thanks,
Mat