SIGFPEs and the Accelerators

This is probably an unanswerable question, but I thought I’d pose it just in case.

I’m currently experimenting with using Accelerators in a well-established piece of software, trying to speed it up. I’m finding, however, that sometimes adding an

!$acc region

leads to SIGFPEs where there were none without the region:

p0_2643:  p4_error: interrupt SIGFPE: 8

Is this a usual happenstance, or are there certain functions one should avoid in accelerator regions that can lead to this, like exp() or log()?

(Note: that is a p4_error, meaning MPI, but this is with mpirun -np 1. I’m not near using >1 CPUs yet.)

ETA: I’m currently trying to track down which bit of the code is doing this. In my idiocy, I accelerated lots of innocent bits of code. One (or more) turned out not to be so innocent…
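For context, a minimal sketch of the kind of change involved (the loop body and variable names here are hypothetical, not from the actual code):

```fortran
! Hypothetical example: wrapping a simple DO loop in a PGI
! Accelerator region. In the real code, loops like this
! sometimes produced a SIGFPE only when accelerated.
!$acc region
      do i = 1, n
         b(i) = exp(a(i)) + log(c(i))
      end do
!$acc end region
```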

Hi Matt,

One thing that may help in narrowing down where the FPE is coming from is the “-Ktrap=fp” flag. It will cause your program to abort when an FPE is encountered and tell you the line number where it occurred. It won’t trap code run on the accelerator, but it will at least give you a better idea of the cause. In particular, I’d be looking for a divide by zero.

  • Mat

Mat,

The code is compiled with -Ktrap=fp:

...-fast -Kieee -ta=nvidia -Minfo=all,accel -r4     -Mextend -Ktrap=fp  irrad.f

The problem is, it must have something to do with the accelerator, because no line number is reported. Plus, as I’ve said, if I compile this for the host and not for the GPU, no SIGFPE is reported. Is there a way to get an approximate line number (say, at the calling-procedure level) with more verbosity?

I’m thinking, though, it’s time to one-by-one deaccelerate until I find it…

Hi Matt,

Is it possible to obtain the source? Either the issue is a difference in accuracy between the GPU and the CPU, or it’s a compiler bug. Either way, I’d like to take a look.

If it’s not publicly available, I can contact you directly, or you can send a note to PGI Customer Service (trs@pgroup.com).

Thanks,
Mat

I’m currently trying to isolate the code responsible for this (so I don’t need to pass on thousands of lines of code…though I might have to) and have been observing some oddities.

In the beginning, the SIGFPE occurred if I accelerated a group of 4 or 5 DO loops. When I then accelerated each loop one-by-one, no FPE. If I then rejiggered the loop scheduling (as mkcolg showed me was important in another thread), added the appropriate copy/copyin/copyout clauses, and then reaccelerated the entire group…no SIGFPE.

I think I might be passing along two sets of loops. One that causes the SIGFPE and one that doesn’t.
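A hedged sketch of the restructuring described above, assuming the old PGI Accelerator directive syntax (the schedule, clause choices, and array names are illustrative, not from the real code):

```fortran
! Hypothetical illustration of the fix: an explicit loop
! schedule plus explicit data clauses, instead of letting
! the compiler infer everything on its own.
!$acc region copyin(a(1:n)) copyout(b(1:n))
!$acc do parallel, vector(256)
      do i = 1, n
         b(i) = 2.0 * a(i)
      end do
!$acc end region
```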

Okay, I think I found a good code fragment to pass on. This isn’t the same one I was looking at…it’s one that’s more confusing. Confusing in that I’m not sure how an FPE is happening, and the math is simple. Coming your way, Mat…