Floating point exception in data transfer

Hi all,
I am annotating a C code with OpenACC pragmas.
The code performs a simulation and can be run on different problem sizes; for production use it requires a huge amount of data (several GB) to be copied to the device before the kernels are launched.

For large data sizes I get a “Floating point exception”, while for small ones it runs correctly. To isolate the problem I commented out the kernel launches, and the error seems to be associated with the data transfer.

I discovered that increasing the value of $PGI_ACC_BUFFERSIZE makes the problem disappear.

I think the problem may be that, for some reason, the data transfer cannot be split into chunks small enough to fit the pinned buffer. Maybe it is related to the fact that I am copying a “short” array of “big” data structures? Maybe the runtime cannot split a single array element across different chunks? Can somebody confirm this?
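To make the scenario concrete, here is a minimal sketch of the kind of transfer I mean: a short array whose individual elements are large. The names `big_elem_t`, `FIELD_LEN` and `copy_and_sum` are made up for illustration and the sizes are assumptions, not the ones from the real code. Compiled without OpenACC support, the pragmas are ignored and the loop simply runs on the host.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type: each element holds 128Ki doubles (1 MiB). */
#define FIELD_LEN (128 * 1024)

typedef struct {
    double field[FIELD_LEN];
} big_elem_t;

/* Allocates n large elements, copies them to the device in one data
 * region, sums every value there, and returns the sum.
 * Assumption: this only mimics the "short array of big structs" layout. */
double copy_and_sum(long n)
{
    assert(n > 0);
    big_elem_t *a = malloc((size_t)n * sizeof *a);
    assert(a != NULL);

    /* Fill with a known value so the expected sum is n * FIELD_LEN. */
    for (long i = 0; i < n; ++i)
        for (long j = 0; j < FIELD_LEN; ++j)
            a[i].field[j] = 1.0;

    double sum = 0.0;
    #pragma acc data copyin(a[0:n])
    {
        #pragma acc parallel loop collapse(2) reduction(+:sum)
        for (long i = 0; i < n; ++i)
            for (long j = 0; j < FIELD_LEN; ++j)
                sum += a[i].field[j];
    }

    free(a);
    return sum;
}
```

Calling `copy_and_sum(n)` with increasing `n` and comparing the result against `n * FIELD_LEN` is one way to look for the size at which the transfer starts to misbehave.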

I am writing here to help other people who may encounter this error, but also to suggest handling this error condition so that it reports a more specific message than a generic “Floating point exception”, which is hard to associate with a problem in the data transfer.

P.S.
I am using PGI 14.4 on a machine hosting an NVIDIA K20m.

Thanks in advance,

Bye,

Enrico

Hi Enrico,

Could you send a reproducing example to PGI Customer Support (trs@pgroup.com)? I’ve transferred ~6GB arrays to the device before, so I am unclear why you would get this error.

Note, if any individual array is >2GB in memory size, you need to add the “-Mlarge_arrays” flag if it is dynamically allocated, or “-mcmodel=medium” if it is static (Linux only).
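As a rough aid for checking which arrays fall under this note, a small helper along these lines could be used. It is only a sketch: the name `exceeds_2gib` is made up, and treating “2GB” as 2^31 bytes is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the 2GB limit is taken to mean 2^31 bytes. */
#define TWO_GIB ((uint64_t)1 << 31)

/* Returns 1 if an array of `count` elements of `elem_size` bytes each
 * is larger than 2 GiB in total, 0 otherwise. */
int exceeds_2gib(uint64_t count, uint64_t elem_size)
{
    assert(elem_size > 0);
    /* If count * elem_size would overflow 64 bits, it is certainly too big. */
    if (count > UINT64_MAX / elem_size)
        return 1;
    return count * elem_size > TWO_GIB;
}

/* If this returns 1 for a dynamically allocated array, the flag above
 * would be added on the compile line, e.g. (file name hypothetical):
 *     pgcc -acc -Mlarge_arrays sim.c
 */
```

For example, `exceeds_2gib(300000000, sizeof(double))` reports 1, since 300 million doubles occupy about 2.4 GB.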

Thanks,
Mat

Hi,
using the same code mentioned in the first post of this thread (which has evolved over the last year), running on a different machine equipped with several NVIDIA K80 GPUs, we forgot to set $PGI_ACC_BUFFERSIZE, and this time (using PGI 15.9) no error showed up…

…unfortunately the code “silently” produced wrong results, which was quite hard to debug…

…running the same code with smaller data sizes, or setting $PGI_ACC_BUFFERSIZE big enough, solves the problem…
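For anyone trying to catch this kind of silent corruption, one possible check is a plain round trip: copy an array to the device, copy it straight back into a second buffer, and compare on the host. This is only a sketch and the function name `roundtrip_mismatches` is hypothetical; without an OpenACC compiler the pragmas are ignored and the copy happens on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Sends src to the device, copies it element by element into `back`
 * there, brings `back` to the host, and counts elements that differ.
 * Note: NaN values compare unequal to themselves and are counted too. */
size_t roundtrip_mismatches(const double *src, double *back, long n)
{
    assert(src != NULL && back != NULL && n >= 0);

    #pragma acc data copyin(src[0:n]) copyout(back[0:n])
    {
        #pragma acc parallel loop
        for (long i = 0; i < n; ++i)
            back[i] = src[i];
    }

    size_t bad = 0;
    for (long i = 0; i < n; ++i)
        if (back[i] != src[i])
            ++bad;
    return bad;
}
```

A non-zero return value for large `n`, with zero for small `n`, would point at the transfer itself rather than at the kernels.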

Is our code probably doing something it is not supposed to do, or is this the intended behavior (wrong results if you don’t pay attention to the $PGI_ACC_BUFFERSIZE value)?
Shouldn’t an unset or too small $PGI_ACC_BUFFERSIZE value be reported with a clearer error message?
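Until the cause is understood, an application-side guard could at least make a forgotten variable visible at startup. The sketch below only checks whether the variable is set; it does not interpret the value, and the helper names `env_is_set` and `warn_if_buffersize_unset` are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 if the environment variable `name` is set to a non-empty value. */
int env_is_set(const char *name)
{
    assert(name != NULL);
    const char *value = getenv(name);
    return value != NULL && value[0] != '\0';
}

/* Hypothetical startup guard: prints a warning to stderr and returns 1
 * if PGI_ACC_BUFFERSIZE is unset, returns 0 otherwise. */
int warn_if_buffersize_unset(void)
{
    if (env_is_set("PGI_ACC_BUFFERSIZE"))
        return 0;
    fprintf(stderr,
            "warning: PGI_ACC_BUFFERSIZE is not set; large transfers "
            "were seen to misbehave without it\n");
    return 1;
}
```

Calling `warn_if_buffersize_unset()` once at the start of `main` would have saved us the debugging session described above.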

Hi EnricoC,

The buffer size shouldn’t matter. If the buffers are too small, it should just break the data up into multiple transfers.
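To illustrate what breaking a transfer into multiple pieces means, here is a host-only sketch that copies a block through a fixed-size staging buffer. It is not the PGI runtime's actual implementation; the function `staged_copy` is hypothetical and `memcpy` stands in for the real host-to-device copy. The split is done on bytes, so the size of an individual array element does not matter in this scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies nbytes from src to dst through a staging buffer of buf_size
 * bytes, one chunk at a time. Returns the number of chunks used. */
size_t staged_copy(void *dst, const void *src, size_t nbytes,
                   void *staging, size_t buf_size)
{
    assert(buf_size > 0);
    size_t chunks = 0;
    size_t offset = 0;

    while (offset < nbytes) {
        /* Each chunk is at most buf_size bytes; the last may be shorter. */
        size_t len = nbytes - offset;
        if (len > buf_size)
            len = buf_size;

        memcpy(staging, (const char *)src + offset, len);  /* host -> staging */
        memcpy((char *)dst + offset, staging, len);        /* staging -> target */

        offset += len;
        ++chunks;
    }
    return chunks;
}
```

With a 4-byte staging buffer, a 10-byte block goes across in three chunks (4, 4 and 2 bytes), and a smaller buffer only increases the number of chunks.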

“Is it our code that is probably doing something that it is not supposed to do, or is this the intended behavior”

Since this is not something I’ve seen before, I’m not sure what’s wrong. I’ve successfully transferred very large arrays using the default buffer size. Having a reproducing example that I could look at would help.

Thanks,
Mat