Suggestions for debugging unified binary problems?

I’m trying the example programs from the June 2009 PGInsider article but having problems with the C version of the third example program. I compile with ‘-ta=nvidia,host’, but the compiler info messages show that only a host version of the ‘smooth’ function is generated. When run, the program produces wrong numeric results (most values fail the verification tests).

The Fortran unified version builds both host and GPU kernels and satisfies validation tests. When built with just ‘-ta=nvidia’ the C version builds a GPU kernel and satisfies validation tests.

This is PGI 9.0-3 on Linux (Fedora 10), with a compute capability 1.1 (cc11) device.

Any suggestions to debug where this is going wrong, or a workaround for generating working unified binaries?

Hi rothpc,

Looks like a bug. The Fortran example works fine, as does the C example if you set “ACC_DEVICE=HOST”. Only when you compile using the unified binary and run on the GPU does it get wrong answers. I’ll need to send this on to engineering for further investigation.
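
In the meantime, one way to see where the GPU results diverge is to compare them element by element against a host run (for example, one made with “ACC_DEVICE=HOST”). Below is a minimal sketch in C. The helper name `count_mismatches`, its parameters, and the relative-tolerance rule are hypothetical and not taken from the PGInsider example, so adjust them to match the article's own verification test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the PGInsider example): compares a
   result array against a host-computed reference and returns the number
   of elements that differ by more than a relative tolerance. If
   first_bad is non-NULL, it receives the index of the first mismatch;
   it is left unchanged when every element matches. */
size_t count_mismatches(const float *ref, const float *out, size_t n,
                        float rel_tol, size_t *first_bad)
{
    assert(ref != NULL && out != NULL);

    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Absolute difference, computed by hand to avoid needing libm. */
        float diff = ref[i] - out[i];
        if (diff < 0.0f)
            diff = -diff;

        /* Scale the tolerance by |ref[i]|, but never below 1.0, so that
           values near zero are not held to an impossibly tight bound. */
        float scale = ref[i] < 0.0f ? -ref[i] : ref[i];
        if (scale < 1.0f)
            scale = 1.0f;

        /* Written as !(a <= b) so that NaN results count as mismatches. */
        if (!(diff <= rel_tol * scale)) {
            if (bad == 0 && first_bad != NULL)
                *first_bad = i;
            ++bad;
        }
    }
    return bad;
}
```

Calling this on the output of ‘smooth’ from the host run and from the GPU run shows whether the failures start at a particular index or are spread across the whole array, which may help when reporting the problem.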

  • Mat

Hi rothpc,

FYI, this issue will be resolved in the upcoming 9.0-4 release. A recent NVIDIA driver update started using more XMM registers than we were accounting for.

  • Mat