CUDA Fortran and optimization flags

Hi,

I have some CUDA Fortran code that makes heavy use of trigonometric functions for complex numbers.

I have noticed that the code produces correct output when I compile without optimization flags, i.e., -Mcuda=maxregcount:32,cc30,cuda5.5

However, as soon as I turn on optimization (e.g., -O2), the results are corrupted. I am finding it hard to track down the problem, but I suspect that optimization breaks some of the device intrinsics I am using.

Any suggestions as to why this could happen? I thought that optimization only affects the CPU part of the code, not the device intrinsics.

Thanks, Jan

Hi Jan,

Optimization is applied to your kernels as well, so it does have an effect, though I’m not sure exactly what’s happening here. Can you send us a reproducing example (or the whole code) so we can investigate?
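
If it helps to narrow things down before packaging a reproducer, one option is to dump the output array from both builds and look for the first element where they disagree. The following is only a rough Python sketch of that idea, not anything specific to the PGI tools: the function name `first_divergence`, the tolerances (chosen with single precision in mind), and the sample data are all assumptions for illustration, and reading the dumps from your own files is left out.

```python
import cmath


def first_divergence(reference, candidate, rel_tol=1e-5, abs_tol=1e-7):
    """Return (index, ref_value, cand_value) for the first mismatch, or None.

    The tolerances are illustrative guesses for single-precision results.
    """
    if len(reference) != len(candidate):
        raise ValueError("outputs have different lengths")
    for i, (r, c) in enumerate(zip(reference, candidate)):
        # cmath.isclose returns False when either value is NaN,
        # so NaN-corrupted entries are reported as mismatches too.
        if not cmath.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
            return i, r, c
    return None


# Hypothetical data standing in for dumps from the two builds.
unoptimized = [cmath.sin(complex(0.1 * k, 0.2 * k)) for k in range(8)]
optimized = list(unoptimized)
optimized[5] = complex(float("nan"), 0.0)  # simulate one corrupted element

print(first_divergence(unoptimized, optimized))
```

Knowing the index of the first bad element may point to the input value, and hence the intrinsic call, that behaves differently under -O2, which would make for a much smaller reproducing example.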

You can either send a note to PGI Customer Service (trs@pgroup.com), or I can send you a link to NVIDIA’s secure FTP site where you can upload the package.

Thanks,
Mat