I have compiled some physics code that uses FFTW with pg… v. 6.2-3 and with icc/ifort v. 9.1 to compare performance.
With the Intel compilers I used the -fast option, and the FFTW 3.1.2 part runs in 1 minute, slightly faster than what I get with gcc.
Then I compiled with the Portland Group compiler (both the FFTW libraries and my program, of course), and the same FFTW part takes 10 minutes!
After trying several sets of options, including plain -fast -tp amd64, I found that the best one is -O0 -tp amd64 (i.e., no optimization at all), and even with that the FFTW part still takes 3.5 minutes!
I am quite sure there is something funny going on, but I have no idea what it is. Has anyone here seen something like this before?