I have Fortran 90 OpenMP code, compiled with pgf90, which I run on a quad-Opteron machine under Gentoo Linux. The code has parallel and sequential parts, which are executed alternately inside a loop.
I see the following: in the parallel part all 4 CPUs show 100% user load, as expected. When the code enters the sequential stage, however, one CPU shows 100% user load while the other three each show 20% user + 80% system, so they are 100% loaded as well.
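To make the structure concrete, here is a simplified sketch of the pattern, not the actual code. The program name, array size, step count and arithmetic are placeholders I made up just to show a parallel part followed by a sequential part inside a loop.

```fortran
! Simplified sketch, not the actual code: the array size, step count
! and arithmetic are placeholders chosen only to show the structure.
program par_seq_loop
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 2000000, nsteps = 50
  real(dp), allocatable :: a(:)
  real(dp) :: total
  integer :: i, step

  allocate(a(n))
  a = 1.0_dp
  total = 0.0_dp

  do step = 1, nsteps
     ! Parallel part: every thread works on its share of the array,
     ! so all CPUs show 100% user load.
     !$omp parallel do private(i)
     do i = 1, n
        a(i) = sqrt(a(i) + real(i, dp))
     end do
     !$omp end parallel do

     ! Sequential part: only the master thread runs this loop.
     ! The other threads have nothing to do here, and this is the
     ! stage where they show 20% user + 80% system load.
     do i = 2, n
        a(i) = 0.5_dp * (a(i) + a(i-1))
     end do
     total = total + a(n)
  end do

  print *, 'checksum:', total
end program par_seq_loop
```

Built with OpenMP enabled (for example `pgf90 -mp`) and watched in top, a program of this shape should alternate between the two stages described above. The sizes would probably need tuning so that each stage lasts long enough to be visible.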
The behaviour is the same on every kernel I tried, from 2.6.3 to 2.6.8.
The same code run on another, dual-Opteron machine (different motherboard) shows the same behaviour.
However, if I compile exactly the same code with Intel ifort, the resulting 32-bit code performs as expected: 100% user load on all 4 CPUs in the parallel stage, and 100% user load on one CPU and 0% (both user and system) on each of the other three CPUs during the sequential part.
Any ideas?
BTW, my quad Opteron crashes hard after an hour of such work, while the dual is stable. The ifort-produced code does not crash the quad Opteron, so I believe the extra load reported with pgf90 is not imaginary.