I have a “host” machine where I compile my code before submitting it to a cluster to be run. My concern is that the host machine has a 32-bit AMD Athlon MP processor and the cluster machines have 32-bit Intel Xeon processors. I’m not sure which processor to pass to the -tp option when compiling, so that I get the best efficiency when running on the Xeon processors. Can I compile with the -tp p7 option for P4 processors even though I’m on an Athlon, or do I need to ssh onto a compatible P4 machine and compile there before running a job?
Thanks for any help
Compiling with “-tp p7” will generate an executable targeting a P4 processor regardless of the build system.
Hope this helps,
That definitely answered my question, but here’s another compiler/processor question. When I compile and run the exact same piece of code on two different machines with different processors, my output is different. The difference is not huge, but it is large enough to raise questions. One machine has a 32-bit AMD Athlon MP and the code is compiled with -tp athlonxp. The other machine has a 64-bit AMD Opteron and the code is compiled with -tp k8-64. Is it normal to see a difference in my numerical output?
Yes, it is normal to see small differences in the numerical output. See my posting here
https://forums.developer.nvidia.com/t/precision-in-pgf90/130520/1 for more details. Also, the two targets do their floating-point arithmetic differently: 32-bit x87 code holds intermediate values in 80-bit FPU registers, while 64-bit code such as k8-64 normally uses SSE2, which works on 64-bit doubles. Optimization can exaggerate this difference by removing intermediate loads and stores from the FPU registers, so a value that would have been rounded to 64 bits in memory stays at 80 bits instead. The example code in my referenced posting shows how skipping an intermediate store and reload due to optimization can affect results.
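To make the store/reload effect concrete, here is a small Python sketch. It is only an analogy and not the code from the linked posting: Python's 64-bit float stands in for the wide FPU register, and a 32-bit value stands in for the narrower memory slot. The helper names (store_as_single, sum_kept_in_register, sum_stored_each_step) are made up for this illustration.

```python
import struct


def store_as_single(x: float) -> float:
    """Round a 64-bit Python float to 32-bit precision.

    This imitates storing a wide register value to a narrower
    memory location and loading it back.
    """
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_kept_in_register(values):
    """Accumulate at full width and round only once at the end,
    as optimized code might do when it skips intermediate stores."""
    total = 0.0
    for v in values:
        total += v
    return store_as_single(total)


def sum_stored_each_step(values):
    """Round the running total after every addition,
    as unoptimized code might do when it stores each intermediate."""
    total = 0.0
    for v in values:
        total = store_as_single(total + v)
    return total


values = [0.1] * 1000
a = sum_kept_in_register(values)
b = sum_stored_each_step(values)
print("rounded once at the end:", a)
print("rounded at every step:  ", b)
print("difference:             ", a - b)
```

Both functions add the same numbers in the same order, yet the results differ slightly because of where the rounding happens. The same kind of small discrepancy can show up between an x87 build and an SSE2 build, or between optimization levels of the same build.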