How to use SSE4.1 & AMD Core Math Library to optimize WRF

Hi,

I’m on a box with four quad-core AMD Opteron 8378 CPUs. I compiled WRF V3.1 with PGI 9.0.1 and an MPICH built with PGI, using the default compile options.

I wonder if it is possible to use SSE4.1 and the AMD Core Math Library (ACML) to optimize WRF V3.1 at compile time.

Does anyone have this sort of experience?

Thanks

Hi Jerryleo,

AMD doesn’t support SSE4.1 (just SSE4a), so it is not available on your Opteron. For systems that do support SSE4.1, the compiler will auto-generate these instructions where appropriate. ACML, though, would need to be updated by AMD if/when support for SSE4.1 is added.
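
If you want to confirm which SSE4 variants a given machine reports, here is a minimal Python sketch. It assumes a Linux system where /proc/cpuinfo has a "flags" line and where the kernel's flag names are "sse4a", "sse4_1" and "sse4_2". The function names are made up for illustration.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags listed on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def sse4_report(cpuinfo_text):
    """Map each SSE4 variant name to True/False for this CPU."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in ("sse4a", "sse4_1", "sse4_2")}
```

On the machine itself you would call sse4_report(open("/proc/cpuinfo").read()). On a CPU that reports only SSE4a, the "sse4a" entry would be True and the other two False.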

- Mat

Thanks for the reply.

How do I know the compiler auto-generates these instructions where appropriate?

I checked the output of make. It used the “-fastsse -Mvect=noaltcode” options, but no “-tp” option.

Which -tp option should I use in my case: amd64, amd64e, k8-64e, shanghai-64 or x64?

And what about ACML: how do I know whether the build already uses it?

Thanks

Hi Jerryleo,

How do I know the compiler auto-generates these instructions where appropriate?

You’ll need to keep and inspect the assembly files. Adding the flags “-Mkeepasm -Manno” will tell the compiler to keep your assembly files and annotate them with your source text.
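
Reading the kept assembly by hand gets tedious for something the size of WRF, so here is a rough Python sketch that counts SSE4.1 and SSE4a mnemonics in one assembly file. It is only an illustration. It assumes AT&T-style assembly in which everything after "#" is a comment (including the source annotations), and the mnemonic lists are a hand-picked, non-exhaustive subset. The names scan_asm, SSE41 and SSE4A are made up.

```python
from collections import Counter

# Non-exhaustive subset of SSE4.1 mnemonics (assumption: enough for a quick check).
SSE41 = {
    "roundps", "roundpd", "roundss", "roundsd", "dpps", "dppd",
    "blendps", "blendpd", "blendvps", "blendvpd", "pblendw", "pblendvb",
    "pmulld", "pmuldq", "pminsd", "pmaxsd", "pminud", "pmaxud",
    "insertps", "extractps", "pinsrd", "pinsrq", "pextrd", "pextrq",
    "ptest", "pcmpeqq", "packusdw", "movntdqa", "mpsadbw", "phminposuw",
}
# The four SSE4a instructions.
SSE4A = {"extrq", "insertq", "movntsd", "movntss"}


def scan_asm(asm_text):
    """Count SSE4.1 and SSE4a mnemonics, ignoring comments and labels."""
    found = {"sse4.1": Counter(), "sse4a": Counter()}
    for line in asm_text.splitlines():
        tokens = line.split("#", 1)[0].split()  # drop the comment part
        if tokens and tokens[0].endswith(":"):   # skip a leading label
            tokens = tokens[1:]
        if not tokens:
            continue
        mnemonic = tokens[0].lower()
        if mnemonic in SSE41:
            found["sse4.1"][mnemonic] += 1
        elif mnemonic in SSE4A:
            found["sse4a"][mnemonic] += 1
    return found
```

You would call scan_asm(open(path).read()) on each kept assembly file. An empty "sse4.1" counter only shows that none of the listed mnemonics appear in that file.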

I checked the output of make. It used the “-fastsse -Mvect=noaltcode” options, but no “-tp” option. Which -tp option should I use in my case: amd64, amd64e, k8-64e, shanghai-64 or x64?

An Opteron 8378 has a “Shanghai” micro-architecture, so when you compile on that machine the compiler uses “-tp shanghai-64” by default. You should only need to set the target processor (-tp) if you will be running your application on a different system with a different architecture.

And what about ACML: how do I know whether the build already uses it?

I would check with AMD. However, given that AMD doesn’t support SSE4.1 instructions, I highly doubt their math library would use these instructions. I could be wrong though!
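
A separate question is whether the build links ACML at all. One way to get a hint is to search the saved make output for ACML-related options. Here is a small Python sketch, assuming the make output has been saved as text and that an ACML link shows up as a token containing "acml" (for example "-lacml"). The function name and the paths in the usage note are hypothetical. It also picks up any explicit -tp setting.

```python
def build_options(make_log):
    """Return explicit -tp targets and ACML-looking tokens from a make log."""
    tokens = make_log.split()
    targets, acml = [], []
    for i, tok in enumerate(tokens):
        if tok == "-tp" and i + 1 < len(tokens):
            targets.append(tokens[i + 1])        # form: -tp shanghai-64
        elif tok.startswith("-tp="):
            targets.append(tok[len("-tp="):])    # form: -tp=shanghai-64
        if "acml" in tok.lower() and tok not in acml:
            acml.append(tok)                     # e.g. -lacml or a library path
    return {"tp": targets, "acml": acml}
```

For a log containing "pgf90 -o wrf.exe main.o -L/opt/acml/lib -lacml" (a made-up line), the "acml" list would hold both "-L/opt/acml/lib" and "-lacml". An empty list suggests ACML is not being linked, but it says nothing about which instructions the library uses internally.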

- Mat

Thanks for the detailed reply.