Specific behavior for -fast

This might sound a bit too basic, but I need some clarification about what -fast specifically does on my system. The command nvfortran -help -fast returns:

Reading rcfile /opt/nvidia/hpc_sdk/Linux_x86_64/20.9/compilers/bin/.nvfortranrc
-fast               Common optimizations; includes -O2 -Munroll=c:1 -Mlre -Mautoinline
                    == -Mvect=simd -Mflushz -Mcache_align
                    Control automatic vector pipelining
    [no]altcode     Generate appropriate alternative code for vectorized loops
    [no]assoc       Allow [disallow] reassociation
    cachesize:<c>   Optimize for cache size c
    [no]fuse        Enable [disable] loop fusion
    [no]gather      Enable [disable] vectorization of indirect array references
    [no]idiom       Enable [disable] idiom recognition
    levels:<n>      Maximum nest level of loops to optimize
    nocond          Disable vectorization of loops with conditionals
    [no]partial     Enable [disable] partial loop vectorization via inner loop distribution
    prefetch        Generate prefetch instructions
    [no]short       Enable [disable] short vector operations
                    Generate [don't generate] SIMD instructions
     128            Use 128-bit SIMD instructions
     256            Use 256-bit SIMD instructions
     512            Use 512-bit SIMD instructions
                    Enable [disable] vectorization of the residual loop of a vectorized loop
                    Limit size of vectorized loops
    [no]sse         The [no]sse option is deprecated, use [no]simd instead.
    [no]tile        Enable [disable] loop tiling
-M[no]flushz        Set SSE to flush-to-zero mode
-Mcache_align       Align large objects on cache-line boundaries

The use of the brackets is somewhat confusing and I need to know what is active and what is not active when -fast is invoked. Are level-2 optimization, unrolling loops, loop-carried redundancy elimination, inlining, cache aligning and prefetching enabled, whereas association, fusing, vectorization and others are disabled? Or is everything enabled and I’m just reading too much into the brackets?

The key part is the first two lines of the help output, which say that “-fast” is equivalent to “-O2 -Munroll=c:1 -Mlre -Mautoinline -Mvect=simd -Mflushz -Mcache_align”.

The remainder of the output gives more details about a few of these options but does not specify what is enabled with “-fast”; the key line above does. For example, “-Mvect=simd” is enabled with “-fast”, though you can then disable vectorization with “-Mnovect” (the brackets indicate that the text is optional), or also enable partial vectorization by adding “-Mvect=partial”.
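As a rough sketch of the combinations described above, the command lines could look like the following. The source file name prog.f90 is hypothetical, the flag list is copied from the help output quoted in this thread (20.9 on this x86_64 system, so it may differ elsewhere), and the snippet only prints the commands rather than running the compiler.

```shell
# Flags that -fast expands to according to the help output in this thread.
# This is specific to the poster's target; other CPUs may show a different list.
FAST_FLAGS="-O2 -Munroll=c:1 -Mlre -Mautoinline -Mvect=simd -Mflushz -Mcache_align"

# Hypothetical source file name, used only for illustration.
SRC="prog.f90"

# Plain -fast: everything listed in FAST_FLAGS is enabled.
CMD_FAST="nvfortran -fast $SRC"

# The same set of options written out explicitly.
CMD_EXPLICIT="nvfortran $FAST_FLAGS $SRC"

# -fast, then vectorization switched off again with the "no" form of the flag.
CMD_NOVECT="nvfortran -fast -Mnovect $SRC"

# -fast, plus the "partial" suboption of -Mvect.
CMD_PARTIAL="nvfortran -fast -Mvect=partial $SRC"

# Print the command lines instead of executing them.
printf '%s\n' "$CMD_FAST" "$CMD_EXPLICIT" "$CMD_NOVECT" "$CMD_PARTIAL"
```

The override flags are placed after -fast here on the assumption that they are meant to modify what -fast has already set.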

Hi Mat.
From your message, the typical options (to use the user’s guide’s own word) on the first line are always on, and the second line indicates that the additional options are also active on this system. I guess that is what the double equals sign signifies.

Correct, I just didn’t want to confuse the issue. The exact options used by -fast may change depending on the target CPU, with the “==” indicating the extra options for your target.

Hi Mat.
Just a follow-up (let me know if you would prefer that I open a new ticket). One of the suboptions for -Mvect is cachesize:“c” (quotes used here in place of <>). However, I wasn’t able to find a description of this suboption in either the UG or the reference manual. What does this flag do, and how do you specify “c”? Could you give an example that shows the required syntax?

-Mvect=cachesize:c instructs the vectorizer, when performing cache tiling optimizations, to assume a cache size of c (in bytes). Example: “-Mvect=cachesize:6291456”

Though, I believe the flag is vestigial and hasn’t been needed since the AMD Opteron systems of 10-15 years ago. I don’t think we document it anymore. We do keep these older flags around so we don’t break makefiles (which is why it still appears in the help output), but it shouldn’t have any effect on performance.
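For reference, the byte count in the example above works out to 6 MiB. A minimal sketch of deriving such a value and forming the suboption, assuming a 6 MiB cache purely for illustration (the variable names are made up for this snippet):

```shell
# Assumed cache size of 6 MiB, converted to bytes since the suboption takes bytes.
CACHE_BYTES=$((6 * 1024 * 1024))

# Build the suboption string in the same form as the example above.
VECT_FLAG="-Mvect=cachesize:$CACHE_BYTES"

echo "$VECT_FLAG"   # prints -Mvect=cachesize:6291456
```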

That’s all I needed to know. Thanks.