Specific behavior for -fast

afernandez · January 13, 2021, 6:41pm

Hello,
This might sound a bit too basic, but I need some clarification about the specific behavior of -fast on my system. The command nvfortran -help -fast returns:

Reading rcfile /opt/nvidia/hpc_sdk/Linux_x86_64/20.9/compilers/bin/.nvfortranrc
-fast               Common optimizations; includes -O2 -Munroll=c:1 -Mlre -Mautoinline
                    == -Mvect=simd -Mflushz -Mcache_align
-M[no]vect[=[no]altcode|[no]assoc|cachesize:<c>|[no]fuse|[no]gather|[no]idiom|levels:<n>|nocond|[no]partial|prefetch|[no]short|[no]simd[:{128|256}]|[no]simdresidual|[no]sizelimit[:n]|[no]sse|[no]tile]
                    Control automatic vector pipelining
    [no]altcode     Generate appropriate alternative code for vectorized loops
    [no]assoc       Allow [disallow] reassociation
    cachesize:<c>   Optimize for cache size c
    [no]fuse        Enable [disable] loop fusion
    [no]gather      Enable [disable] vectorization of indirect array references
    [no]idiom       Enable [disable] idiom recognition
    levels:<n>      Maximum nest level of loops to optimize
    nocond          Disable vectorization of loops with conditionals
    [no]partial     Enable [disable] partial loop vectorization via inner loop distribution
    prefetch        Generate prefetch instructions
    [no]short       Enable [disable] short vector operations
    [no]simd[:{128|256}]
                    Generate [don't generate] SIMD instructions
     128            Use 128-bit SIMD instructions
     256            Use 256-bit SIMD instructions
     512            Use 512-bit SIMD instructions
    [no]simdresidual
                    Enable [disable] vectorization of the residual loop of a vectorized loop
    [no]sizelimit[:n]
                    Limit size of vectorized loops
    [no]sse         The [no]sse option is deprecated, use [no]simd instead.
    [no]tile        Enable [disable] loop tiling
-M[no]flushz        Set SSE to flush-to-zero mode
-Mcache_align       Align large objects on cache-line boundaries

The use of the brackets is somewhat confusing and I need to know what is active and what is not active when -fast is invoked. Are level-2 optimization, unrolling loops, loop-carried redundancy elimination, inlining, cache aligning and prefetching enabled, whereas association, fusing, vectorization and others are disabled? Or is everything enabled and I’m just reading too much into the brackets?
Thanks.

MatColgrove · January 13, 2021, 7:26pm

The key line is the “-fast” entry at the top of the help output. It says that “-fast” is equivalent to “-O2 -Munroll=c:1 -Mlre -Mautoinline -Mvect=simd -Mflushz -Mcache_align”.

The remainder of the output gives more detail about a few of these options, but it does not specify what is enabled with “-fast”; only the line above does that. For example, “-Mvect=simd” is enabled with “-fast”, though you can then disable vectorization with “-Mnovect” (the brackets indicate that the text is optional), or additionally enable partial vectorization by adding “-Mvect=partial”.
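As a minimal sketch of reading that line mechanically, the shell snippet below pulls the expansion of -fast out of the help text. The function name fast_flags is hypothetical, and the help text is pasted from the output quoted earlier in the thread, so the sketch runs without the compiler installed. It assumes the layout shown there: one “-fast” line ending in an “includes” list, optionally followed by a continuation line starting with “==”.

```shell
#!/bin/sh
# Hypothetical helper: reads help text on stdin and prints the flags that
# -fast expands to (the "includes" list plus the "==" continuation, if any).
fast_flags() {
    awk '
        /^-fast /        { sub(/^.*includes /, ""); flags = $0; seen = 1; next }
        seen && /^ *== / { sub(/^ *== /, ""); flags = flags " " $0; next }
        seen             { exit }   # first line after the -fast entry: stop
        END              { print flags }
    '
}

# Sample input copied from the help output quoted in this thread.
help_text='Reading rcfile /opt/nvidia/hpc_sdk/Linux_x86_64/20.9/compilers/bin/.nvfortranrc
-fast               Common optimizations; includes -O2 -Munroll=c:1 -Mlre -Mautoinline
                    == -Mvect=simd -Mflushz -Mcache_align
-M[no]flushz        Set SSE to flush-to-zero mode'

expanded=$(printf '%s\n' "$help_text" | fast_flags)
echo "$expanded"
```

On a machine with the compiler, the same function could presumably be fed directly with nvfortran -help -fast | fast_flags. Options given after -fast on the command line, such as -Mnovect or -Mvect=partial, then adjust that default set as described above.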

afernandez · January 13, 2021, 7:40pm

Hi Mat.
From your message, the typical options (to use the user’s guide’s own word) are always on (first line), and the second line indicates the additional options that are also active on this system. I guess that is what the double equals sign signifies.
Thanks.

MatColgrove · January 13, 2021, 7:48pm

Correct, I just didn’t want to confuse the issue. The exact options used by -fast may change depending on the target CPU, with the “==” indicating the extra options for your target.

afernandez · January 13, 2021, 10:04pm

Hi Mat.
Just a follow-up (let me know if you prefer me to open a new ticket). One of the suboptions for -Mvect is cachesize:“c” (here using quotes instead of <>). However, I wasn’t able to find any description of this suboption in either the UG or the reference manual. What does this flag do, and how do you specify “c”? Could you give an example that shows the required syntax?
Thanks.

MatColgrove · January 13, 2021, 11:36pm

-Mvect=cachesize:c instructs the vectorizer, when performing cache tiling optimizations, to assume a cache size of c (in bytes). Example: “-Mvect=cachesize:6291456”
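Since the value is given in bytes, a small sketch of building the flag from a size in MiB may help. The 6 MiB figure is an assumption chosen only because it reproduces the number in the example (6 × 1024 × 1024 = 6291456); the variable names are illustrative.

```shell
#!/bin/sh
# Assumed example value: a 6 MiB cache, matching the number quoted above.
cache_mib=6

# The suboption takes the size in bytes, so convert MiB -> bytes.
cache_bytes=$((cache_mib * 1024 * 1024))

vect_flag="-Mvect=cachesize:${cache_bytes}"
echo "$vect_flag"   # prints -Mvect=cachesize:6291456
```

The resulting string would then be passed on the compile line alongside -fast, for example as nvfortran -fast "$vect_flag" prog.f90, where prog.f90 is a placeholder file name.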

Though I believe the flag is vestigial and hasn’t been needed since the AMD Opteron systems of 10-15 years ago. I don’t think we document it anymore. We do keep these older flags around so we don’t break makefiles (which is why it still appears in the help output), but it shouldn’t have any effect on performance.

afernandez · January 14, 2021, 12:08am

That’s all I needed to know. Thanks.
