Documentation for PTX compilation with --Ofast-compile?

PTX Compiler API v12.6 has added a new option, --Ofast-compile:
https://docs.nvidia.com/cuda/archive/12.6.0/ptx-compiler-api/index.html#compilation-options

I don't see it mentioned in the CUDA Toolkit 12.6 release notes. Is there anywhere I can find commentary on the design intent and any known effects or use cases?

How many settings (e.g. 0 1 2 3) does it offer, and what features are turned off / on for each setting?

Could you provide any information for what percent speedup one might expect?

The allowable values seem to be “0” and “max”.

When compiling my PTX, however, it seems to make compilation slower:

# without
time for i in {1..300}; do ptxas -arch=sm_60 my_code.ptx; done
real    0m6.026s
user    0m3.725s
sys     0m2.343s

# with -Ofast-compile
time for i in {1..300}; do ptxas -arch=sm_60 my_code.ptx -Ofc=max; done
real    0m6.450s
user    0m3.796s
sys     0m2.694s
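
Those totals work out to roughly 20.1 ms versus 21.5 ms per ptxas invocation, a gap small enough that it may be within run-to-run noise. In case it helps anyone reproduce this, here is a minimal sketch of how the two configurations could be timed in alternating rounds, so that caching or background load affects both about equally. The helper name per_run_us is hypothetical, and the script assumes GNU date (for %N nanoseconds) and the same my_code.ptx input as above.

```shell
#!/bin/sh
# Sketch: average wall-clock microseconds per invocation of a command.
# Assumes GNU date, which supports %N (nanoseconds).
# per_run_us is a hypothetical helper name, not part of any CUDA tool.
per_run_us() {
  runs=$1; shift
  start=$(date +%s%N)
  i=0
  while [ "$i" -lt "$runs" ]; do
    "$@" >/dev/null 2>&1
    i=$((i + 1))
  done
  end=$(date +%s%N)
  # Integer division: nanoseconds -> microseconds per run.
  echo $(( (end - start) / runs / 1000 ))
}

# Only run the comparison when ptxas and the input file are present.
if command -v ptxas >/dev/null 2>&1 && [ -f my_code.ptx ]; then
  for round in 1 2 3; do
    echo "round $round default:  $(per_run_us 100 ptxas -arch=sm_60 my_code.ptx) us/run"
    echo "round $round -Ofc=max: $(per_run_us 100 ptxas -arch=sm_60 my_code.ptx -Ofc=max) us/run"
  done
fi
```

If the per-round numbers overlap between the two configurations, the difference is probably noise rather than an effect of the flag.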

That said, I’ve also found that -O0 compiles this same PTX slower than -O1, which makes little sense to me.

Any tips on making compilation run faster would be appreciated.

It perhaps improves linking speed:

(24x speedup)