Profiling L2 cache compression rate with ncu

I am trying to get a sense of the new L2 cache compression feature on A100 GPUs. The issue is that ncu doesn't report any compression on the cudaCompressibleMemory SDK sample. The sample runs the saxpy kernel twice, once on compressible memory and once on non-compressible memory, and I don't observe L2 compression in either case. Am I missing anything? Details of my setup are below. Thanks in advance!

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|

I am using version 11.0 of both the CUDA toolkit and cuda-samples. The ncu commands I am running are:

ncu -f -o compress-saxpy --kernel-id ::saxpy: --set full ./cudaCompressibleMemory

GPU Device 0: "Ampere" with compute capability 8.0

Generic memory compression support is available
Running saxpy on 167772160 bytes of Compressible memory
==PROF== Profiling "saxpy": 0%....50%....100% - 42 passes
Running saxpy with 216 blocks x 1024 threads = 1669.388 ms 0.000 TB/s
Running saxpy on 167772160 bytes of Non-Compressible memory
==PROF== Profiling "saxpy": 0%....50%....100% - 41 passes
Running saxpy with 216 blocks x 1024 threads = 1042.250 ms 0.000 TB/s

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==PROF== Disconnected from process 3929706

ncu --import compress-saxpy.ncu-rep | grep "L2 Compression"

L2 Compression Ratio                                                                                                0
L2 Compression Success Rate                                                          %                              0
L2 Compression Ratio                                                                                                0
L2 Compression Success Rate                                                          %                              0
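To make the comparison less error-prone than eyeballing the grep output, the rows can be grouped per profiled launch. This is only a minimal sketch: it assumes the rows appear in launch order (compressible run first, as in the sample's console output), and `parse_compression_rows`, `compression_observed` and `RUN_LABELS` are hypothetical helper names, not part of ncu.

```python
import re

# Rows copied from the `grep "L2 Compression"` output above.
REPORT_ROWS = """
L2 Compression Ratio                              0
L2 Compression Success Rate        %              0
L2 Compression Ratio                              0
L2 Compression Success Rate        %              0
"""

# Assumption: rows come out in launch order, compressible run first.
RUN_LABELS = ["compressible", "non-compressible"]

# Metric name, optional "%" unit column, then the numeric value.
ROW = re.compile(r"^(L2 Compression (?:Ratio|Success Rate))\s+(?:%\s+)?([0-9.,]+)$")


def parse_compression_rows(text):
    """Group the 'L2 Compression' rows into one dict per profiled launch."""
    launches = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue
        name, value = match.groups()
        # A repeated metric name marks the start of the next launch.
        if not launches or name in launches[-1]:
            launches.append({})
        launches[-1][name] = float(value.replace(",", ""))
    return launches


def compression_observed(metrics):
    """True if any compression metric for this launch is non-zero."""
    return any(value > 0 for value in metrics.values())


if __name__ == "__main__":
    for label, metrics in zip(RUN_LABELS, parse_compression_rows(REPORT_ROWS)):
        status = "compression reported" if compression_observed(metrics) else "no compression reported"
        print(f"{label}: {metrics} -> {status}")
```

With the rows shown above, both launches fall into the "no compression reported" branch, which is exactly the behavior I am asking about.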

Can someone comment on this? I would like to know whether L2 cache compression works with this hardware/software combination. How can I definitively confirm that compression actually takes place, and does Nsight Compute correctly report compression status? Thanks!