Ncu profiling L2 cache compression rate

I am trying to get a sense of the new L2 cache compression feature on A100 GPUs. The issue is that ncu doesn't show any compression success on the cudaCompressibleMemory SDK sample. The saxpy kernel is run twice, once on compressible and once on non-compressible memory, and I don't observe L2 compression in either case. Am I missing anything? The details of my setup are below. Thanks in advance!

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|

I am using CUDA 11.0 for both the toolkit and cuda-samples. The ncu commands I am using are:

ncu -f -o compress-saxpy --kernel-id ::saxpy: --set full ./cudaCompressibleMemory

GPU Device 0: "Ampere" with compute capability 8.0

Generic memory compression support is available
Running saxpy on 167772160 bytes of Compressible memory
==PROF== Profiling "saxpy": 0%....50%....100% - 42 passes
Running saxpy with 216 blocks x 1024 threads = 1669.388 ms 0.000 TB/s
Running saxpy on 167772160 bytes of Non-Compressible memory
==PROF== Profiling "saxpy": 0%....50%....100% - 41 passes
Running saxpy with 216 blocks x 1024 threads = 1042.250 ms 0.000 TB/s

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==PROF== Disconnected from process 3929706

ncu --import compress-saxpy.ncu-rep | grep "L2 Compression"

L2 Compression Ratio                                                                                                0
L2 Compression Success Rate                                                          %                              0
L2 Compression Ratio                                                                                                0
L2 Compression Success Rate                                                          %                              0

Can someone comment on this? I am wondering whether L2 cache compression works with the current hardware/software combination. How can I definitively confirm that compression actually takes place? Does Nsight Compute correctly reflect the compression status? Thanks!

I used the NVIDIA CUDA sample to test these metrics, and I also get 0 for every result.


sudo ncu --metrics lts__gcomp_input_sectors,lts__gcomp_output_sectors,lts__average_gcomp_input_sector_compression_rate,lts__average_gcomp_input_sector_success_rate,lts__average_gcomp_output_sector_compression_achieved_rate ./cudaCompressibleMemory
[sudo] password for yongjianli: 
==PROF== Connected to process 25356 (/home/cuda-samples/Samples/3_CUDA_Features/cudaCompressibleMemory/cudaCompressibleMemory)
GPU Device 0: "Ampere" with compute capability 8.6

Generic memory compression support is available
allocating non-compressible Z buffer
Running saxpy on 167772160 bytes of Compressible memory
==PROF== Profiling "saxpy" - 1: 0%....50%....100% - 3 passes
Running saxpy with 92 blocks x 768 threads = 894.610 ms 0.001 TB/s
Running saxpy on 167772160 bytes of Non-Compressible memory
==PROF== Profiling "saxpy" - 2: 0%....50%....100% - 3 passes
Running saxpy with 92 blocks x 768 threads = 91.903 ms 0.005 TB/s

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==PROF== Disconnected from process 25356
[25356] cudaCompressibleMemory@127.0.0.1
  saxpy(float, const float4 *, const float4 *, float4 *, unsigned long), 2022-Mar-04 01:16:54, Context 1, Stream 7
    Section: Command line profiler metrics
    ---------------------------------------------------------------------- --------------- ------------------------------
    lts__average_gcomp_input_sector_compression_rate.pct                                 %                              0
    lts__average_gcomp_input_sector_compression_rate.ratio                                                              0
    lts__average_gcomp_input_sector_success_rate.pct                                     %                              0
    lts__average_gcomp_input_sector_success_rate.ratio                                                                  0
    lts__average_gcomp_output_sector_compression_achieved_rate.pct                       %                              0
    lts__average_gcomp_output_sector_compression_achieved_rate.ratio                                                    0
    lts__gcomp_input_sectors.avg                                                    sector                              0
    lts__gcomp_input_sectors.max                                                    sector                              0
    lts__gcomp_input_sectors.min                                                    sector                              0
    lts__gcomp_input_sectors.sum                                                    sector                              0
    lts__gcomp_output_sectors.avg                                                   sector                              0
    lts__gcomp_output_sectors.max                                                   sector                              0
    lts__gcomp_output_sectors.min                                                   sector                              0
    lts__gcomp_output_sectors.sum                                                   sector                              0
    ---------------------------------------------------------------------- --------------- ------------------------------

  saxpy(float, const float4 *, const float4 *, float4 *, unsigned long), 2022-Mar-04 01:16:55, Context 1, Stream 7
    Section: Command line profiler metrics
    ---------------------------------------------------------------------- --------------- ------------------------------
    lts__average_gcomp_input_sector_compression_rate.pct                                 %                              0
    lts__average_gcomp_input_sector_compression_rate.ratio                                                              0
    lts__average_gcomp_input_sector_success_rate.pct                                     %                              0
    lts__average_gcomp_input_sector_success_rate.ratio                                                                  0
    lts__average_gcomp_output_sector_compression_achieved_rate.pct                       %                              0
    lts__average_gcomp_output_sector_compression_achieved_rate.ratio                                                    0
    lts__gcomp_input_sectors.avg                                                    sector                              0
    lts__gcomp_input_sectors.max                                                    sector                              0
    lts__gcomp_input_sectors.min                                                    sector                              0
    lts__gcomp_input_sectors.sum                                                    sector                              0
    lts__gcomp_output_sectors.avg                                                   sector                              0
    lts__gcomp_output_sectors.max                                                   sector                              0
    lts__gcomp_output_sectors.min                                                   sector                              0
    lts__gcomp_output_sectors.sum                                                   sector                              0
    ---------------------------------------------------------------------- --------------- ------------------------------
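
For anyone comparing several runs, here is a minimal Python sketch that reads the plain-text metric table shown above and reports, per profiled saxpy launch, whether any gcomp counter is non-zero. It assumes the layout is exactly what ncu printed in this post (one "Section: Command line profiler metrics" header per launch, then one metric per row). The function names summarize_gcomp and compression_seen are my own, not part of any NVIDIA tool. An all-zero result only tells you the profiler recorded no compression activity; it does not tell you why.

```python
import re

# One metric row of the text table: name, optional unit, numeric value.
# Only rows whose metric name contains "gcomp" are picked up.
ROW = re.compile(r"^\s*(lts__\S*gcomp\S*)\s+(?:(\S+)\s+)?([\d.,]+)\s*$")

SECTION_HEADER = "Section: Command line profiler metrics"


def summarize_gcomp(report_text):
    """Return one {metric_name: value} dict per profiled kernel launch."""
    launches = []
    current = None
    for line in report_text.splitlines():
        if SECTION_HEADER in line:
            # A new section header starts the table for the next launch.
            current = {}
            launches.append(current)
            continue
        match = ROW.match(line)
        if match and current is not None:
            name, _unit, value = match.groups()
            current[name] = float(value.replace(",", ""))
    return launches


def compression_seen(launch):
    """True if any gcomp counter recorded for this launch is non-zero."""
    return any(value != 0 for value in launch.values())


# A few rows copied from the output above (both launches report zeros).
sample = """
    Section: Command line profiler metrics
    lts__average_gcomp_input_sector_success_rate.pct       %        0
    lts__average_gcomp_input_sector_success_rate.ratio              0
    lts__gcomp_input_sectors.sum                      sector        0
    Section: Command line profiler metrics
    lts__gcomp_input_sectors.sum                      sector        0
    lts__gcomp_output_sectors.sum                     sector        0
"""

for index, launch in enumerate(summarize_gcomp(sample), start=1):
    print(f"launch {index}: compression seen = {compression_seen(launch)}")
```

To use it on a real run, redirect the ncu output to a file and pass the file contents to summarize_gcomp instead of the sample string.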

Have you solved your problem?