Compute data compression in Ampere A100

I want to take advantage of the compute data compression feature, which is new on the Ampere A100. But when I run the CUDA sample code, I get the same performance with and without compression. I'm not sure what is going on. Is the compute data compression capability not exposed yet?

Run log:
GPU Device 0: "Ampere" with compute capability 8.0

Generic memory compression support is available
Running saxpy on 167772160 bytes of Compressible memory
Running saxpy with 216 blocks x 1024 threads = 0.383 ms 1.313 TB/s
Running saxpy on 167772160 bytes of Non-Compressible memory
Running saxpy with 216 blocks x 1024 threads = 0.387 ms 1.300 TB/s

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.