I am trying to get a feel for the new L2 cache compression feature on A100 chips. The issue is that ncu doesn’t report any compression success for the cudaCompressibleMemory SDK sample. The saxpy kernel runs twice, once on compressible and once on non-compressible memory, and I don’t observe L2 compression in either case. Am I missing anything? The details of my setup are below. Thanks in advance!
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
I am using CUDA 11.0 for both the toolkit and cuda-samples. The ncu commands I am using, with their output, are:
ncu -f -o compress-saxpy --kernel-id ::saxpy: --set full ./cudaCompressibleMemory
GPU Device 0: "Ampere" with compute capability 8.0
Generic memory compression support is available
Running saxpy on 167772160 bytes of Compressible memory
==PROF== Profiling "saxpy": 0%....50%....100% - 42 passes
Running saxpy with 216 blocks x 1024 threads = 1669.388 ms 0.000 TB/s
Running saxpy on 167772160 bytes of Non-Compressible memory
==PROF== Profiling "saxpy": 0%....50%....100% - 41 passes
Running saxpy with 216 blocks x 1024 threads = 1042.250 ms 0.000 TB/s
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==PROF== Disconnected from process 3929706
ncu --import compress-saxpy.ncu-rep | grep "L2 Compression"
L2 Compression Ratio 0
L2 Compression Success Rate % 0
L2 Compression Ratio 0
L2 Compression Success Rate % 0
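A side note on the 0.000 TB/s figures in the log above: they are most likely a profiling artifact and not a symptom of the compression problem. The sample appears to time the launch from the host, and under ncu the kernel is replayed once per pass (42 and 41 passes here), which inflates the elapsed time. Below is a minimal Python sketch of the arithmetic. It assumes the sample reports three buffers' worth of traffic (two read, one written) per run, which is an assumption on my part, although it is consistent with the printed values. The function name `reported_tbps` is hypothetical.

```python
# Assumption: reported bandwidth = (3 * buffer_bytes) / elapsed_ms / 1e9,
# i.e. two input buffers read plus one output buffer written.
BUFFER_BYTES = 167_772_160  # size printed by the sample


def reported_tbps(elapsed_ms: float, buffer_bytes: int = BUFFER_BYTES) -> float:
    """Bytes per millisecond divided by 1e9 gives terabytes per second."""
    return 3 * buffer_bytes / elapsed_ms / 1e9


# Elapsed times taken from the profiled runs in the log above.
for label, ms in [("compressible", 1669.388), ("non-compressible", 1042.250)]:
    print(f"{label}: {reported_tbps(ms):.3f} TB/s")
```

Both lines round to 0.000 TB/s, matching the log, so the low bandwidth by itself says nothing about whether compression happened.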
Can someone comment on this? I am wondering whether L2 cache compression works with the current hardware/software combination. How can I definitively confirm that compression actually takes place? Does Nsight Compute correctly reflect the compression status? Thanks!
I used the NVIDIA CUDA sample to test these metrics, and I also get 0 for every one of them.
sudo ncu --metrics lts__gcomp_input_sectors,lts__gcomp_output_sectors,lts__average_gcomp_input_sector_compression_rate,lts__average_gcomp_input_sector_success_rate,lts__average_gcomp_output_sector_compression_achieved_rate ./cudaCompressibleMemory
[sudo] password for yongjianli:
==PROF== Connected to process 25356 (/home/cuda-samples/Samples/3_CUDA_Features/cudaCompressibleMemory/cudaCompressibleMemory)
GPU Device 0: "Ampere" with compute capability 8.6
Generic memory compression support is available
allocating non-compressible Z buffer
Running saxpy on 167772160 bytes of Compressible memory
==PROF== Profiling "saxpy" - 1: 0%....50%....100% - 3 passes
Running saxpy with 92 blocks x 768 threads = 894.610 ms 0.001 TB/s
Running saxpy on 167772160 bytes of Non-Compressible memory
==PROF== Profiling "saxpy" - 2: 0%....50%....100% - 3 passes
Running saxpy with 92 blocks x 768 threads = 91.903 ms 0.005 TB/s
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==PROF== Disconnected from process 25356
[25356] cudaCompressibleMemory@127.0.0.1
saxpy(float, const float4 *, const float4 *, float4 *, unsigned long), 2022-Mar-04 01:16:54, Context 1, Stream 7
Section: Command line profiler metrics
---------------------------------------------------------------------- --------------- ------------------------------
lts__average_gcomp_input_sector_compression_rate.pct % 0
lts__average_gcomp_input_sector_compression_rate.ratio 0
lts__average_gcomp_input_sector_success_rate.pct % 0
lts__average_gcomp_input_sector_success_rate.ratio 0
lts__average_gcomp_output_sector_compression_achieved_rate.pct % 0
lts__average_gcomp_output_sector_compression_achieved_rate.ratio 0
lts__gcomp_input_sectors.avg sector 0
lts__gcomp_input_sectors.max sector 0
lts__gcomp_input_sectors.min sector 0
lts__gcomp_input_sectors.sum sector 0
lts__gcomp_output_sectors.avg sector 0
lts__gcomp_output_sectors.max sector 0
lts__gcomp_output_sectors.min sector 0
lts__gcomp_output_sectors.sum sector 0
---------------------------------------------------------------------- --------------- ------------------------------
saxpy(float, const float4 *, const float4 *, float4 *, unsigned long), 2022-Mar-04 01:16:55, Context 1, Stream 7
Section: Command line profiler metrics
---------------------------------------------------------------------- --------------- ------------------------------
lts__average_gcomp_input_sector_compression_rate.pct % 0
lts__average_gcomp_input_sector_compression_rate.ratio 0
lts__average_gcomp_input_sector_success_rate.pct % 0
lts__average_gcomp_input_sector_success_rate.ratio 0
lts__average_gcomp_output_sector_compression_achieved_rate.pct % 0
lts__average_gcomp_output_sector_compression_achieved_rate.ratio 0
lts__gcomp_input_sectors.avg sector 0
lts__gcomp_input_sectors.max sector 0
lts__gcomp_input_sectors.min sector 0
lts__gcomp_input_sectors.sum sector 0
lts__gcomp_output_sectors.avg sector 0
lts__gcomp_output_sectors.max sector 0
lts__gcomp_output_sectors.min sector 0
lts__gcomp_output_sectors.sum sector 0
---------------------------------------------------------------------- --------------- ------------------------------
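To check tables like the two above without reading every row, the text output can be scanned mechanically. This is a rough Python sketch, not an official tool: `parse_ncu_metrics` and `compression_observed` are hypothetical helper names, and the parser assumes each table starts with a `Section:` line and that the value is the last field of each row, as in the output above.

```python
def parse_ncu_metrics(text):
    """Return one {metric_name: value} dict per 'Section:' table in ncu text output."""
    launches = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Section:":
            launches.append({})  # a new metric table begins
        elif fields[0].startswith("lts__") and launches:
            # Rows look like "name [unit] value"; the value is the last field.
            launches[-1][fields[0]] = float(fields[-1].replace(",", ""))
    return launches


def compression_observed(launch):
    """True if the summed gcomp input-sector counter is nonzero for this launch."""
    return launch.get("lts__gcomp_input_sectors.sum", 0.0) > 0


# A few rows copied from the output above, used as sample input.
SAMPLE = """\
Section: Command line profiler metrics
lts__average_gcomp_input_sector_success_rate.pct % 0
lts__gcomp_input_sectors.sum sector 0
lts__gcomp_output_sectors.sum sector 0
"""

for i, launch in enumerate(parse_ncu_metrics(SAMPLE), start=1):
    print(f"launch {i}: compression observed = {compression_observed(launch)}")
```

For the output above, this reports no compression for either saxpy launch, since every `lts__gcomp_*` counter is 0.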
Have you solved your problem?