For box 1, does the value represent the portion of the 335.54 MB (L1 → L2 writes) that can be compressed?
For box 2, why is the value 0.00 B? Because of this, the compression ratio is displayed as inf. How should I interpret this?
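One plausible reading of the inf: suppose the displayed ratio is simply input bytes divided by output bytes. That definition is an assumption, not something confirmed from the Nsight Compute documentation. Under it, an output of 0.00 B makes the division undefined, and a report would naturally render that as inf. The sketch below uses a hypothetical `compression_ratio` helper to show the arithmetic:

```python
import math

def compression_ratio(input_bytes: float, output_bytes: float) -> float:
    """Hypothetical helper: bytes entering the compressor / bytes leaving it (assumed definition)."""
    if output_bytes == 0:
        # Nonzero input over zero output is undefined; a report would show it as inf.
        # Zero over zero has no meaningful ratio at all, so return NaN.
        return math.inf if input_bytes > 0 else math.nan
    return input_bytes / output_bytes

# Box 1 value from the report (335.54 MB, taken as decimal megabytes by assumption)
# against the 0.00 B shown in box 2.
print(compression_ratio(335.54e6, 0))  # inf
```

Under this reading, inf would not mean "infinitely good compression." It would only mean that no output bytes were recorded for the input that entered the block.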
Is it correct that the L2 Compression block on the memory chart operates at the time data is written to L2? In other words, even when the data is not evicted to device memory, is it still compressed and stored in L2?
Does the L2 cache of the RTX 4090 use a write-back, write-allocate policy?
However, in my tests, when the write traffic to the L2 cache is smaller than 8 MB, the NCU report shows no input to the L2 Compression block.
So it seems there is a minimum size before compression is applied. Am I correct?
Thanks for sharing the report. I will file an internal ticket and get back to you when I have more information. For the cache and compression questions, I'm not sure, but you may have more luck asking on the CUDA Programming and Performance forum.