I see a chart for the L1/TEX cache in the Nsight Compute Profiling Guide [1], but one thing is not clear. As the chart shows, the total (load + store) hit rate is 90.62% and the total accesses are 100,663,296. There is also a definition which says:

> Sector hit rate (percentage of requested sectors that do not miss) in the L1 cache. Sectors that miss need to be requested from L2, thereby contributing to Sector Misses to L2.

So I expect sector_misses_to_l2 * 32 to be the number of misses at L1/TEX: 1,310,720 * 32 = 41,943,040. The L1/TEX miss rate should then be 41,943,040 / 100,663,296 = 0.416. In other words, the computed miss rate is about 41.6%, while the reported hit rate is 90.62%. What am I missing here?
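The arithmetic above can be reproduced with a short sketch. The factor of 32 is the one used in the question (assumed here to be the sector size in bytes), and the variable names are only illustrative:

```python
# Reproduce the miss-rate arithmetic from the question.
# Assumption: the factor 32 is the sector size in bytes, as used above.
SECTOR_FACTOR = 32

total_accesses = 100_663_296      # total (load + store) figure from the chart
sector_misses_to_l2 = 1_310_720   # total "Sector Misses to L2" from the chart

misses = sector_misses_to_l2 * SECTOR_FACTOR
miss_rate = misses / total_accesses
hit_rate = 1.0 - miss_rate

print(f"misses    = {misses:,}")
print(f"miss rate = {miss_rate:.3%}")
print(f"hit rate  = {hit_rate:.3%}  (chart reports 90.62%)")
```

Under that assumption the script gives a hit rate of roughly 58%, which is the gap to the reported 90.62% that I am asking about.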

**UPDATE (3 hours later):**

The stats in the figure look inconsistent: the hit rates I calculate from the sector counts and the sector misses to L2 do not match the reported hit rates.

Loads (sectors) = 2,097,152

Loads (hit rate) = 87.5%

Loads (sector misses to L2) = 262,144

==> Hit rate = (2,097,152 - 262,144) / 2,097,152 = 0.875

Stores (sectors) = 1,048,576

Stores (hit rate) = 96.88%

Stores (sector misses to L2) = 1,048,576

==> Hit rate = (1,048,576 - 1,048,576) / 1,048,576 = 0 (!)

Total (sectors) = 3,145,728

Total (hit rate) = 90.62%

Total (sector misses to L2) = 1,310,720

==> Hit rate = (3,145,728 - 1,310,720) / 3,145,728 = 0.583 (!)
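The same comparison as a small script, for anyone who wants to check the numbers. It only uses the values listed above; the helper name `implied_hit_rate` is mine and not an Nsight Compute metric. As a side observation, the sector-weighted average of the two reported hit rates lands very close to the 90.62% total quoted at the top of the question, which may hint that the reported hit rate is not derived from "sector misses to L2" at all. That is only a guess on my part.

```python
# Compare the reported hit rates with the ones implied by "sector misses to L2".
# All numbers are copied from the chart; nothing here is measured.

def implied_hit_rate(sectors, misses_to_l2):
    """Hit rate in percent, assuming every L1 miss shows up as a miss to L2."""
    return 100.0 * (sectors - misses_to_l2) / sectors

# name: (sectors, sector misses to L2, reported hit rate in %)
rows = {
    "Loads": (2_097_152, 262_144, 87.5),
    "Stores": (1_048_576, 1_048_576, 96.88),
}

implied = {}
for name, (sectors, misses, reported) in rows.items():
    implied[name] = implied_hit_rate(sectors, misses)
    print(f"{name:6s} reported {reported:6.2f}%  implied {implied[name]:6.2f}%")

# Sector-weighted average of the *reported* load and store hit rates.
total_sectors = sum(sectors for sectors, _, _ in rows.values())
weighted = sum(sectors * reported for sectors, _, reported in rows.values()) / total_sectors
print(f"Weighted average of reported hit rates: {weighted:.2f}%")
```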

Any ideas?

[1] https://docs.nvidia.com/nsight-compute/ProfilingGuide/index.html#memory-tables-l1