BlueField-3 (BF3) provides several performance counters via the mlx_pmc driver under /sys/class/hwmon.
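The block counts reported further down can be reproduced with a short script. The sketch below assumes that each counter block appears as a subdirectory named like llt0, llt_miss0, or mss0 inside the driver's hwmon device directory. That naming is an assumption, so adjust the pattern if your BSP lays things out differently.

```python
import os
import re
from collections import Counter

# Assumed block directory names, e.g. "llt3", "llt_miss3", "mss1".
# "llt_miss" is listed before "llt" so the longer name wins the match.
BLOCK_RE = re.compile(r"^(llt_miss|llt|mss)(\d+)$")


def count_pmc_blocks(hwmon_root="/sys/class/hwmon"):
    """Count llt / llt_miss / mss block directories under every hwmon device."""
    counts = Counter()
    if not os.path.isdir(hwmon_root):
        return counts
    for dev in sorted(os.listdir(hwmon_root)):
        dev_path = os.path.join(hwmon_root, dev)  # hwmonX entries are usually symlinks
        if not os.path.isdir(dev_path):
            continue
        for entry in os.listdir(dev_path):
            m = BLOCK_RE.match(entry)
            # Only count directories; plain attribute files are ignored.
            if m and os.path.isdir(os.path.join(dev_path, entry)):
                counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    for kind, n in sorted(count_pmc_blocks().items()):
        print(f"{kind}: {n}")
```

On a system matching the observation below, the output would list llt, llt_miss, and mss with their instance counts.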
I reviewed the latest hardware monitoring (hwmon) documentation: https://docs.nvidia.com/networking/display/nvidia-bluefield-bsp-v4-12-0.0.pdf. It does not describe anything specific to the BF3 counters; in fact, the list of BF3 events is marked as TBD.
I ran practical tests on BF3 to observe the various events. BF3 exposes only 8 llt, 8 llt_miss, and 2 mss instances.
- (1) llt/llt_miss: Last Level Tile, which includes two sets of counters (llt and llt_miss) for monitoring Tile and cache metrics. Their counter names use prefixes such as HNF_ or GDC_BANK_.
- (2) mss: Memory Controller and L3 Cache.
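To see how a block's events split across the HNF_ and GDC_BANK_ prefixes, its event list can be grouped by prefix. This is a minimal sketch assuming each block directory contains an event_list file whose lines have the form "0x<id>: <EVENT_NAME>"; both the file name and the line format are assumptions here, not confirmed by the BF3 documentation.

```python
import re
import sys
from collections import defaultdict

# Prefixes seen in the BF3 llt/llt_miss event names; anything else goes to "OTHER".
KNOWN_PREFIXES = ("GDC_BANK_", "HNF_")

# Assumed event_list line format: "<hex id>: <EVENT_NAME>".
LINE_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s*:\s*(\S+)\s*$")


def group_events(event_list_text):
    """Group (event_id, name) pairs from an event_list dump by name prefix."""
    groups = defaultdict(list)
    for line in event_list_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank or unexpected lines
        event_id, name = int(m.group(1), 16), m.group(2)
        prefix = next((p for p in KNOWN_PREFIXES if name.startswith(p)), "OTHER")
        groups[prefix].append((event_id, name))
    return dict(groups)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of one block's event_list file (path layout is an assumption).
    with open(sys.argv[1]) as f:
        for prefix, events in sorted(group_events(f.read()).items()):
            print(f"{prefix}: {len(events)} events")
```

Usage: python group_events.py /sys/class/hwmon/hwmonX/llt0/event_list, where hwmonX and llt0 are placeholders for the actual device and block.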
I have four questions:
1. Due to the lack of relevant documentation, it is hard to fully understand the meaning and purpose of each BlueField-3 counter. Is more detailed reference material available?
2. Why are no TRIO or PCIe counters exposed on BlueField-3, unlike on BlueField-2? Is this an architectural difference or a limitation of the current implementation?
3. What does GDC/SKYLIB refer to in the context of these counters? Could you provide more background on its role and what it measures?
4. Does llt_miss refer specifically to L2 cache misses, or does it include other cache levels?