Performance counters of BlueField-3

BlueField-3 (BF3) exposes several performance counters through the mlxbf_pmc driver under /sys/class/hwmon.
I reviewed the latest hardware monitoring (hwmon) documentation: https://docs.nvidia.com/networking/display/nvidia-bluefield-bsp-v4-12-0.0.pdf. It does not describe anything specific to the BF3 counters; in fact, the list of events for BF3 is marked as TBD.

I ran practical tests on BF3 to see which events are available. Only 8 llt, 8 llt_miss, and 2 mss entries were exposed:

  • (1) llt/llt_miss: Last Level Tile, which includes two sets of counters (llt, llt_miss) for monitoring Tile and cache metrics. Their counters are named with prefixes like HNF_ or GDC_BANK_.

  • (2) mss: Memory Controller and L3 Cache.
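
For anyone who wants to reproduce the counts above, here is a minimal Python sketch. It rests on an assumption I have not confirmed in any documentation: that each counter group appears as a plain sub-directory of the hwmon device with a numeric suffix (for example llt0, llt_miss3, mss1). The function name count_counter_blocks is my own, not part of any NVIDIA tool.

```python
import re
from collections import Counter
from pathlib import Path


def count_counter_blocks(hwmon_root="/sys/class/hwmon"):
    """Count indexed sub-directories of every hwmon device, grouped by base name.

    Assumption: counter groups look like "<name><index>", e.g. "llt_miss3".
    """
    counts = Counter()
    root = Path(hwmon_root)
    if not root.is_dir():
        return counts
    for dev in sorted(root.iterdir()):
        if not dev.is_dir():
            continue
        for entry in dev.iterdir():
            # Skip symlinks such as "device" and "subsystem".
            if entry.is_symlink() or not entry.is_dir():
                continue
            # Split "llt_miss3" into base name "llt_miss" and index "3".
            match = re.fullmatch(r"(.*\D)(\d+)", entry.name)
            if match:
                counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for block, n in sorted(count_counter_blocks().items()):
        print(f"{block}: {n}")
```

Directories without a numeric suffix (such as power) are ignored, so only the indexed groups are counted.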

I have four questions:
1. Because the relevant documentation is missing, it is hard to understand the meaning and purpose of each counter in BlueField-3. Is more detailed reference material available?
2. Why are no TRIO or PCIe counters exposed in BlueField-3, unlike in BlueField-2? Is this an architectural difference or a limitation of the current implementation?
3. What does GDC/SKYLIB refer to in the context of these counters? Could you provide more background on its role and what it measures?
4. Does llc_miss specifically refer to L2 cache misses, or does it include other levels?

Hi @hua.zhang.2106108 ,

For Q1/Q2: You are right, some functions are still TBD for BF3. Also, BF3 has a different design from BF2.

For Q3: Please refer to the DOCA Telemetry Service Guide in the NVIDIA Docs.

For Q4: I think you mean llt_miss, right? It refers to L3 cache misses.

Best regards!