I don’t recommend the NVIDIA Visual Profiler for use on the RTX 6000; the Visual Profiler and nvprof do not support metric collection on GPUs of compute capability 7.5 and higher. You should use one of the newer profilers instead. For gathering these kinds of metrics, the one to use is Nsight Compute. This blog should help with learning to use Nsight Compute and gather metrics (although it doesn’t cover shared memory specifically).
One possible approach (broadly consistent with the approach described in the Best Practices Guide you already linked) is to gather the metrics that track shared memory activity (loads and stores) and divide them by a time interval of interest, such as the kernel duration. For example, you might use the metric for shared load transactions:
There is a similar metric for shared store transactions. The previously linked blog shows how to convert these counts to bytes, which you could then divide by your measured kernel duration. However, looking at that metric table, there are already metrics for shared memory throughput, for example for loads:
Using one of those throughput metrics directly is probably easier.
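If you do compute it by hand from transaction counts, the arithmetic is just bytes moved divided by elapsed time. The sketch below assumes you have already read the load or store transaction count and the kernel duration (in nanoseconds) from the profiler. The helper name `shared_throughput_gbps` is hypothetical, and the `bytes_per_transaction` value of 128 used in the example is only an illustrative assumption; use whatever conversion factor the blog gives for your metric and GPU. The input numbers are made up, not measured data.

```python
def shared_throughput_gbps(transactions: int, bytes_per_transaction: int, duration_ns: float) -> float:
    """Hypothetical helper: average shared-memory throughput in GB/s.

    transactions          -- transaction count reported by the profiler
    bytes_per_transaction -- conversion factor (an assumption; see the blog)
    duration_ns           -- kernel duration in nanoseconds
    """
    if duration_ns <= 0:
        raise ValueError("duration_ns must be positive")
    total_bytes = transactions * bytes_per_transaction
    # Bytes per nanosecond is numerically equal to GB/s (1e9 bytes per 1e9 ns).
    return total_bytes / duration_ns


if __name__ == "__main__":
    # Illustrative numbers only, with an assumed 128 bytes per transaction.
    loads = shared_throughput_gbps(1_000_000, 128, 100_000)
    stores = shared_throughput_gbps(500_000, 128, 100_000)
    print(loads, stores, loads + stores)  # prints 1280.0 640.0 1920.0
```

Loads and stores are computed separately here so you can compare each against the corresponding throughput metric the profiler reports; summing them gives the combined shared memory traffic over the kernel's duration.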
I suggest asking detailed profiler usage questions on the forum for whichever profiler you are using.