questions on some metrics

  1. Is there an equivalent of “gld_transactions_per_request” and “gst_transactions_per_request” in the Nsight Compute UI?

  2. According to the documentation here: https://docs.nvidia.com/nsight-compute/NsightCompute/index.html#statistical-sampler, “Barrier - Warp was stalled waiting for sibling warps at a CTA barrier.” Since blocks are abstractions over CTAs, does this cover only block-wide synchronization stalls? What about stalls caused by PTX barrier instructions such as:

barrier.sync
barrier.arrive

Are the above barrier stalls also included in the “Barrier” metric? Lastly, are warp-wise and grid-wise barrier stalls also included?

Thanks as always.

Sincerely,
Isaac Lee

Hi Isaac,

barrier.arrive should not cause this stall reason, as the warp is not stalled until the barrier condition is met. barrier.red and barrier.sync should cause this stall reason on the next PC offset following the SASS barrier (BAR) instruction.

Warp-wide barriers (syncwarp) are not included. Grid-wide barriers can contribute to this metric, but can also contribute to other stall reasons.

Hi Felix,

I see from the table comparing nvprof metrics to Nsight Compute metrics (https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvprof-metric-comparison) that there is no equivalent of “gld_transactions_per_request” and “gst_transactions_per_request”. Is there a reason why these metrics were removed? I found them helpful for checking whether all my global reads and writes were coalesced.

Thanks a lot!

Is there an equivalent of "gld_transactions_per_request" and "gst_transactions_per_request" in nsight compute UI?

There is no equivalent to those metrics in Nsight Compute due to problems with calculating those metrics consistently. The closest you can compute in Nsight Compute is the ratio of

l1tex__t_sectors_{qualifier} / l1tex__t_requests_{qualifier}
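To illustrate how that ratio relates to coalescing, here is a minimal Python sketch. It is an idealized model, not how Nsight Compute derives the counters: it assumes 32-byte sectors and a 32-thread warp, and the helper name sectors_per_request is hypothetical. It simply counts the distinct sectors touched by one warp-wide request.

```python
SECTOR_BYTES = 32  # assumption: sector size in bytes
WARP_SIZE = 32     # assumption: threads per warp


def sectors_per_request(addresses, access_bytes=4, sector_bytes=SECTOR_BYTES):
    """Count the distinct sectors touched by one warp-wide request.

    Hypothetical helper for illustration only: `addresses` holds one byte
    address per thread, and each thread accesses `access_bytes` bytes.
    """
    sectors = set()
    for addr in addresses:
        first = addr // sector_bytes
        last = (addr + access_bytes - 1) // sector_bytes
        sectors.update(range(first, last + 1))
    return len(sectors)


# Fully coalesced: consecutive 4-byte elements, 128 contiguous bytes in total.
coalesced = [4 * t for t in range(WARP_SIZE)]
# Strided: every thread lands in a different sector.
strided = [128 * t for t in range(WARP_SIZE)]

print(sectors_per_request(coalesced))  # 4
print(sectors_per_request(strided))    # 32
```

In this model, a ratio near 4 for 4-byte accesses suggests coalesced traffic, while a ratio approaching 32 suggests each thread is touching its own sector.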

I want to look into what these mean

l1tex__t_sectors_{qualifier}
l1tex__t_requests_{qualifier}

but there doesn’t seem to be any information about them in the Nsight Compute or Nsight Compute CLI documentation. I would really appreciate it if you could direct me to a source.

Thanks again.

Metrics are in general not explained in the documentation, since there are simply too many.

You can query all metrics from the command line, which will give you all qualifiers, descriptions (as available), etc. We are working to make this available from the UI in the future, too. See https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#command-line-options-profile for the --query-metrics option, or use --help on the CLI. You will likely want to use (replace the chip name as necessary):

nv-nsight-cu-cli --query-metrics --chip tu104

You can omit --chip to see the metrics for all local devices. You can then filter this list for those metrics. Note that those metrics are available from Volta onward since Nsight Compute 2019.1.
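One possible way to do that filtering, as a small Python sketch: the helper name filter_metric_lines is hypothetical, and the only assumption about the --query-metrics output is that each metric's name appears on its own line of text.

```python
def filter_metric_lines(lines, prefixes=("l1tex__t_sectors", "l1tex__t_requests")):
    """Yield the lines that mention any of the given metric-name prefixes.

    Hypothetical helper: it does plain substring matching and makes no
    assumption about the column layout of the query output.
    """
    for line in lines:
        if any(prefix in line for prefix in prefixes):
            yield line.rstrip("\n")
```

For example, you could save the query output to a text file and pass the open file object to filter_metric_lines, or pipe the command into a script that calls it on sys.stdin.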

Thank you so much for your persistent help!