Loss of CUPTI counter cookbook

Hey all

The following link


used to list the CUPTI counters you’re supposed to set to collect a given derived metric. Has this been moved somewhere obvious that I’m just missing, or was it removed unintentionally? It was a very useful resource, and I’d appreciate it if the table of counters were either put back or posted by somebody.


Those tables were intentionally removed because of the number and complexity of the metric formulas, and because some of the formulas required counters that are not publicly available.

Starting in CUDA 5.0, nvprof can collect metrics directly, so you no longer need to collect individual counts and compute the metric yourself.

If you are using CUPTI directly, then you can use the appropriate CUPTI API to get the events required for a given metric.

Hey David,

I’m using CUPTI directly. Did these metrics start changing with compute capabilities? It seems like a really roundabout way to go about collecting static information.

Thanks for the advice, but please add me to the tally (currently 1, at my best guess) of people who would like those tables. They were useful, and I’ve always pointed people with basic performance questions to them.


Another David

The metrics available don’t change significantly (we usually add some new ones every release), but the way a given metric is calculated can definitely be different for different devices. And over time we may discover a better way to calculate a metric, so that metric’s formula may change from one release to the next. The whole point of metrics is that they hide all that complexity and instead allow you to get a set of “commonly understood” values from any GPU.

The metrics themselves should provide data to answer basic performance questions so it is good to point people to them. But why does knowing the exact formula for the metric help? Perhaps the metric descriptions are not sufficiently detailed?


(sorry for spaced out replies, I’m not getting notifications of your replies)

I hadn’t checked out CUDA 5.5 yet, so I didn’t know you guys were doing the “run this kernel enough times to get the value of this metric” at the API level. That’s pretty slick. I do tool development, and we have historically used counters, so I’ll try to push them to support the metric API. Before, it was nice to just have the list of counters to set to get the metric I was interested in, but with this it’s probably worth switching over.