I have the following problem. I want to measure gst_efficiency and gld_efficiency for my CUDA application. I tried the CUDA Visual Profiler, but for gst_efficiency it reports that the data is insufficient (collecting some metrics failed). So now I am trying to use nvprof instead. I want to generate the results for different parameters and have automated the runs with a script, which is another reason I'm looking for a solution based on nvprof. The documentation distributed with CUDA 5.0 tells me to compute these metrics with the following formulas for devices with compute capability 2.0-3.0:

gld_efficiency = 100 * gld_requested_throughput/ gld_throughput

gst_efficiency = 100 * gst_requested_throughput / gst_throughput

For the required metrics the following formulas are given:

gld_throughput = ((128 * global_load_hit) + (l2_subp0_read_requests + l2_subp1_read_requests) * 32 - (l1_local_ld_miss * 128)) / gputime

gst_throughput = ((l2_subp0_write_requests + l2_subp1_write_requests) * 32 - (l1_local_ld_miss * 128)) / gputime

gld_requested_throughput = (gld_inst_8bit + 2 * gld_inst_16bit + 4 * gld_inst_32bit + 8 * gld_inst_64bit + 16 * gld_inst_128bit) / gputime

gst_requested_throughput = (gst_inst_8bit + 2 * gst_inst_16bit + 4 * gst_inst_32bit + 8 * gst_inst_64bit + 16 * gst_inst_128bit) / gputime
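To make clear what I am trying to compute in my script, here is a minimal Python sketch of the formulas above. Since both throughputs are divided by the same gputime, it cancels in the efficiency ratio, so only the raw event counts are needed. The dictionary `counts` (event name to counted value) and the sample numbers are my own assumptions for illustration; the event names are the ones from the formulas as quoted, and the sketch assumes all of them can actually be collected.

```python
# Byte width moved by one instruction of each counted size.
WIDTHS = {"8bit": 1, "16bit": 2, "32bit": 4, "64bit": 8, "128bit": 16}


def bytes_requested(counts, prefix):
    """Bytes requested by the kernel; prefix is 'gld' or 'gst'."""
    return sum(size * counts[f"{prefix}_inst_{w}"] for w, size in WIDTHS.items())


def gld_efficiency(counts):
    # Numerator of gld_throughput; gputime cancels in the ratio.
    actual = (128 * counts["global_load_hit"]
              + 32 * (counts["l2_subp0_read_requests"]
                      + counts["l2_subp1_read_requests"])
              - 128 * counts["l1_local_ld_miss"])
    return 100.0 * bytes_requested(counts, "gld") / actual


def gst_efficiency(counts):
    # Numerator of gst_throughput as quoted above; gputime cancels here too.
    actual = (32 * (counts["l2_subp0_write_requests"]
                    + counts["l2_subp1_write_requests"])
              - 128 * counts["l1_local_ld_miss"])
    return 100.0 * bytes_requested(counts, "gst") / actual


# Made-up counts, purely for illustration (not measured on any GPU).
sample = {f"{p}_inst_{w}": 0 for p in ("gld", "gst") for w in WIDTHS}
sample.update({
    "gld_inst_32bit": 1024, "gst_inst_32bit": 1024,
    "global_load_hit": 0, "l1_local_ld_miss": 0,
    "l2_subp0_read_requests": 64, "l2_subp1_read_requests": 64,
    "l2_subp0_write_requests": 128, "l2_subp1_write_requests": 128,
})
print(gld_efficiency(sample))  # 100.0
print(gst_efficiency(sample))  # 50.0
```

A missing event raises a KeyError here rather than being treated as zero, and a kernel with no global traffic would divide by zero, so a real script would need to handle both cases.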

Since no formulas are given for the quantities used on the right-hand sides, I assume they are events that nvprof can count. But some of these events do not seem to be available on my GTX 460 (I also tried a GTX 560 Ti). I pasted the output of nvprof --query-events here:
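For reference, this is roughly how I check which of the required events are missing. It is only a sketch: it assumes that each available event name appears as a whole identifier somewhere in the text printed by nvprof --query-events, and makes no other assumption about the output format. The function name `missing_events` and the sample text are hypothetical.

```python
import re

# Event names taken from the formulas above.
REQUIRED_EVENTS = (
    [f"{p}_inst_{w}" for p in ("gld", "gst")
     for w in ("8bit", "16bit", "32bit", "64bit", "128bit")]
    + ["global_load_hit", "l1_local_ld_miss",
       "l2_subp0_read_requests", "l2_subp1_read_requests",
       "l2_subp0_write_requests", "l2_subp1_write_requests"]
)


def missing_events(query_output, required=REQUIRED_EVENTS):
    """Return the required event names that never occur in query_output."""
    # Collect every identifier-like token in the text.
    available = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", query_output))
    return [name for name in required if name not in available]


# Invented text standing in for the saved output of nvprof --query-events.
example_output = """
gld_inst_32bit
gst_inst_32bit
l1_local_ld_miss
"""
for name in missing_events(example_output):
    print("not available:", name)
```

Because the check is a plain token match, an event name that only appears inside another event's description would also count as available.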

http://pastebin.com/n22TW2Y1

Any ideas what’s going wrong or what I’m misinterpreting?