CUDA Command Line Profiler - Calculating Global Memory Throughput

Hello,

I am trying to use the CUDA command line profiler to calculate the global memory throughput of several CUDA applications on a GTX580 card.

According to the CUDA profiler user guide, I need to collect the following counters to do this: fb_subp0_read_sectors, fb_subp1_read_sectors, fb_subp0_write_sectors, fb_subp1_write_sectors. However, I am unable to collect all of them in the same run. I get the messages "Counter 'fb_subp1_read_sectors' is not compatible with other selected counters and it cannot be profiled in this run" and "Counter 'fb_subp1_write_sectors' is not compatible with other selected counters and it cannot be profiled in this run". I get similar messages for the L2 read and write miss counters.

I thought I could collect them over multiple runs of each application, but the value of a given counter is not the same from run to run, so I suspect that combining counter values from separate runs is not going to be very accurate. The counters fb0_subp0_read_sectors, fb0_subp1_read_sectors, fb1_subp0_read_sectors, fb1_subp1_read_sectors and the corresponding write counters are not supported by the card. So, how do I calculate the effective memory throughput using the CUDA command line profiler?
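
For reference, the calculation I have in mind is roughly the following. This is only a sketch under assumptions: each increment of a sector counter is taken to be one 32-byte access, the kernel time is taken to be the gputime value from the profiler log in microseconds, and the function name and the counter values are made up for illustration.

```python
# Assumption: each increment of an fb_subp*_sectors counter is one 32-byte access.
SECTOR_BYTES = 32


def dram_throughput_gbps(read_sectors, write_sectors, gpu_time_us):
    """Return (read, write, total) throughput in GB/s for one kernel launch.

    read_sectors / write_sectors: counter values, one per sub-partition
    (e.g. fb_subp0_read_sectors and fb_subp1_read_sectors).
    gpu_time_us: kernel execution time in microseconds (assumed to be gputime).
    """
    seconds = gpu_time_us * 1e-6
    read = sum(read_sectors) * SECTOR_BYTES / seconds / 1e9
    write = sum(write_sectors) * SECTOR_BYTES / seconds / 1e9
    return read, write, read + write


# Hypothetical counter values, not measured on any real card.
read, write, total = dram_throughput_gbps(
    read_sectors=[1_000_000, 1_000_000],
    write_sectors=[500_000, 500_000],
    gpu_time_us=1000.0,
)
print(f"read {read:.1f} GB/s, write {write:.1f} GB/s, total {total:.1f} GB/s")
```

If the subp0 and subp1 values have to come from separate runs, the sums above mix measurements from different executions, which is exactly the accuracy problem I am worried about.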

Thanks,
nagesh