Simple question: measuring FLOPS and bandwidth

First of all, hello everyone.

After reading many docs about measuring FLOPS and effective bandwidth, I have some questions before I measure the performance of my kernel.

For GFLOPS:

  • What if I have integer operations? Do I just ignore them, or count them in?
  • I guess if I have branching, I will need each thread to count its own operations. Is there any alternative?

For memory bandwidth:

  • As far as I know, I need to count all reads and all writes, and divide by the execution time of the kernel. My question here is: do I need to count all reads and writes, or only the ones from global memory, ignoring shared, texture, or any other faster memory?
  • How does Fermi's automatic caching affect this?
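
To make the bandwidth calculation I have in mind concrete, here is a minimal host-side sketch. The function name `effective_bandwidth_gbs` and its parameters are hypothetical (not from any library), and I am assuming 1 GB = 1e9 bytes. Which accesses belong in the byte counts is exactly what I am unsure about.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of CUDA or any library).
// bytes_read / bytes_written: totals counted by hand for one kernel launch.
// seconds: measured execution time of that kernel.
// Assumption: 1 GB = 1e9 bytes.
double effective_bandwidth_gbs(std::size_t bytes_read,
                               std::size_t bytes_written,
                               double seconds) {
    assert(seconds > 0.0 && "kernel time must be positive");
    // Convert to double before adding so very large counts cannot overflow.
    const double total_bytes = static_cast<double>(bytes_read) +
                               static_cast<double>(bytes_written);
    return total_bytes / seconds / 1e9;
}
```

For example, a kernel that reads 1e9 bytes and writes 1e9 bytes in 1 second would come out at 2 GB/s with this formula.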

I could just measure both and keep the results, but these are the fundamentals that are not making sense to me at the moment.
Thanks in advance; any help is appreciated.