First of all, hello everyone.
After reading many docs about measuring FLOPS and effective bandwidth, I have some questions before I measure the performance of my kernel.
For GFLOP/s:
- What about integer operations: do I just ignore them, or do I count them?
- If my kernel has branching, I guess each thread would need to count its own operations. Is there any alternative?
For memory bandwidth:
- As far as I know, I need to count all reads and all writes and divide by the kernel's execution time. My question here is: do I need to count every read and write, or only those to global memory, ignoring shared memory, texture memory, and any other faster memory?
- What about Fermi and its automatic caching?
I could just measure both ways and keep the results, but these are the fundamentals that are not making sense to me at the moment.
Thanks in advance; any help is appreciated.
Cristobal