instruction-to-byte ratio

In reporting the performance of a kernel I’ve written, I’d like to calculate the instruction-to-byte ratio.

This quantity is defined as “how many arithmetic operations your kernel performs per byte it reads” [1], and the formula is:

RATIO = (32 * instructions_issued) / (128 * global_store_transaction + 128 * L1_global_load_miss)
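To make the arithmetic concrete, here is a minimal Python sketch of the formula as I read it. The function name and the counter values are made up for illustration. I am assuming that the factor 32 is the number of threads per warp (instructions are counted per warp) and that 128 is the number of bytes per transaction or cache line.

```python
WARP_SIZE = 32     # assumption: threads per warp, since instructions are counted per warp
LINE_BYTES = 128   # assumption: bytes moved per store transaction / L1 cache line


def instruction_to_byte_ratio(instructions_issued,
                              global_store_transaction,
                              l1_global_load_miss):
    """Evaluate RATIO from the three profiler counters in the formula above."""
    bytes_moved = LINE_BYTES * (global_store_transaction + l1_global_load_miss)
    if bytes_moved == 0:
        raise ValueError("no memory traffic recorded; the ratio is undefined")
    return WARP_SIZE * instructions_issued / bytes_moved


# Hypothetical counter values, not measurements from a real kernel.
ratio = instruction_to_byte_ratio(
    instructions_issued=1_000_000,
    global_store_transaction=10_000,
    l1_global_load_miss=40_000,
)
print(ratio)  # 5.0
```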

I understand that this quantity provides an indication of whether the kernel is compute or memory bound. That being said, I don’t understand why we use global store transactions and L1 global load miss.

global_store_transaction is the number of lines stored to global memory.
L1_global_load_miss is the number of lines requested from the L1 cache that missed.

Could someone explain why we are using global_store_transaction and L1_global_load_miss? Could we not just use the DRAM throughput (dram_read_throughput + dram_write_throughput) multiplied by the kernel runtime to get the number of bytes transferred?
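For clarity, the alternative I have in mind would look roughly like the sketch below. The function name and all numbers are hypothetical, and I am assuming the two throughput values have already been converted to bytes per second and the runtime to seconds.

```python
WARP_SIZE = 32  # assumption: threads per warp, as in the original formula


def ratio_from_dram_throughput(instructions_issued,
                               dram_read_bytes_per_s,
                               dram_write_bytes_per_s,
                               runtime_s):
    """Estimate the ratio using DRAM traffic = total throughput * runtime."""
    dram_bytes = (dram_read_bytes_per_s + dram_write_bytes_per_s) * runtime_s
    if dram_bytes <= 0:
        raise ValueError("no DRAM traffic; the ratio is undefined")
    return WARP_SIZE * instructions_issued / dram_bytes


# Hypothetical values: 80 GB/s read, 20 GB/s write, 1 ms kernel runtime.
ratio_dram = ratio_from_dram_throughput(
    instructions_issued=1_000_000,
    dram_read_bytes_per_s=80e9,
    dram_write_bytes_per_s=20e9,
    runtime_s=1e-3,
)
print(ratio_dram)  # approximately 0.32
```

Note that this counts bytes that actually reach DRAM, whereas the counter-based formula counts store transactions and L1 misses, so I would not expect the two estimates to agree in general.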

[1] http://babrodtk.at.ifi.uio.no/files/publications/brodtkorb_etal_meta10.pdf