instruction-to-byte ratio

voilouvoila · April 5, 2014, 9:54pm

In reporting the performance of a kernel I’ve written, I’d like to calculate the instruction-to-byte ratio.

This quantity is defined as “how many arithmetic operations your kernel performs per byte it reads” [1], and the formula is:

RATIO = (32 * instructions_issued) / (128 * global_store_transaction + 128 * L1_global_load_miss)
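A minimal sketch of evaluating the formula from profiler counter values. It assumes the factor 32 is the warp size (instructions_issued counts warp-level instructions) and 128 is the number of bytes per cache line / memory transaction; that reading is my interpretation, not something stated in [1]. The function name and counter values are hypothetical.

```python
WARP_SIZE = 32    # assumption: threads per warp; instructions_issued counts warp instructions
LINE_BYTES = 128  # assumption: bytes per cache line / global memory transaction


def instruction_to_byte_ratio(instructions_issued, global_store_transaction, l1_global_load_miss):
    """Thread-level instructions per byte of global memory traffic (hypothetical helper)."""
    thread_instructions = WARP_SIZE * instructions_issued
    bytes_moved = LINE_BYTES * (global_store_transaction + l1_global_load_miss)
    if bytes_moved == 0:
        raise ValueError("no global memory traffic recorded")
    return thread_instructions / bytes_moved


# Made-up counter values: 32e6 thread instructions / 5.12e6 bytes
print(instruction_to_byte_ratio(1_000_000, 10_000, 30_000))  # 6.25
```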

I understand that this quantity indicates whether the kernel is compute-bound or memory-bound. That said, I don’t understand why the formula uses global store transactions and L1 global load misses.

global_store_transaction is the number of lines stored to global memory.
L1_global_load_miss is the number of lines whose load from the L1 cache missed.

Could someone explain why we use global_store_transaction and L1_global_load_miss? Could we not simply use the DRAM throughput (dram_read_throughput + dram_write_throughput) multiplied by the kernel runtime?
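A rough sketch of the DRAM-based alternative, assuming the two throughput metrics are reported in GB/s (taken here as 10^9 bytes/s) and the runtime is in seconds; the function names and numbers are hypothetical, and the factor 32 is again assumed to be the warp size. One thing to keep in mind: L1 misses are served by L2 and may hit there without touching DRAM, so this byte count will not in general equal the counter-based one.

```python
WARP_SIZE = 32  # assumption: threads per warp


def dram_bytes(dram_read_gbs, dram_write_gbs, runtime_s):
    """Bytes moved to/from DRAM, assuming throughputs in GB/s (1e9 bytes/s)."""
    return (dram_read_gbs + dram_write_gbs) * 1e9 * runtime_s


def ratio_from_dram(instructions_issued, dram_read_gbs, dram_write_gbs, runtime_s):
    """Hypothetical helper: thread instructions per DRAM byte."""
    total = dram_bytes(dram_read_gbs, dram_write_gbs, runtime_s)
    if total == 0:
        raise ValueError("no DRAM traffic recorded")
    return WARP_SIZE * instructions_issued / total


# Made-up values: 4 GB/s combined over 2 ms is about 8e6 bytes
print(ratio_from_dram(1_000_000, 3.0, 1.0, 0.002))  # approximately 4.0
```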

[1] http://babrodtk.at.ifi.uio.no/files/publications/brodtkorb_etal_meta10.pdf
