I remember that 10 years ago or so the PTX documentation contained a table of the instruction set in which every instruction had its own execution time in clocks for the architectures of that time, so you could easily estimate the execution time of a PTX code segment. I tried searching for that in the recent documentation but couldn't find anything. I understand that a lot of instructions are emulated to conform to a specific standard such as IEEE 754, and that fetches from slow memory have to be scheduled alongside other instructions that can hide the memory latency, but simple register-to-register instructions like MOV, ADD, SHR, MUL etc. probably have a well-defined execution time in clocks. Even a conditional branch instruction should have a well-defined execution time both when the condition is met and when it is not. Of course, in the best-case scenario those execution clocks (I suppose) assume that all threads in the warp are in sync and not divergent.
Are there any documents with instruction execution timings for Pascal, Maxwell or Volta?
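Failing that, the only fallback I can think of is measuring it myself. Below is a rough sketch of what I have in mind, not a validated benchmark: a single thread times a chain of dependent integer adds with `clock64()` and divides by the chain length. The kernel name `add_latency`, the constant `CHAIN` and the launch configuration are my own placeholders. It assumes the `clock64()` overhead is negligible against the chain, and that ptxas keeps the chain as individual adds when it translates PTX to SASS, which would need to be checked with `cuobjdump -sass`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical chain length: long enough to amortize the clock64() reads.
constexpr int CHAIN = 256;

__global__ void add_latency(int seed, int step, long long *cycles, int *sink)
{
    int x = seed;

    long long start = clock64();
    #pragma unroll
    for (int i = 0; i < CHAIN; ++i) {
        // Each add depends on the previous result, so the adds cannot overlap.
        // asm volatile keeps the compiler front end from folding the chain.
        asm volatile("add.s32 %0, %0, %1;" : "+r"(x) : "r"(step));
    }
    long long stop = clock64();

    *cycles = stop - start;
    *sink = x;  // store the result so the chain is not dead code
}

int main()
{
    long long *d_cycles = nullptr;
    int *d_sink = nullptr;
    cudaMalloc(&d_cycles, sizeof(long long));
    cudaMalloc(&d_sink, sizeof(int));

    // One block with one thread: no divergence and no contention between warps.
    add_latency<<<1, 1>>>(1, 3, d_cycles, d_sink);

    long long cycles = 0;
    int sink = 0;
    cudaMemcpy(&cycles, d_cycles, sizeof(cycles), cudaMemcpyDeviceToHost);
    cudaMemcpy(&sink, d_sink, sizeof(sink), cudaMemcpyDeviceToHost);

    printf("total cycles: %lld, approx. cycles per dependent add: %.2f\n",
           cycles, (double)cycles / CHAIN);

    cudaFree(d_cycles);
    cudaFree(d_sink);
    return 0;
}
```

This would only give the latency of a dependent add as seen by one thread, not the throughput with many warps in flight, and I would have to repeat it for every instruction I care about, which is why I am hoping a documented table still exists.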
Thanks in advance