It seems that smsp__inst_executed.sum is not the total number of executed instructions, as I expected. As you can see below, it is smaller than both the control and the memory (ldst) instruction counts.
smsp__inst_executed.sum                                  inst   766803.000000    788421.000000    777612.000000
smsp__sass_thread_inst_executed_op_control_pred_on.sum   inst  5500945.000000   5504005.000000   5502475.000000
smsp__sass_thread_inst_executed_op_memory_pred_on.sum    inst  3948027.000000   3953653.000000   3950840.000000
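The only explanation I can think of is that smsp__inst_executed counts warp-level instructions, while the smsp__sass_thread_inst_executed_* metrics count thread-level instructions. In that case the total number of thread instructions would be 766803*32 = 24537696, assuming all 32 threads of each warp are active.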
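As a quick sanity check of that arithmetic, here is a minimal Python sketch. It assumes a warp size of 32 and that every warp instruction runs with all 32 threads active, so the result is only an upper bound (divergence or predication would lower it). The variable names are mine, and the values are the first column of the output above.

```python
WARP_SIZE = 32  # assumption: 32 threads per warp

# First value reported for each metric above
warp_inst = 766803             # smsp__inst_executed.sum
control_thread_inst = 5500945  # ..._op_control_pred_on.sum
memory_thread_inst = 3948027   # ..._op_memory_pred_on.sum

# Upper bound on thread-level instructions if every warp is fully active
max_thread_inst = warp_inst * WARP_SIZE

print(max_thread_inst)
# The per-thread control and memory counts should fit under that bound
print(control_thread_inst + memory_thread_inst <= max_thread_inst)
```

If the warp-level interpretation is right, the control and memory thread counts together fit within the bound, which is consistent with the numbers above.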
Any thoughts on that?