Instruction classification by Nsight

Hi
Regarding instruction categorization, I have the following questions:

1- According to the NVBit output, there are some fused instructions which, in my opinion, span multiple categories, for example IADD.MOV. Is this instruction considered an integer instruction or a move instruction? I would like to analyze the NVBit output myself.

2- Some instructions are logically similar but fall into different categories. For example, LEA from the integer class looks similar to ULEA from the uniform datapath class. Does the profiler classify instructions exactly according to the ISA table described here?

When you’re asking about instruction classification, where are you looking in Nsight Compute specifically? Are there some instruction bins that you are trying to understand? Perhaps I can provide some more information if you can share where you are seeing these different classes.

Assume I have collected the smsp__sass_thread_inst_executed_op_integer_pred_on.sum metric, which is similar to inst_integer in nvprof. Now the question is: is an instruction like IADD.MOV considered an integer instruction? What about LEA vs. ULEA?

Thanks for the details. I checked with the engineering team, and you’re correct that the groupings are based on the categories here: CUDA Binary Utilities :: CUDA Toolkit Documentation. However, none of them were familiar with an IADD.MOV instruction. Do you have any details you can share on where you saw that instruction?

This is the output of the NVBit opcode tool for one of the kernels from the Resnet50 inference benchmark in MLPerf 2.0.

kernel 1 - sm80_xmma_fprop_implicit_gemm_interleaved_i8i8_i8i32_f32_nchw_vect_c_32kcrs_vect_c_32_nchw_vect_c_32_tilesize64x192x64_stage3_warpsize2x2x1_g1_tensor16x8x32_simple_t1r1s1_execute_kernel_trt - #thread-blocks 50176,  kernel instructions 315506688, total instructions 840665088
  BAR.SYNC.DEFER_BLOCKING = 1404928
  BRA = 4214784
  BSSY = 401408
  BSYNC = 401408
  DEPBAR.LE = 1204224
  EXIT = 200704
  F2IP.S8.F32.NTZ = 9633792
  FFMA = 19267584
  FMNMX.NAN = 38535168
  FSETP.NEU.AND = 200704
  I2FP.F32.S32 = 19267584
  IADD3 = 24686592
  IADD3.X = 10035200
  IMAD = 8429568
  IMAD.HI.U32 = 1204224
  IMAD.IADD = 5218304
  IMAD.MOV = 602112
  IMAD.MOV.U32 = 12845056
  IMAD.SHL.U32 = 2007040
  IMAD.U32 = 802816
  IMAD.WIDE = 1003520
  IMAD.X = 2809856
  IMMA.16832.S8.S8 = 38535168
  ISETP.GE.AND = 3813376
  ISETP.GT.AND = 4014080
  ISETP.GT.OR = 802816
  ...

As you can see, there are several fused instructions like IMAD.MOV. So is this instruction considered a move instruction or an integer instruction?

Thanks for the clarification. I checked with the engineering team and verified that instructions are classified based on the opcodes listed here: CUDA Binary Utilities :: CUDA Toolkit Documentation. So IMAD.MOV is classified as an integer operation and would be included in smsp__sass_thread_inst_executed_op_integer_pred_on.sum.
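
For anyone who wants to do the same analysis on their own NVBit output, here is a minimal Python sketch of that rule: it takes the base opcode (the part before the first dot) and looks it up in a class table, so IMAD.MOV lands in the same bin as IMAD. The `OPCODE_CLASS` table, the class names, and the function names are illustrative assumptions, not part of NVBit or Nsight Compute; the table is a small hand-picked subset and should be completed and checked against the instruction set reference in the CUDA Binary Utilities documentation for your architecture.

```python
import re
from collections import defaultdict

# Assumption: partial, hand-written mapping from base opcode to class.
# Verify and extend it against the CUDA Binary Utilities opcode tables.
OPCODE_CLASS = {
    "IADD3": "integer",
    "IMAD": "integer",
    "ISETP": "integer",
    "LEA": "integer",
    "FFMA": "floating point",
    "FMNMX": "floating point",
    "FSETP": "floating point",
    "MOV": "movement",
    "ULEA": "uniform datapath",
}

# Matches histogram lines such as "  IMAD.MOV.U32 = 12845056".
LINE_RE = re.compile(r"^\s*([A-Z0-9_.]+)\s*=\s*(\d+)\s*$")


def base_opcode(mnemonic):
    """Return the opcode without modifiers, e.g. 'IMAD.MOV.U32' -> 'IMAD'."""
    return mnemonic.split(".", 1)[0]


def classify_histogram(text):
    """Sum instruction counts per class; unknown opcodes go to 'unclassified'."""
    totals = defaultdict(int)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip kernel header lines, blank lines, and "..."
        mnemonic, count = match.group(1), int(match.group(2))
        cls = OPCODE_CLASS.get(base_opcode(mnemonic), "unclassified")
        totals[cls] += count
    return dict(totals)


if __name__ == "__main__":
    sample = """
      FFMA = 19267584
      IADD3 = 24686592
      IMAD.MOV = 602112
      IMAD.MOV.U32 = 12845056
    """
    for cls, total in sorted(classify_histogram(sample).items()):
        print(f"{cls}: {total}")
```

Totals computed this way may not match the profiler metric exactly, since the metric counts thread-level instructions with the predicate on, and the NVBit tool's counting granularity may differ.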