For a kernel, I see that the number of threads is 512, while the number of executed instructions (smsp*.sum) is very small. Is that normal? Does it mean some threads had no instructions to execute? Any ideas?
Yes, that’s possible: not every launched thread necessarily executes the same number of instructions. It can be helpful to collect the SourceCounters section and compare the Instructions Executed and Predicated-On Thread Instructions Executed columns on the Source page to see precisely how often each SASS instruction is executed.
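To make the difference between the two counters concrete, here is a rough Python sketch of a toy kernel in which only the first few threads pass a guard such as `if (tid < n)`. It assumes that Instructions Executed is counted once per warp and that Predicated-On Thread Instructions Executed is counted once per active thread, which is my understanding of these metrics. The function name, the instruction counts, and the guard are all made up for illustration; real SASS and real profiler numbers will differ.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def count_instructions(num_threads, active_threads, common_insts, guarded_insts):
    """Toy model: return (warp_level, thread_level) instruction counts.

    Hypothetical kernel shape: every thread runs `common_insts` instructions,
    and only threads with tid < active_threads run `guarded_insts` more.
    """
    warp_level = 0    # one count per instruction per warp
    thread_level = 0  # one count per instruction per active thread
    for warp_start in range(0, num_threads, WARP_SIZE):
        lanes = range(warp_start, min(warp_start + WARP_SIZE, num_threads))
        active_lanes = sum(1 for tid in lanes if tid < active_threads)

        # Common part: the warp executes each instruction once for all its lanes.
        warp_level += common_insts
        thread_level += common_insts * len(lanes)

        # Guarded part: executed only by warps that have at least one active lane.
        if active_lanes > 0:
            warp_level += guarded_insts
            thread_level += guarded_insts * active_lanes
    return warp_level, thread_level


warp_level, thread_level = count_instructions(
    num_threads=512, active_threads=40, common_insts=10, guarded_insts=100
)
print(warp_level, thread_level)  # prints "360 9120"
```

In this model, 512 threads form 16 warps, but only two warps enter the guarded part, so the warp-level count stays small even though many threads were launched. A large gap between the two columns on the Source page would point to the same kind of effect in a real kernel.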