Hi
I see that kernel IDs change when I use different metrics. The version I am using is 2022.2. The program is based on Torch and I can attach the code, but let’s first see whether this kind of behavior is normal.
I used the following two commands:
~/NVIDIA-Nsight-Compute-2022.2/nv-nsight-cu-cli --kill on -c 300 python3 main.py
~/NVIDIA-Nsight-Compute-2022.2/nv-nsight-cu-cli --kill on -c 300 --metrics smsp__inst_executed.sum python3 main.py
I have attached both outputs. The weird thing is that with the first command, I see this order:
==PROF== Profiling "GRU_elementWise_fp" - 200 (201/300): 0%....50%....100% - 9 passes
==PROF== Profiling "Kernel" - 201 (202/300): 0%....50%....100% - 9 passes
==PROF== Profiling "Kernel" - 202 (203/300): 0%....50%....100% - 9 passes
==PROF== Profiling "Kernel" - 203 (204/300): 0%....50%....100% - 9 passes
==PROF== Profiling "GRU_elementWise_fp" - 204 (205/300): 0%....50%....100% - 9 passes
==PROF== Profiling "Kernel" - 205 (206/300): 0%....50%....100% - 9 passes
==PROF== Profiling "GRU_elementWise_fp" - 206 (207/300): 0%....50%....100% - 9 passes
==PROF== Profiling "Kernel" - 207 (208/300): 0%....50%....100% - 9 passes
==PROF== Profiling "GRU_elementWise_fp" - 208 (209/300): 0%....50%....100% - 9 passes
==PROF== Profiling "Kernel" - 209 (210/300): 0%....50%....100% - 9 passes
==PROF== Profiling "GRU_elementWise_fp" - 210 (211/300): 0%....50%....100% - 9 passes
==PROF== Profiling "Kernel" - 211 (212/300): 0%....50%....100% - 9 passes
==PROF== Profiling "Kernel" - 212 (213/300): 0%....50%....100% - 9 passes
==PROF== Profiling "Kernel" - 213 (214/300): 0%....50%....100% - 9 passes
==PROF== Profiling "GRU_elementWise_fp" - 214 (215/300): 0%....50%....100% - 9 passes
==PROF== Profiling "Kernel" - 215 (216/300): 0%....50%....100% - 9 passes
==PROF== Profiling "GRU_elementWise_fp" - 216 (217/300): 0%....50%....100% - 9 passes
==PROF== Profiling "Kernel" - 217 (218/300): 0%....50%....100% - 9 passes
==PROF== Profiling "GRU_elementWise_fp" - 218 (219/300): 0%....50%....100% - 9 passes
==PROF== Profiling "Kernel" - 219 (220/300): 0%....50%....100% - 9 passes
==PROF== Profiling "GRU_elementWise_fp" - 220 (221/300): 0%....50%....100% - 9 passes
==PROF== Profiling "CatArrayBatchedCopy" - 221 (222/300): 0%....50%....100% - 9 passes
==PROF== Profiling "unrolled_elementwise_kernel" - 222 (223/300): 0%....50%....100% - 9 passes
==PROF== Profiling "ampere_sgemm_32x32_sliced1x4_tn" - 223 (224/300): 0%....50%....100% - 9 passes
and with the second command:
==PROF== Profiling "GRU_elementWise_fp" - 200 (201/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 201 (202/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 202 (203/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 203 (204/300): 0%....50%....100% - 1 pass
==PROF== Profiling "GRU_elementWise_fp" - 204 (205/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 205 (206/300): 0%....50%....100% - 1 pass
==PROF== Profiling "GRU_elementWise_fp" - 206 (207/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 207 (208/300): 0%....50%....100% - 1 pass
==PROF== Profiling "GRU_elementWise_fp" - 208 (209/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 209 (210/300): 0%....50%....100% - 1 pass
==PROF== Profiling "GRU_elementWise_fp" - 210 (211/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 211 (212/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 212 (213/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 213 (214/300): 0%....50%....100% - 1 pass
==PROF== Profiling "GRU_elementWise_fp" - 214 (215/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 215 (216/300): 0%....50%....100% - 1 pass
==PROF== Profiling "GRU_elementWise_fp" - 216 (217/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 217 (218/300): 0%....50%....100% - 1 pass
==PROF== Profiling "GRU_elementWise_fp" - 218 (219/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 219 (220/300): 0%....50%....100% - 1 pass
==PROF== Profiling "GRU_elementWise_fp" - 220 (221/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 221 (222/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 222 (223/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 223 (224/300): 0%....50%....100% - 1 pass
==PROF== Profiling "GRU_elementWise_fp" - 224 (225/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 225 (226/300): 0%....50%....100% - 1 pass
==PROF== Profiling "GRU_elementWise_fp" - 226 (227/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 227 (228/300): 0%....50%....100% - 1 pass
==PROF== Profiling "GRU_elementWise_fp" - 228 (229/300): 0%....50%....100% - 1 pass
==PROF== Profiling "Kernel" - 229 (230/300): 0%....50%....100% - 1 pass
==PROF== Profiling "GRU_elementWise_fp" - 230 (231/300): 0%....50%....100% - 1 pass
==PROF== Profiling "CatArrayBatchedCopy" - 231 (232/300): 0%....50%....100% - 1 pass
==PROF== Profiling "unrolled_elementwise_kernel" - 232 (233/300): 0%....50%....100% - 1 pass
==PROF== Profiling "ampere_sgemm_32x32_sliced1x4_tn" - 233 (234/300): 0%....50%....100% - 1 pass
As you can see, kernel 220 is GRU_elementWise_fp in both outputs, but after that the order starts to diverge: CatArrayBatchedCopy has ID 221 when no metric is specified (even though that run uses 9 passes per kernel), while the same kernel has ID 231 with the second command.
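To make this kind of comparison mechanical instead of eyeballing two long logs, a small helper like the one below could be used. It is only a sketch and not part of Nsight Compute: it assumes every profiled launch is printed in the same `==PROF== Profiling "<name>" - <id> (...)` form as the log lines above, and the script name `compare_prof.py` is hypothetical. It extracts (ID, name) pairs, reports the first ID at which the two runs disagree, and finds the first ID of a given kernel name in each run.

```python
import re
import sys

# Matches lines such as:
# ==PROF== Profiling "Kernel" - 201 (202/300): 0%....50%....100% - 9 passes
# Assumption: all profiled launches in the log use this exact layout.
PROF_LINE = re.compile(r'==PROF== Profiling "([^"]+)" - (\d+) \(')


def kernel_sequence(log_text):
    """Return (kernel_id, kernel_name) pairs in the order they were profiled."""
    return [(int(m.group(2)), m.group(1)) for m in PROF_LINE.finditer(log_text)]


def first_divergence(seq_a, seq_b):
    """Return the first kernel ID present in both runs whose name differs, or None."""
    names_a = dict(seq_a)
    names_b = dict(seq_b)
    for kid in sorted(set(names_a) & set(names_b)):
        if names_a[kid] != names_b[kid]:
            return kid
    return None


def first_id_of(seq, name):
    """Return the first kernel ID profiled under the given name, or None if absent."""
    return next((kid for kid, n in seq if n == name), None)


def main(path_a, path_b, name="CatArrayBatchedCopy"):
    with open(path_a, encoding="utf-8", errors="replace") as f:
        run_a = kernel_sequence(f.read())
    with open(path_b, encoding="utf-8", errors="replace") as f:
        run_b = kernel_sequence(f.read())
    print("first differing kernel ID:", first_divergence(run_a, run_b))
    print(name, "first ID in run A:", first_id_of(run_a, name))
    print(name, "first ID in run B:", first_id_of(run_b, name))


# Hypothetical usage: python3 compare_prof.py nsight_22.2.txt nsight_22.2_inst.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Matching on kernel names alone cannot distinguish launches that share a name (for example the many "Kernel" entries), so a divergence point only shows where the launch order stops lining up, not why.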
What do you think about that?
nsight_22.2.txt (1.8 MB)
nsight_22.2_inst.txt (245.2 KB)