I am trying to profile vectorized_elementwise_kernel in the GNMT application from the source code below using Nsight Compute.
I arrived at the following command after profiling the application with Nsight Systems.
ncu --kernel-name vectorized_elementwise_kernel --launch-skip 127032 --launch-count 1 python3 -u train.py --local_rank=0 --seed 2 --train-global-batch-size 1024
However, when I run ncu, it fails with this error: unrecognised option '--kernel-name'
Do you know how to successfully profile a kernel from PyTorch code using Nsight Compute?
The --kernel-name option was added in Nsight Compute version 2021.1. It looks like you are using an older version of Nsight Compute. You can use the
Please refer to the “Command Line Options” section of the Nsight Compute CLI documentation in your local Nsight Compute installation.
Thanks for your response! I am currently using Nsight Compute version 2022.1.0. I switched to the --kernel-regex option and it is working. :-) I will run the entire application through to completion to check the output.
The --kernel-name option is supported in Nsight Compute version 2022.1.0.
I switched to the --kernel-regex option and it is working.
Did you switch when using the older version of Nsight Compute?
Sorry for the confusion. Right, I am using an older version of Nsight Compute, version 2020.3.0.0. Now I understand why --kernel-name is not supported. Thanks!
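For anyone who hits the same error, here is a minimal sketch of how the option could be chosen from the installed version. It relies only on what was said in this thread: --kernel-name is available from 2021.1 onward, and --kernel-regex worked on 2020.3. The helper names (parse_ncu_version, kernel_filter_flag, build_ncu_command) are hypothetical and not part of Nsight Compute, and the version string is assumed to be the bare number reported by your installation.

```python
import re

# Assumption taken from this thread: --kernel-name exists from 2021.1 onward,
# while --kernel-regex is what worked on the older 2020.3 release.
KERNEL_NAME_MIN_VERSION = (2021, 1)


def parse_ncu_version(text):
    """Extract (year, minor) from a version string such as '2020.3.0.0'."""
    match = re.search(r"(\d{4})\.(\d+)", text)
    if match is None:
        raise ValueError(f"no Nsight Compute version found in: {text!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_filter_flag(version_text):
    """Pick the kernel filter option for the given Nsight Compute version."""
    if parse_ncu_version(version_text) >= KERNEL_NAME_MIN_VERSION:
        return "--kernel-name"
    return "--kernel-regex"


def build_ncu_command(version_text, kernel, launch_skip, app_args):
    """Assemble the ncu argument list; app_args is the profiled command."""
    return [
        "ncu",
        kernel_filter_flag(version_text), kernel,
        "--launch-skip", str(launch_skip),
        "--launch-count", "1",
        *app_args,
    ]


if __name__ == "__main__":
    # Hypothetical usage with the version and kernel from this thread.
    cmd = build_ncu_command(
        "2020.3.0.0",
        "vectorized_elementwise_kernel",
        127032,
        ["python3", "-u", "train.py", "--local_rank=0", "--seed", "2"],
    )
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run as-is. The two options may not match kernel names in exactly the same way, so it is worth checking the “Command Line Options” section for your installed version before relying on this.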