For example, I have a test program for 5 kernels:
int main()
{
    for (int i = 0; i < 10; i++) {
        kernel_1<<<...>>>(...); // warm up
    }
    for (int i = 0; i < 10; i++) {
        kernel_1<<<...>>>(...); // to be measured
    }
    ...
    for (int i = 0; i < 10; i++) {
        kernel_5<<<...>>>(...); // warm up
    }
    for (int i = 0; i < 10; i++) {
        kernel_5<<<...>>>(...); // to be measured
    }
    return 0;
}
Each kernel runs 20 times, but only the last 10 runs need to be measured, and I need the average time/usage/statistics over those 10 runs.
How can I do this cleanly from the ncu command line? Should I use cudaProfilerStart() / cudaProfilerStop() to help?
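To make the second question concrete, this is roughly what I have in mind. It is only a minimal sketch: `dummy_kernel`, the buffer size, and the launch configuration are placeholders I made up for illustration, not my real kernels, and I am assuming that ncu is started with `--profile-from-start off` so that only launches between the start and stop calls are captured.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>  // cudaProfilerStart / cudaProfilerStop

// Hypothetical stand-in for kernel_1 ... kernel_5.
__global__ void dummy_kernel(float *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        data[idx] = data[idx] * 2.0f + 1.0f;
    }
}

int main()
{
    const int n = 1 << 20;  // placeholder problem size
    float *d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Warm-up launches: issued before the profiled range begins.
    for (int i = 0; i < 10; i++) {
        dummy_kernel<<<blocks, threads>>>(d_data, n);
    }
    cudaDeviceSynchronize();  // finish warm-up before the range starts

    // Measured launches: only these sit between start and stop.
    cudaProfilerStart();
    for (int i = 0; i < 10; i++) {
        dummy_kernel<<<blocks, threads>>>(d_data, n);
    }
    cudaDeviceSynchronize();  // finish measured work before the range ends
    cudaProfilerStop();

    cudaFree(d_data);
    return 0;
}
```

The launch command I would try with this is something like `ncu --profile-from-start off --csv --page raw --log-file result.csv ./test_program`, where `result.csv` and `./test_program` are placeholder names. As far as I can tell, that would produce one row per measured launch rather than an average, so the averages over the 10 runs would still have to be computed afterwards. Is that the intended way, or is there something cleaner?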
I want the results to be written to an Excel file. I am a beginner, so thank you for any help.