How to use the ncu command to profile the average time/usage/etc. of a kernel that runs 10 times?

For example, I have a test program for 5 kernels:

int main()
{
    for (int i = 0; i < 10; i++){
        kernel_1<<<...>>>(...); // warm up
    }
    for (int i = 0; i < 10; i++){
        kernel_1<<<...>>>(...); // to be measured
    }
    ...
    for (int i = 0; i < 10; i++){
        kernel_5<<<...>>>(...); // warm up
    }
    for (int i = 0; i < 10; i++){
        kernel_5<<<...>>>(...); // to be measured
    }
    return 0;
}

Each kernel runs 20 times, but only the last 10 runs need to be measured, and I need the average time/usage/statistics over those 10 runs.
How can I do this gracefully from the ncu command line? Should I use cudaProfilerStart() / cudaProfilerStop() to assist?
I want the results written to an Excel file. I am a beginner; thank you for your help.

There is a skip count that can be used together with the kernel name filter to reach the specific 10 instances you're interested in. For example, set the kernel name to kernel_4 with a skip count of 10. From the CLI this would be
"--kernel-name kernel_4 --launch-skip 10". You would need to run this separately for each kernel. You can use --csv to get results that you can open in Excel.
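Since the command has to be repeated once per kernel, a small script can generate the five invocations. The following is only a sketch under some assumptions: the kernels are named kernel_1 through kernel_5 as in the question, the executable is called ./test_program (a hypothetical name), --launch-count 10 is added to stop after 10 profiled launches, and --log-file is used to send the CSV output to one file per kernel. It only prints the commands; they could then be run by hand or passed to subprocess.run.

```python
import shlex


def build_ncu_command(kernel, app="./test_program", skip=10, count=10):
    """Build the ncu argument list for one kernel.

    `app` is a hypothetical executable name. `skip` is the number of
    matching launches to ignore (the warm-up runs) and `count` is the
    number of launches to profile afterwards.
    """
    return [
        "ncu",
        "--kernel-name", kernel,          # only profile this kernel
        "--launch-skip", str(skip),       # skip the warm-up launches
        "--launch-count", str(count),     # profile the next `count` launches
        "--csv",                          # CSV output, can be opened in Excel
        "--log-file", f"{kernel}.csv",    # assumed output file name
        app,
    ]


# Kernel names assumed from the question: kernel_1 ... kernel_5.
kernels = [f"kernel_{i}" for i in range(1, 6)]
commands = [build_ncu_command(k) for k in kernels]

for cmd in commands:
    print(shlex.join(cmd))
```

With this approach cudaProfilerStart() / cudaProfilerStop() should not be needed, because the kernel name filter and the skip count already select the launches to measure.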

You will need to compute the average time/usage/statistics across the multiple kernel instances manually. Nsight Compute is currently designed around analyzing individual instances of a kernel.
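The manual averaging can be scripted. The sketch below assumes the CSV has one row per kernel instance and metric, with columns named "Kernel Name", "Metric Name" and "Metric Value"; check the header of your own ncu output, since the layout depends on the ncu version and options. The sample text is synthetic and heavily simplified, not real profiler output.

```python
import csv
import io
from collections import defaultdict


def average_metrics(csv_text):
    """Average each metric per kernel over all profiled instances.

    Assumes columns "Kernel Name", "Metric Name" and "Metric Value".
    Lines before the CSV header (for example tool messages) are skipped.
    """
    lines = csv_text.splitlines()
    # Find the header row; everything before it is not CSV data.
    start = next(i for i, line in enumerate(lines) if "Kernel Name" in line)
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))

    values = defaultdict(list)
    for row in reader:
        raw = row["Metric Value"].replace(",", "")  # drop thousands separators
        try:
            values[(row["Kernel Name"], row["Metric Name"])].append(float(raw))
        except ValueError:
            continue  # non-numeric metric values are ignored
    return {key: sum(v) / len(v) for key, v in values.items()}


# Synthetic, simplified sample (not real ncu output).
sample = """some non-CSV line before the header
"ID","Kernel Name","Metric Name","Metric Unit","Metric Value"
"0","kernel_1","Duration","usecond","10.0"
"1","kernel_1","Duration","usecond","14.0"
"0","kernel_1","Registers Per Thread","register/thread","32"
"1","kernel_1","Registers Per Thread","register/thread","32"
"""

averages = average_metrics(sample)
for (kernel, metric), mean in sorted(averages.items()):
    print(f"{kernel},{metric},{mean}")
```

To process real output, read the file written by ncu and pass its contents to average_metrics; the averages can then be written back out with csv.writer so the summary also opens in Excel.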