Inconsistent results with Nsight Systems

anthonyJK1 · June 2, 2023, 7:49am

Hello,

I am trying to profile the CUDA sample apps (for now vectorAdd and scalarProd) with CUDA 11.4 and nsys version 2022.3.3.18-4d5367b. I am not using any special nsys options other than profile (of course) and --stats=true. However, the results differ hugely from run to run (for example: 17000 ns, 7840 ns, 22000 ns …).
What could be the issue?

Thank you.

ztasoulas · June 2, 2023, 5:12pm

Do the times you mention refer to the kernel execution time, or to the total execution time of the application?

Is some other workload running on the GPU?

You could also use the latest Nsight Systems version, 2023.2, available here, to take advantage of more features and bug fixes.

anthonyJK1 · June 2, 2023, 7:11pm

Hello again,

No, it was only for the kernel. I will try to update it on Monday.

Thank you for your support.

anthonyJK1 · June 5, 2023, 7:54am

Hello again,

I upgraded Nsight Systems to 2022.5.2.171-32559007v0 because I am running on a Jetson. I still get the same results as before (values that differ from run to run).

I don’t think anything else is running on the GPU. I also rebooted the platform to make sure everything was reset. I also tested with the ncu command and I see the same fluctuations in the kernel execution time. Lastly, I implemented timing with CUDA events, following the NVIDIA Technical Blog post "How to Implement Performance Metrics in CUDA C/C++", and I obtain the same fluctuations.

Best regards,

anthonyJK1 · June 6, 2023, 3:48pm

Hello,

I ran some tests using Nsight Compute and Nsight Systems. I launched each profiler 100 times and extracted the execution time of the vectorAdd kernel. I have attached the scripts that launch the tools for reference, and the results are in the attached picture.

As you can see, there are fluctuations in the execution time, which I think is normal behavior for the GPU, but the maximum values obtained with nsys are much higher than those from ncu. Which one is more reliable?

In addition, I compared nsys with cudaEventElapsedTime(): with nsys I obtained 22 us, and with cudaEventElapsedTime() I obtained 110 us.

I hope I made it clearer this time.
Thank you for your time.

Best regards.

ncu_scrip.sh (269 Bytes)
nsys_scrip.sh (409 Bytes)
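
When comparing repeated runs like these, it can help to look at the minimum, median and maximum together rather than at the maximum alone. Below is a minimal sketch in Python, assuming the kernel durations have already been extracted into a list of nanosecond values. The `summarize` helper is hypothetical, and the three sample values are simply the ones quoted in the first post, standing in for a full 100-run list.

```python
import statistics


def summarize(samples_ns):
    """Return min, median, max and max/min spread for kernel durations in ns."""
    if not samples_ns:
        raise ValueError("need at least one sample")
    lo, hi = min(samples_ns), max(samples_ns)
    return {
        "runs": len(samples_ns),
        "min_ns": lo,
        "median_ns": statistics.median(samples_ns),
        "max_ns": hi,
        # Ratio of slowest to fastest run: a quick measure of run-to-run spread.
        "max_over_min": hi / lo,
    }


# Illustrative input only: the three kernel times quoted in the first post.
samples_ns = [17000, 7840, 22000]

summary = summarize(samples_ns)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Feeding the nsys list and the ncu list through the same function gives two summaries that can be compared side by side. The median is less sensitive to a few slow outliers than the maximum is.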

hwilper · June 20, 2023, 2:05pm

One thing to note is that you are not comparing apples to apples here.

Nsight Systems is designed to give you information about the entire system. We perturb the running computer as little as possible.

Nsight Compute is designed to give you deep dive information at the kernel level. They intentionally alter the behavior of the system in order to get the best information about kernel speed of light performance. They may do things like replay a kernel multiple times internally to get averages.

Specifically, I think what you are hitting here is that they pin the GPU frequency at maximum for the duration of the run.
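
To see why the clock setting alone can move the measured time by this much, here is a rough back-of-the-envelope sketch in Python. It assumes the kernel needs a fixed number of GPU cycles, which is only approximately true and mostly for compute-bound kernels; a memory-bound kernel such as vectorAdd also depends on memory clocks. The cycle count and both clock frequencies are hypothetical numbers chosen for illustration, not values measured on a Jetson.

```python
def kernel_time_us(cycles: int, clock_mhz: float) -> float:
    """Time in microseconds for a kernel needing `cycles` cycles at `clock_mhz`."""
    # 1 MHz is one cycle per microsecond, so cycles / MHz gives microseconds.
    return cycles / clock_mhz


CYCLES = 10_000  # hypothetical cycle count for one kernel launch

# Hypothetical clock states: pinned high versus scaled down by power management.
CLOCKS_MHZ = {
    "pinned at maximum": 1300.0,
    "scaled down": 400.0,
}

for label, mhz in CLOCKS_MHZ.items():
    print(f"{label:>18}: {kernel_time_us(CYCLES, mhz):6.2f} us")
```

Under these assumptions the same kernel takes roughly three times longer at the lower clock, so a profiler that leaves the clocks alone can report larger and more variable times than one that pins them.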
