CUPTI instrumentation overhead

jhjang1 · October 14, 2020, 11:47am

Hi,
I'm new to Nsight Systems.

I want to profile DLRM model inference with Nsight Systems on Ubuntu.
However, the profiler overhead seems too high on the first inference iteration because of the CUPTI instrumentation, as shown below
(the unrolled_elementwise_kernel part is the first iteration).

My command was:
nsys profile -c cudaProfilerApi -t nvtx,cuda python dlrm_s_pytorch_inference.py <OTHER_ARGS>

Why is this happening?
Can I somehow filter out the first iteration with CUPTI instrumentation overhead?
Or can I reduce the overhead in some way?

Thanks in advance

user74086 · February 15, 2022, 10:40pm

I am running into the exact same issue with the PyTorch profiler, which is based on CUPTI. I am seeing 4 seconds of CUpti_ActivityOverhead activities before I start seeing my runtime and kernel activities.
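
One thing that might be worth trying, given that the nsys command in the first post already uses -c cudaProfilerApi (capture starts only when the application calls cudaProfilerStart): run a few untimed warm-up iterations first and open the capture range afterwards. The sketch below only shows that ordering. The function name profile_after_warmup and its parameters are hypothetical, not part of nsys, CUPTI, or PyTorch, and nothing in this thread confirms that it moves the CUPTI overhead out of the capture.

```python
from typing import Callable


def profile_after_warmup(
    step: Callable[[], None],
    start_capture: Callable[[], None],
    stop_capture: Callable[[], None],
    warmup_iters: int = 1,
    profiled_iters: int = 3,
) -> None:
    """Run warm-up iterations outside the capture range, then profiled ones inside it.

    All names here are hypothetical. `step` stands for one inference iteration;
    `start_capture` / `stop_capture` stand for whatever opens and closes the
    profiler capture range in your setup.
    """
    # Warm-up: one-time costs (lazy initialization, first kernel launches)
    # are paid here, before the capture range is opened.
    for _ in range(warmup_iters):
        step()

    start_capture()
    try:
        # Only these iterations fall between start_capture and stop_capture.
        for _ in range(profiled_iters):
            step()
    finally:
        # Close the range even if an iteration raises.
        stop_capture()
```

With PyTorch, torch.cuda.profiler.start and torch.cuda.profiler.stop (which call cudaProfilerStart and cudaProfilerStop) could be passed as start_capture and stop_capture, with step wrapping one inference call followed by torch.cuda.synchronize(). Treat that as an assumption to verify on your own timeline rather than a confirmed fix.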
