Greetings, and thanks for making nsys; it is an exceptionally useful tool!
I’m encountering excessive profiler overhead (“CUDA profiling data flush” in the screenshot):
The image shows regularly occurring bursts of 100% CPU usage in the “CUPTI worker thread” and in NSys, which block GPU utilization.
I believe this is caused by one of the libraries repeatedly calling cudaProfilerStart/Stop:
Since I don’t have access to the source code of this library (Gst-nvtracker, see the DeepStream 6.2 Release documentation), I can’t remove those calls.
nsys is already set to ignore the cudaProfilerStart/Stop calls; I tried both:
nsys profile -t cuda,nvtx -c none app
nsys profile -t cuda,nvtx -c cudaProfilerApi --capture-range-end=none app
But the profiler still initiates the “CUDA profiling data flush” every time cudaProfilerStop is called. Is there a way to override this behavior?
System
Ubuntu 20.04
NVIDIA driver 525.85.12
CUDA 11.8
Nsight Systems 2022.4.2 and 2023.1.1 (same behavior with both)
CPU Ryzen 2600 + GPU RTX 2060
Example
The behavior can be reproduced, to a lesser extent, by running this demo app: deepstream_python_apps/apps/deepstream-nvdsanalytics (master branch of NVIDIA-AI-IOT/deepstream_python_apps on GitHub)
python deepstream_nvdsanalytics.py file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_1080p_h264.mp4
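To check whether the start/stop pattern alone triggers the flushes, independent of DeepStream, a smaller reproducer along these lines might help. It is only a sketch under stated assumptions: the script name repro_profiler_toggle.py is hypothetical, the CUDA runtime is assumed to be loadable through ctypes as libcudart.so (on CUDA 11.8 it may need to be passed as libcudart.so.11.0, or need LD_LIBRARY_PATH=/usr/local/cuda/lib64), and the iteration counts are arbitrary. It brackets short bursts of cudaMemset calls with cudaProfilerStart/cudaProfilerStop, roughly imitating what I suspect Gst-nvtracker is doing.

```python
import ctypes
import sys

CUDA_SUCCESS = 0  # cudaSuccess in the runtime's cudaError_t enum


def check(err, what):
    """Raise if a CUDA runtime call returned anything other than cudaSuccess."""
    if err != CUDA_SUCCESS:
        raise RuntimeError(f"{what} failed with cudaError_t {err}")


def run_repro(cudart, dev_ptr, nbytes, iterations=100, work_per_range=50):
    """Bracket short bursts of CUDA work with cudaProfilerStart/Stop, roughly
    imitating a library that toggles the profiler over and over.
    Returns the number of start/stop pairs issued."""
    pairs = 0
    for _ in range(iterations):
        check(cudart.cudaProfilerStart(), "cudaProfilerStart")
        for _ in range(work_per_range):
            # Cheap device work so CUPTI has activity records to collect.
            check(cudart.cudaMemset(dev_ptr, 0, nbytes), "cudaMemset")
        check(cudart.cudaDeviceSynchronize(), "cudaDeviceSynchronize")
        check(cudart.cudaProfilerStop(), "cudaProfilerStop")
        pairs += 1
    return pairs


def load_cudart(name):
    """Load the CUDA runtime via ctypes and declare the signatures used here.
    Assumption: `name` is resolvable by the dynamic loader."""
    lib = ctypes.CDLL(name)
    lib.cudaMalloc.argtypes = [ctypes.POINTER(ctypes.c_void_p), ctypes.c_size_t]
    lib.cudaMemset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]
    lib.cudaFree.argtypes = [ctypes.c_void_p]
    lib.cudaDeviceSynchronize.argtypes = []
    lib.cudaProfilerStart.argtypes = []
    lib.cudaProfilerStop.argtypes = []
    for fn in (lib.cudaMalloc, lib.cudaMemset, lib.cudaFree,
               lib.cudaDeviceSynchronize, lib.cudaProfilerStart,
               lib.cudaProfilerStop):
        fn.restype = ctypes.c_int  # cudaError_t is an int-sized enum
    return lib


def main(libname):
    cudart = load_cudart(libname)
    nbytes = 1 << 20  # 1 MiB scratch buffer on the device
    dev_ptr = ctypes.c_void_p()
    check(cudart.cudaMalloc(ctypes.byref(dev_ptr), nbytes), "cudaMalloc")
    try:
        print("start/stop pairs issued:", run_repro(cudart, dev_ptr, nbytes))
    finally:
        check(cudart.cudaFree(dev_ptr), "cudaFree")


if __name__ == "__main__":
    try:
        main(sys.argv[1] if len(sys.argv) > 1 else "libcudart.so")
    except (OSError, RuntimeError) as exc:
        # Report a missing runtime or a failed CUDA call instead of a traceback.
        print(f"repro did not run: {exc}", file=sys.stderr)
```

It could then be profiled with the same settings as the DeepStream app:
nsys profile -t cuda,nvtx -c none python3 repro_profiler_toggle.py
If the repeated start/stop calls are the cause, I would expect a “CUDA profiling data flush” after each cudaProfilerStop in this timeline as well.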