Error in nsys profiling of Python code

(kv) [ding3@gpua003 torch]$ nsys profile -o cuda_test python cuda_test.py 
Average of the elements in tensor a: 2.0
Generating '/tmp/nsys-report-c641.qdstrm'
Failed to create '/projects/bcjw/ding3/torch/cuda_test.nsys-rep': File exists.
Use `--force-overwrite true` to overwrite existing files.
[1/1] [========================100%] nsys-report-d601.nsys-rep
Importer error status: Importation succeeded with non-fatal errors.
**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/build/agent/work/323cb361ab84164c/QuadD/Host/Analysis/Modules/TraceProcessEvent.cpp(45): Throw in function const string& {anonymous}::GetCudaCallbackName(bool, uint32_t, const QuadDAnalysis::MoreInjection&)\nDynamic exception type: boost::wrapexcept<QuadDCommon::InvalidArgumentException>\nstd::exception::what: InvalidArgumentException\n[QuadDCommon::tag_message*] = Unknown runtime API function index: 440\n"
      }
    }
  }
}


**** Errors occurred while processing the raw events. ****
**** Please see the Diagnostics Summary page after opening the report file in GUI. ****
Failed to create '/projects/bcjw/ding3/torch/cuda_test.qdstrm': File exists.
Use `--force-overwrite true` to overwrite existing files.
Generated:
    /tmp/nsys-report-c641.qdstrm
    /tmp/nsys-report-d601.nsys-rep

My environment:
nvcc 11.8
NVIDIA Nsight Systems version 2022.4.2.1-df9881f
NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.2

nsys status --environment
Timestamp counter supported: Yes

CPU Profiling Environment Check
Root privilege: disabled
Linux Kernel Paranoid Level = 2
Linux Distribution = RHEL
Linux Kernel Version = 4.18.0-477.51.1.el8_8.x86_64: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Not Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): Fail

See the product documentation at Nsight Systems — nsight-systems 2024.2 documentation for more information,
including information on how to set the Linux Kernel Paranoid Level.
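The version details above can also be gathered in one go. The following is only a minimal sketch, not an official tool: the helper names `tool_version` and `collect_environment` are made up for illustration, and it assumes `nsys` and `nvcc` are on `PATH` (it reports `None` for anything it cannot find). It also prints `torch.version.cuda`, the CUDA version PyTorch was built against, which may differ from the `nvcc` version.

```python
import shutil
import subprocess


def tool_version(tool, args=("--version",)):
    """Return the tool's version output, or None if it is missing or fails.

    Hypothetical helper for illustration; not part of nsys or PyTorch.
    """
    path = shutil.which(tool)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, *args], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() or None


def collect_environment():
    """Collect version strings that are useful when reporting a profiler issue."""
    info = {
        "nsys": tool_version("nsys"),
        "nvcc": tool_version("nvcc"),
    }
    try:
        import torch  # optional; only needed for the PyTorch CUDA version

        info["torch_cuda"] = torch.version.cuda
    except ImportError:
        info["torch_cuda"] = None
    return info


if __name__ == "__main__":
    for name, value in collect_environment().items():
        print(f"{name}: {value}")
```

Running this inside the same virtual environment used for profiling keeps the reported PyTorch build consistent with the one being profiled.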


Okay, first things first: a file with the output name you specified already existed. Nsys does not overwrite existing files by default, so you will need to either add the "--force-overwrite true" option, or leave out -o, in which case the profiler names the file report#.nsys-rep and increments the number automatically.
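If the profiling run is launched from a script, the two options described above might be wrapped as follows. This is just a sketch: `build_nsys_command` is a hypothetical helper name, and it uses only the flags that appear in this thread (`-o` and `--force-overwrite true`).

```python
import shlex


def build_nsys_command(script, output=None, overwrite=False):
    """Build an `nsys profile` command line as a list of arguments.

    Hypothetical helper for illustration. With output=None, no -o is passed
    and nsys picks its default report name.
    """
    cmd = ["nsys", "profile"]
    if output is not None:
        cmd += ["-o", output]
        if overwrite:
            # Replace an existing report with the same name instead of failing.
            cmd += ["--force-overwrite", "true"]
    cmd += ["python", script]
    return cmd


if __name__ == "__main__":
    # Option 1: fixed output name, overwriting any earlier report.
    print(shlex.join(build_nsys_command("cuda_test.py", "cuda_test", overwrite=True)))
    # Option 2: no -o, so nsys chooses the report name itself.
    print(shlex.join(build_nsys_command("cuda_test.py")))
```

The resulting list can be passed to `subprocess.run` to launch the profiler; the sketch only prints the commands so it can be checked without nsys installed.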

I tried it again, and it shows:

nsys profile -o cuda_test --force-overwrite true python cuda_test.py

Average of the elements in tensor a: 2.0
Generating '/tmp/nsys-report-21a9.qdstrm'
[1/1] [========================100%] cuda_test.nsys-rep
Importer error status: Importation succeeded with non-fatal errors.
**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/build/agent/work/323cb361ab84164c/QuadD/Host/Analysis/Modules/TraceProcessEvent.cpp(45): Throw in function const string& {anonymous}::GetCudaCallbackName(bool, uint32_t, const QuadDAnalysis::MoreInjection&)\nDynamic exception type: boost::wrapexcept<QuadDCommon::InvalidArgumentException>\nstd::exception::what: InvalidArgumentException\n[QuadDCommon::tag_message*] = Unknown runtime API function index: 440\n"
      }
    }
  }
}


**** Errors occurred while processing the raw events. ****
**** Please see the Diagnostics Summary page after opening the report file in GUI. ****
Generated:
    /projects/bcjw/ding3/torch/cuda_test.qdstrm
    /projects/bcjw/ding3/torch/cuda_test.nsys-rep

Did you try opening the resulting cuda_test.nsys-rep in the GUI?

Then we can check the diagnostics to see what the non-fatal error was and whether it matters to you.