Error in sampling PyTorch profile with nsys and dlprof

Hi, I have a PyTorch training workflow. When I profile it with nsys (or with dlprof, after adding the extra line import nvidia_dlprof_pytorch_nvtx as nvtx and running the training loop inside the torch.autograd.profiler.emit_nvtx() context), I get the following error at the end of profiling:

Creating final output files...
Processing [===============================================================100%]

**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  SubError {
    Type: ProcessEventsError
    Props {
      Items {
        Type: ErrorText
        Value: "/build/agent/work/20a3cfcd1c25021d/QuadD/Host/Analysis/EventHandler/PerfEventHandler.cpp(501): Throw in function void QuadDAnalysis::EventHandler::PerfEventHandler::PutCpuEvent(QuadDCommon::CpuId, QuadDAnalysis::EventHandler::PerfEventHandler::EventPtr)\nDynamic exception type: boost::exception_detail::clone_impl<QuadDAnalysis::ChronologicalOrderError>\nstd::exception::what: ChronologicalOrderError\n[QuadDCommon::tag_message*] = Cpu event chronological order was broken.\n"
      }
    }
  }
}

These are the installed versions:

  1. CUDA: 11.3
  2. nsys: 2021.3.2.12-9700a21
  3. dlprof: v1.8.0 built on 2021-12-01 08:22:18 (Build 29839685)

When profiling through dlprof, even the output SQLite file is reported as an invalid DLProf database. I get the same errors on two remote systems, one with a V100 and the other with an A100.

Hi,

did you ever resolve the problem?

I’m currently struggling with the same error.

I think I narrowed it down to the DataLoader not running in the main thread, i.e. with more than 0 workers.

If the data loader iteration happens inside torch.autograd.profiler.emit_nvtx(), it only seems to work when the data loading happens on the main thread (with the workers of the data loader set to 0).
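For reference, here is a rough sketch of that workaround, not a confirmed fix. The helper names dataloader_workers and train_one_epoch, the batch size, the default of 4 workers, and the MSE loss are all made up for illustration; only DataLoader(num_workers=0) and torch.autograd.profiler.emit_nvtx() come from the discussion above.

```python
from contextlib import nullcontext


def dataloader_workers(profiling: bool, default_workers: int = 4) -> int:
    """Hypothetical helper: use 0 workers while profiling so that data
    loading stays on the main thread; otherwise use the normal setting."""
    return 0 if profiling else default_workers


def train_one_epoch(model, optimizer, dataset, profiling: bool = True):
    """Hypothetical training loop; the loss and batch size are placeholders."""
    # Imported here so the helper above can be used without PyTorch installed.
    import torch
    from torch.utils.data import DataLoader

    loader = DataLoader(
        dataset,
        batch_size=32,  # assumption, pick whatever your workflow uses
        num_workers=dataloader_workers(profiling),
    )
    # Emit NVTX ranges only when profiling; otherwise run the loop unchanged.
    context = torch.autograd.profiler.emit_nvtx() if profiling else nullcontext()
    with context:
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(model(inputs), targets)
            loss.backward()
            optimizer.step()
```

With profiling=True the loader is built with num_workers=0, which is the only configuration that seemed to work for me inside emit_nvtx(). Expect data loading to be slower during the profiled run.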

I encountered the same problem and I solved it with the following method:

First, run nsys status --environment to examine the environment. My result is:

Timestamp counter supported: Yes
Sampling Environment Check
Linux Kernel Paranoid Level = 4: OK
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.0-84-generic: OK
Linux perf_event_open syscall available: Fail
Sampling trigger event available: Fail
Intel(c) Last Branch Record support: Not Available
Sampling Environment: Fail

The problem is that the Linux kernel paranoid level of 4 is too high. It must be <= 2, as mentioned in the Installation Guide :: Nsight Systems Documentation (nvidia.com).

So I executed sudo sh -c 'echo 2 >/proc/sys/kernel/perf_event_paranoid' to set it to 2. After that, it works fine.
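If you want to check the setting from a script before launching a profiling run, something like the following minimal sketch should do. The function names are made up for illustration, and the threshold of 2 is taken from the installation guide requirement quoted above; the sketch only reads the same /proc file that the command above writes to.

```python
from pathlib import Path

# Same file the echo command above writes to.
PARANOID_PATH = "/proc/sys/kernel/perf_event_paranoid"
# Maximum level at which nsys sampling works, per the requirement quoted above.
MAX_PARANOID_LEVEL = 2


def parse_paranoid_level(text: str) -> int:
    """Parse the file contents (e.g. '4\\n') into an integer level."""
    return int(text.strip())


def read_paranoid_level(path: str = PARANOID_PATH) -> int:
    """Read the current paranoid level from the kernel (Linux only)."""
    return parse_paranoid_level(Path(path).read_text())


def sampling_allowed(level: int, max_level: int = MAX_PARANOID_LEVEL) -> bool:
    """True if the level is low enough for CPU sampling."""
    return level <= max_level
```

For example, sampling_allowed(read_paranoid_level()) would have returned False on my machine before the change (level 4) and True afterwards (level 2). Note that a value written to /proc this way does not survive a reboot, so the check is worth repeating on a freshly booted machine.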