Can someone help me understand how to interpret the demangled kernel names obtained while profiling DNN workloads with NCU, and how to map them back to my original PyTorch/ONNX workloads?
Hi, @kunal.sahoo2003
There is an option, --print-kernel-base, that can be used during profiling. Its value can be mangled, function, or demangled.
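For example, a minimal invocation might look like the following (the report name and workload script are placeholders):

```shell
# Show demangled kernel names in the report
# (the value can also be "mangled" or "function")
ncu --print-kernel-base demangled -o my_report python my_workload.py
```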
Kernels launched from these frameworks generally have generic names and are specialized via their template arguments or parameters. To see them in context, you can collect NVTX ranges (using --nvtx) and/or Python call stacks (using --call-stack-type python). Whether NCU can collect NVTX range information depends on the app/framework being instrumented with it. If yours is not, you can add ranges around your workloads yourself. Note that multiple kernels will likely map to a single high-level part of your application.
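For instance, if your framework code is not already instrumented, a minimal sketch of adding NVTX ranges manually in PyTorch might look like this (the range name and the layer are placeholders; emitting the ranges requires a CUDA-enabled PyTorch build):

```python
import torch

def run_layer(layer, x):
    # Wrap a high-level stage of the workload in an NVTX range so that
    # kernels launched inside it appear under this range in the NCU
    # report when profiling with --nvtx.
    torch.cuda.nvtx.range_push("my_lstm_layer")  # placeholder range name
    out = layer(x)
    torch.cuda.nvtx.range_pop()
    return out
```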
sudo ncu --target-processes all --set roofline --call-stack-type python -f -o results/ncu-reps/lstm_infer_bs_1_20epoch_debug bash exp_script.sh
yields the error:
==ERROR== unrecognised option '--call-stack-type'. Use --help for further details.
Does NCU with NVTX help me visualize which kernel is associated with which layer in my PyTorch neural network?
That is my main objective: to understand which kernel belongs to which high-level deep learning layer in a DNN workload.
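One way to approach this in PyTorch, assuming a CUDA-enabled build, is torch.autograd.profiler.emit_nvtx(), which pushes an NVTX range for each operator executed inside its context; when the script is then run under NCU with --nvtx, the kernels appear nested under those op-level ranges. A sketch (the model and input are placeholders):

```python
import torch

def profile_forward(model, x):
    # Every PyTorch operator executed inside this context emits an NVTX
    # range named after the op. Running the script under
    # "ncu --nvtx ... python script.py" lets NCU associate each kernel
    # with the range (and thus the op) that launched it.
    with torch.autograd.profiler.emit_nvtx():
        return model(x)
```

Ranges from emit_nvtx() are per-operator rather than per-layer, so combining it with manually added ranges around each layer gives the clearest layer-to-kernel mapping.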