What is the meaning of error in Nsight UI Diagnostics Summary

tniro · January 26, 2023, 9:38pm

Can someone tell me what this error means?

Error	Analysis		00:00.402	
Event requestor failed: Source ID=
Type=ErrorInformation (18)
 Properties:
  ErrorText (100)=GPU Metrics [0]: NVPA_STATUS_ERROR
- API function: NVPW_Device_PeriodicSampler_DecodeCounters(&decodeParams)
- Error code: 1
- Source function: size_t QuadDDaemon::EventSource::GpuMetricsBackend::Impl::Collect(QuadDDaemon::EventSource::GpuMetricsBackend::CounterDataImage*)
- Source location: /build/agent/work/323cb361ab84164c/QuadD/Target/quadd_d/quadd_d/jni/EventSource/GpuMetricsBackend.cpp:1329
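
When comparing several of these reports, it can help to split the message above into its fields. This is a minimal Python sketch that assumes the layout shown in this post, a summary line followed by `- Key: value` lines. `parse_error_text` is a hypothetical helper name and not part of Nsight Systems.

```python
import re


def parse_error_text(text):
    """Split an ErrorText message into a summary line and a dict of fields.

    Assumes the layout seen in this thread: one summary line followed by
    '- Key: value' lines. Hypothetical helper, not an Nsight Systems API.
    """
    # The CLI prints the same message with literal "\n" escapes; normalise them.
    lines = text.replace("\\n", "\n").splitlines()
    summary = lines[0].strip() if lines else ""
    fields = {}
    for line in lines[1:]:
        # Split on the first colon only, so C++ '::' in values stays intact.
        match = re.match(r"-\s*([^:]+):\s*(.*)", line.strip())
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return summary, fields


message = (
    "GPU Metrics [0]: NVPA_STATUS_ERROR\n"
    "- API function: NVPW_Device_PeriodicSampler_DecodeCounters(&decodeParams)\n"
    "- Error code: 1\n"
)
summary, fields = parse_error_text(message)
print(summary)
print(fields["API function"])
print(fields["Error code"])
```

Read this way, the message says that GPU metrics collection for device index 0 failed while decoding sampled counters. It does not by itself say why the decode failed.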

We were trying to trace the object detection application from mlcommons/training_results_v2.1 (GitHub) on two A100 GPUs:

+ NSYSCMD=' /usr/local/cuda/bin/nsys profile -t cuda,nvtx,osrt,cublas --trace-fork-before-exec true -f true --gpu-metrics-device=all --cuda-memory-usage=true --export=sqlite -o /results/object_detection_pytorch_1x2x12_230126154320094202263.nsys-rep'
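
Since the error comes from the GPU Metrics path, one way to narrow it down might be to repeat the run with `--gpu-metrics-device` omitted, then set to a single GPU, then set to `all`. The Python sketch below only assembles the command lines, reusing flags from the command above. It assumes the option also accepts a single GPU index such as `0`. `build_nsys_cmd`, the output paths, and `train.py` are placeholders for illustration.

```python
import shlex


def build_nsys_cmd(output, app_cmd, gpu_metrics_device=None):
    """Assemble an `nsys profile` argument list from flags used in this thread.

    gpu_metrics_device=None leaves out --gpu-metrics-device, so GPU metrics
    sampling is not requested. Hypothetical helper for illustration only.
    """
    cmd = [
        "nsys", "profile",
        "-t", "cuda,nvtx,osrt,cublas",
        "-f", "true",
        "--cuda-memory-usage=true",
        "--export=sqlite",
        "-o", output,
    ]
    if gpu_metrics_device is not None:
        cmd.append(f"--gpu-metrics-device={gpu_metrics_device}")
    # The application command always comes last, after all nsys options.
    return cmd + list(app_cmd)


app = ["python3", "train.py"]  # placeholder application command
for device in (None, "0", "all"):
    suffix = "nometrics" if device is None else f"gpu{device}"
    print(shlex.join(build_nsys_cmd(f"/results/run_{suffix}", app, device)))
```

If the run without GPU metrics produces a clean report and the others do not, that would point at metrics sampling rather than the CUDA or NVTX tracing.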

We’re just getting started with this stuff and any recommendations would be greatly appreciated. We’ve been able to monitor smaller application runs. I’m sure it’s some configuration issue… I’ve attached logs for our run.

230126104234615857162_1.log (452.8 KB)

tniro · January 26, 2023, 9:49pm

Also note that the tests run successfully without nsys (on 1 or 2 GPUs), though not 100% of the time.

hwilper · January 27, 2023, 7:29pm

@Andrey_Trachenko who is the right person to work on this?

tniro · February 2, 2023, 2:29pm

We are having the same issue with a simpler test, using undertherain/benchmarker (GitHub), a modular framework for [not only] deep learning performance benchmarking.
Running the following command:

root@4f49d25a37a0:/rockshare/user/tniro/benchmarker#  nsys profile --sample=none  -t cuda,nvtx,cublas -f true -o /rockshare/user/tniro/gpu_benchmarks/proxy/resnet/syseng5/gpu1/gpu1 --gpu-metrics-device=all --cuda-memory-usage=true --export=sqlite python3 -m benchmarker  --mode=inference --framework=pytorch --problem=resnet50 --problem_size=16384 --batch_size=1024 --gpus=1 --nb_epoch=20
GPU 0: General Metrics for NVIDIA GA100 (any frequency)
GPU 1: General Metrics for NVIDIA GA100 (any frequency)
{
    "backend": "native",
    "batch_size": 1024,
    "batch_size_per_device": 1024,
    "channels_first": true,
    "cudnn_benchmark": true,
    "device": "NVIDIA A100 80GB PCIe",
    "framework": "pytorch",
    "framework_full": "PyTorch-1.14.0a0+410ce96",
    "gpus": [
        1
    ],
    "mode": "inference",
    "nb_epoch": 20,
    "nb_gpus": 1,
    "path_ext": "inference",
    "path_out": "./logs",
    "platform": {
        "cpu": {
            "brand": "Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz",
            "cache": {
                "1": 3145728,
                "2": 83886080,
                "3": 50331648
            },
            "clock": 966.5690390625,
            "clock_max": 3200.0,
            "clock_min": 800.0,
            "logical_cores": 128,
            "physical_cores": 64
        },
        "gpus": [
            {
                "brand": "NVIDIA A100 80GB PCIe",
                "clock": 1410000,
                "compute_capability": 8.0,
                "cores": null,
                "memory": 85024112640,
                "memory_clock": 1512000,
                "multiprocessors": 108,
                "warp_size": 32
            }
        ],
        "hdds": {
            "/dev/sda": {
                "model": "PERC H745 Frnt  ",
                "size": 936640512
            },
            "/dev/sdb": {
                "model": "DELLBOSS VD     ",
                "size": 937571968
            }
        },
        "host": "4f49d25a37a0",
        "os": "Linux-5.15.0-58-generic-x86_64-with-glibc2.29",
        "ram": {
            "total": 269870661632
        },
        "swap": 8589930496
    },
    "power": {
        "avg_watt_total": 0,
        "joules_total": 0,
        "sampling_ms": 100
    },
    "preheat": false,
    "problem": {
        "cnt_batches_per_epoch": 16,
        "cnt_samples": 16384,
        "name": "resnet50",
        "precision": "FP32",
        "size": [
            16384,
            3,
            224,
            224
        ]
    },
    "profile_pytorch": false,
    "samples_per_second": 3262.822482960377,
    "start_time": "23.02.02_14.08.59",
    "tensor_layout": "native",
    "time_batch": 0.31383871030302546,
    "time_epoch": 5.021419364848407,
    "time_sample": 0.0003064831155302983,
    "time_total": 100.42838729696814
}
Generating '/tmp/nsys-report-0904.qdstrm'
[1/2] [========================100%] gpu1.nsys-rep
Importer error status: Importation succeeded with non-fatal errors.
**** Analysis failed with:
Status: TargetProfilingFailed
Props {
  Items {
    Type: DeviceId
    Value: "Local (CLI)"
  }
}
Error {
  Type: RuntimeError
  Props {
    Items {
      Type: ErrorText
      Value: "GPU Metrics [0]: NVPA_STATUS_ERROR\n- API function: NVPW_Device_PeriodicSampler_DecodeCounters(&decodeParams)\n- Error code: 1\n- Source function: size_t QuadDDaemon::EventSource::GpuMetricsBackend::Impl::Collect(QuadDDaemon::EventSource::GpuMetricsBackend::CounterDataImage*)\n- Source location: /build/agent/work/323cb361ab84164c/QuadD/Target/quadd_d/quadd_d/jni/EventSource/GpuMetricsBackend.cpp:1357"
    }
  }
}


**** Errors occurred while processing the raw events. ****
**** Please see the Diagnostics Summary page after opening the report file in GUI. ****
[2/2] [========================100%] gpu1.sqlite
Generated:
    /rockshare/user/tniro/gpu_benchmarks/proxy/resnet/syseng5/gpu1/gpu1.qdstrm
    /rockshare/user/tniro/gpu_benchmarks/proxy/resnet/syseng5/gpu1/gpu1.nsys-rep
    /rockshare/user/tniro/gpu_benchmarks/proxy/resnet/syseng5/gpu1/gpu1.sqlite
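
Note that the report files are still generated even though analysis failed, so a batch script that only checks for output files would miss this. The sketch below is a minimal Python example that scans captured CLI output for the failure block, assuming the markers appear exactly as in the log above. `analysis_status` is a hypothetical helper, and whether nsys also returns a non-zero exit code in this case is not known from this thread.

```python
import re


def analysis_status(cli_output):
    """Return the Status value from an 'Analysis failed' block, or None.

    Assumes the marker text printed in this thread's log. Hypothetical
    helper for illustration, not part of Nsight Systems.
    """
    if "Analysis failed with:" not in cli_output:
        return None
    # Match a line that starts with 'Status:', not 'Importer error status:'.
    match = re.search(r"^Status:\s*(\S+)", cli_output, flags=re.MULTILINE)
    return match.group(1) if match else None


sample = """\
Importer error status: Importation succeeded with non-fatal errors.
**** Analysis failed with:
Status: TargetProfilingFailed
"""
print(analysis_status(sample))
```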

Using container: nvcr.io/nvidia/pytorch:22.12-py3. The following is the nsys status:

root@4f49d25a37a0:/rockshare/user/tniro/benchmarker# nsys status -e
Timestamp counter supported: Yes

CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.0-58-generic: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK

Including the generated report:
gpu0.nsys-rep.gz (7.3 MB)

NOTE: We run the same setup on an AMD server and have no issues.
