I'm having the same issue with a simpler test, using undertherain/benchmarker from GitHub (a modular framework for [not only] deep learning performance benchmarking).
Running the following command:
root@4f49d25a37a0:/rockshare/user/tniro/benchmarker# nsys profile --sample=none -t cuda,nvtx,cublas -f true -o /rockshare/user/tniro/gpu_benchmarks/proxy/resnet/syseng5/gpu1/gpu1 --gpu-metrics-device=all --cuda-memory-usage=true --export=sqlite python3 -m benchmarker --mode=inference --framework=pytorch --problem=resnet50 --problem_size=16384 --batch_size=1024 --gpus=1 --nb_epoch=20
GPU 0: General Metrics for NVIDIA GA100 (any frequency)
GPU 1: General Metrics for NVIDIA GA100 (any frequency)
{
"backend": "native",
"batch_size": 1024,
"batch_size_per_device": 1024,
"channels_first": true,
"cudnn_benchmark": true,
"device": "NVIDIA A100 80GB PCIe",
"framework": "pytorch",
"framework_full": "PyTorch-1.14.0a0+410ce96",
"gpus": [
1
],
"mode": "inference",
"nb_epoch": 20,
"nb_gpus": 1,
"path_ext": "inference",
"path_out": "./logs",
"platform": {
"cpu": {
"brand": "Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz",
"cache": {
"1": 3145728,
"2": 83886080,
"3": 50331648
},
"clock": 966.5690390625,
"clock_max": 3200.0,
"clock_min": 800.0,
"logical_cores": 128,
"physical_cores": 64
},
"gpus": [
{
"brand": "NVIDIA A100 80GB PCIe",
"clock": 1410000,
"compute_capability": 8.0,
"cores": null,
"memory": 85024112640,
"memory_clock": 1512000,
"multiprocessors": 108,
"warp_size": 32
}
],
"hdds": {
"/dev/sda": {
"model": "PERC H745 Frnt ",
"size": 936640512
},
"/dev/sdb": {
"model": "DELLBOSS VD ",
"size": 937571968
}
},
"host": "4f49d25a37a0",
"os": "Linux-5.15.0-58-generic-x86_64-with-glibc2.29",
"ram": {
"total": 269870661632
},
"swap": 8589930496
},
"power": {
"avg_watt_total": 0,
"joules_total": 0,
"sampling_ms": 100
},
"preheat": false,
"problem": {
"cnt_batches_per_epoch": 16,
"cnt_samples": 16384,
"name": "resnet50",
"precision": "FP32",
"size": [
16384,
3,
224,
224
]
},
"profile_pytorch": false,
"samples_per_second": 3262.822482960377,
"start_time": "23.02.02_14.08.59",
"tensor_layout": "native",
"time_batch": 0.31383871030302546,
"time_epoch": 5.021419364848407,
"time_sample": 0.0003064831155302983,
"time_total": 100.42838729696814
}
Generating '/tmp/nsys-report-0904.qdstrm'
[1/2] [========================100%] gpu1.nsys-rep
Importer error status: Importation succeeded with non-fatal errors.
**** Analysis failed with:
Status: TargetProfilingFailed
Props {
Items {
Type: DeviceId
Value: "Local (CLI)"
}
}
Error {
Type: RuntimeError
Props {
Items {
Type: ErrorText
Value: "GPU Metrics [0]: NVPA_STATUS_ERROR\n- API function: NVPW_Device_PeriodicSampler_DecodeCounters(&decodeParams)\n- Error code: 1\n- Source function: size_t QuadDDaemon::EventSource::GpuMetricsBackend::Impl::Collect(QuadDDaemon::EventSource::GpuMetricsBackend::CounterDataImage*)\n- Source location: /build/agent/work/323cb361ab84164c/QuadD/Target/quadd_d/quadd_d/jni/EventSource/GpuMetricsBackend.cpp:1357"
}
}
}
**** Errors occurred while processing the raw events. ****
**** Please see the Diagnostics Summary page after opening the report file in GUI. ****
[2/2] [========================100%] gpu1.sqlite
Generated:
/rockshare/user/tniro/gpu_benchmarks/proxy/resnet/syseng5/gpu1/gpu1.qdstrm
/rockshare/user/tniro/gpu_benchmarks/proxy/resnet/syseng5/gpu1/gpu1.nsys-rep
/rockshare/user/tniro/gpu_benchmarks/proxy/resnet/syseng5/gpu1/gpu1.sqlite
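Since the .sqlite export is still generated despite the analysis error, one way to see what was actually captured is to list the tables in it. The following is only a minimal sketch using Python's standard `sqlite3` module. The helper name `list_tables` is my own (hypothetical), and I have not confirmed which tables nsys writes for GPU metrics, so the code lists everything rather than assuming a table name.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in an SQLite file, sorted by name.

    Hypothetical helper for illustration; it makes no assumption about
    which tables the nsys export contains.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]


# Example usage (path taken from the output above):
# for name in list_tables(
#     "/rockshare/user/tniro/gpu_benchmarks/proxy/resnet/syseng5/gpu1/gpu1.sqlite"
# ):
#     print(name)
```

If no GPU-metrics-related table appears in the list, that would suggest the failure dropped the metrics data while the CUDA/NVTX trace survived, but that is a guess on my side.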
This is running in the nvcr.io/nvidia/pytorch:22.12-py3 container. The following is the output of nsys status -e:
root@4f49d25a37a0:/rockshare/user/tniro/benchmarker# nsys status -e
Timestamp counter supported: Yes
CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 5.15.0-58-generic: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK
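The "Linux Kernel Paranoid Level" line above reflects the kernel's `perf_event_paranoid` setting, which on Linux is exposed at /proc/sys/kernel/perf_event_paranoid. For comparing the Intel and AMD hosts, a small sketch like the one below can read it from inside the container. The function name `read_paranoid_level` is hypothetical, and this setting concerns CPU sampling, so I am not claiming it is related to the GPU metrics error.

```python
from pathlib import Path


def read_paranoid_level(path="/proc/sys/kernel/perf_event_paranoid"):
    """Return the perf_event_paranoid level as an int, or None if unreadable.

    Hypothetical helper for illustration; the default path is the standard
    Linux procfs location for this setting.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (non-Linux host, restricted container) or not a number.
        return None


# Example usage:
# print(read_paranoid_level())
```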
I'm attaching the generated report:
gpu0.nsys-rep.gz (7.3 MB)
NOTE: we run the same setup on an AMD server and have no issues there.