Hi.
I’m studying NVIDIA’s unified virtual memory (UVM) and the GPU access counters.
I want to see the DRAM and PCIe bandwidth collected by Nsight Systems with the gpu-metrics-device option.
My environment is as follows.
NVIDIA TITAN RTX
Ubuntu 20.04
Nvidia driver 560.28.03
CUDA 12.6
https://github.com/NVIDIA/open-gpu-kernel-modules
...after build...
insmod kernel-open/nvidia-uvm uvm_perf_access_counter_mimc_migration_enable=1
My application code is below.
...some code...
DATA_TYPE *A_gpu;
DATA_TYPE *x_gpu;
DATA_TYPE *y_gpu;
DATA_TYPE *tmp_gpu;
cudaMallocManaged(&A_gpu, sizeof(DATA_TYPE) * NX * NY);
cudaMallocManaged(&x_gpu, sizeof(DATA_TYPE) * NY);
cudaMallocManaged(&y_gpu, sizeof(DATA_TYPE) * NY);
cudaMallocManaged(&tmp_gpu, sizeof(DATA_TYPE) * NX);
cudaMemAdvise(A_gpu, sizeof(DATA_TYPE) * NX * NY, cudaMemAdviseSetAccessedBy, 0);
cudaMemAdvise(x_gpu, sizeof(DATA_TYPE) * NY, cudaMemAdviseSetAccessedBy, 0);
cudaMemAdvise(y_gpu, sizeof(DATA_TYPE) * NY, cudaMemAdviseSetAccessedBy, 0);
cudaMemAdvise(tmp_gpu, sizeof(DATA_TYPE) * NX, cudaMemAdviseSetAccessedBy, 0);
init_array(x_gpu, A_gpu);
GPU_argv_init();
total_time += ataxGpu(A_gpu, x_gpu, y_gpu, tmp_gpu);
...rest of the code...
In the code above, cudaMemAdvise with cudaMemAdviseSetAccessedBy lets the GPU’s access counters come into play directly, and the UVM driver handles the interrupts raised by those counters.
The final result is shown in the block below.
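To make the pattern easier to reproduce, here is a minimal self-contained sketch of the same setup with error checking added. It is not taken from my benchmark: `DATA_TYPE` as `float`, the array size, the `scale` kernel, and the `CHECK_CUDA` macro are all assumptions made for illustration. It also assumes the CUDA 12.x signature of `cudaMemAdvise`, which takes an `int` device as its last argument.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original code): abort on any CUDA runtime error.
#define CHECK_CUDA(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

typedef float DATA_TYPE;  // assumption: the benchmark's element type is float

// Hypothetical kernel: multiplies every element by f, so the GPU reads and
// writes the managed buffer.
__global__ void scale(DATA_TYPE *v, size_t n, DATA_TYPE f)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) {
        v[i] *= f;
    }
}

int main()
{
    const size_t n = 1 << 20;  // assumption: arbitrary size for illustration
    const int device = 0;
    DATA_TYPE *v = nullptr;

    CHECK_CUDA(cudaSetDevice(device));
    CHECK_CUDA(cudaMallocManaged(&v, n * sizeof(DATA_TYPE)));

    // Same advice as in the post: GPU 0 is expected to access this range.
    CHECK_CUDA(cudaMemAdvise(v, n * sizeof(DATA_TYPE),
                             cudaMemAdviseSetAccessedBy, device));

    // First touch on the CPU, so the pages start out in system memory.
    for (size_t i = 0; i < n; ++i) {
        v[i] = 1.0f;
    }

    const unsigned int threads = 256;
    const unsigned int blocks = (unsigned int)((n + threads - 1) / threads);
    scale<<<blocks, threads>>>(v, n, 2.0f);
    CHECK_CUDA(cudaGetLastError());       // catches launch errors
    CHECK_CUDA(cudaDeviceSynchronize());  // catches errors during execution

    printf("v[0] = %f\n", v[0]);
    CHECK_CUDA(cudaFree(v));
    return 0;
}
```

Checking every return value helps rule out a silently failing `cudaMemAdvise` or kernel launch before looking at the profiler side.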
GPU 0: General Metrics for NVIDIA TU10x (any frequency)
WARNING: CPU IP/backtrace sampling not supported, disabling.
Try the 'nsys status --environment' command to learn more.
WARNING: CPU context switch tracing not supported, disabling.
Try the 'nsys status --environment' command to learn more.
Start exp 1
CPU Runtime: 9.093303s
Non-Matching CPU-GPU Outputs Beyond Error Threshold of 0.50 Percent: 0
AVG GPU Runtime: 30.403080s
Generating '/tmp/nsys-report-dcd0.qdstrm'
[1/1] [========================100%] report5.nsys-rep
Importer error status: Importation succeeded with non-fatal errors.
**** Analysis failed with:
Status: TargetProfilingFailed
Props {
Items {
Type: DeviceId
Value: "Local (CLI)"
}
}
Error {
Type: RuntimeError
Props {
Items {
Type: ErrorText
Value: "GPU Metrics [0]: NVPA_STATUS_ERROR\n- API function: Nvpw.GPU_PeriodicSampler_DecodeCounters_V2(&params)\n- Error code: 1\n- Source function: virtual QuadDDaemon::EventSource::PwMetrics::PeriodicSampler::DecodeResult QuadDDaemon::EventSource::{anonymous}::GpuPeriodicSampler::DecodeCounters(uint8_t*, size_t) const\n- Source location: /dvs/p4/build/sw/devtools/Agora/Rel/CUDA12.6/QuadD/Target/quadd_d/quadd_d/jni/EventSource/GpuMetrics.cpp:263"
}
}
}
**** Errors occurred while processing the raw events. ****
**** Please see the Diagnostics Summary page after opening the report file in GUI. ****
Generated:
/home/ssbaek/workspace/UVM_benchmark/UVM_benchmarks_oversub/polybench/ATAX/report5.qdstrm
/home/ssbaek/workspace/UVM_benchmark/UVM_benchmarks_oversub/polybench/ATAX/report5.nsys-rep
If I disable the GPU’s access counters, the error goes away.
I think the access counter option triggers some unexpected error in Nsight Systems.
Can I get some advice? Thanks.